I understand, but I deleted the OSDs from the CRUSH map, so Ceph won't
wait for these OSDs anymore, am I right?
On 2020-10-27 04:06, Eugen Block wrote:
> Hi,
>
> just to clarify so I don't miss anything: you have two DCs and one of
> them is down. And two of the MONs were in that failed DC? Now you
> removed all OSDs and two MONs from the failed DC hoping that your
> cluster will recover? If you have reasonable crush rules in place
> (e.g. to recover from a failed DC), your cluster will never recover in
> the current state unless you bring OSDs back up in the second DC.
> That's why you don't see progress in the recovery process: the PGs are
> waiting for their peers in the other DC so they can follow the crush
> rules.
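>
> For example, you can check what your rules actually require with:
>
>   ceph osd crush rule ls
>   ceph osd crush rule dump <rule-name>   # <rule-name> is a placeholder
>
> If a rule contains a step like "chooseleaf firstn 0 type datacenter",
> it requires OSDs from distinct datacenters, so with one DC down those
> PGs cannot be fully placed and will stay undersized/degraded.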
>
> Regards,
> Eugen
>
>
> Zitat von "Ing. Luis Felipe Domínguez Vega"
<luis.dominguez(a)desoft.cu>cu>:
>
>> I had 3 MONs, but I have 2 physical datacenters and one of them broke
>> with no short-term fix, so I removed all the OSDs and the Ceph MONs
>> (2 of them), and now I only have the OSDs of 1 datacenter with the
>> remaining monitor. I had stopped the Ceph manager, but I saw that when
>> I restart the Ceph manager, ceph -s shows recovery info for roughly 20
>> minutes and then all the info disappears.
>>
>> The thing is that the cluster does not seem to be recovering on its
>> own, and the Ceph monitor is "eating" all of the HDD.
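>>
>> To see how much the MON store has actually grown and to try to reclaim
>> some space, something like this should work (compaction may not gain
>> much while recovery is still outstanding):
>>
>>   du -sh /var/lib/ceph/mon/ceph-fond-beagle/store.db
>>   ceph tell mon.fond-beagle compact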
>>
>>> On 2020-10-26 15:57, Eugen Block wrote:
>>> The recovery process (ceph -s) is independent of the MGR service and
>>> only depends on the MON service. It seems you only have the one MON;
>>> if the MGR is overloading it (not clear why), it could help to leave
>>> the MGR off and see if the MON service then has enough RAM to proceed
>>> with the recovery. Is there any chance you could add two more MONs? A
>>> single MON is of course a single point of failure.
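>>>
>>> If you have two more hosts available, the manual procedure is roughly
>>> this (the hostname "newmon" is only an example; adjust it to your
>>> deployment tooling):
>>>
>>>   mkdir -p /var/lib/ceph/mon/ceph-newmon
>>>   ceph auth get mon. -o /tmp/mon.keyring
>>>   ceph mon getmap -o /tmp/monmap
>>>   ceph-mon -i newmon --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
>>>   chown -R ceph:ceph /var/lib/ceph/mon/ceph-newmon
>>>   systemctl start ceph-mon@newmon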
>>>
>>>
>>> Zitat von "Ing. Luis Felipe Domínguez Vega"
>>> <luis.dominguez(a)desoft.cu>cu>:
>>>
>>>> On 2020-10-26 15:16, Eugen Block wrote:
>>>>> You could stop the MGRs and wait for the recovery to finish; MGRs
>>>>> are not a critical component. You won't have a dashboard or metrics
>>>>> during that time, but it would prevent the high RAM usage.
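>>>>>
>>>>> Assuming a package-based install with systemd units, stopping them
>>>>> would be something like this (take the hostname from your ceph -s
>>>>> output, and make sure nothing restarts the MGR automatically):
>>>>>
>>>>>   systemctl stop ceph-mgr.target
>>>>>   systemctl disable ceph-mgr@fond-beagle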
>>>>>
>>>>> Zitat von "Ing. Luis Felipe Domínguez Vega"
>>>>> <luis.dominguez(a)desoft.cu>cu>:
>>>>>
>>>>>> On 2020-10-26 12:23, 胡 玮文 wrote:
>>>>>>>> On 2020-10-26 at 23:29, Ing. Luis Felipe Domínguez Vega
>>>>>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>>>>>
>>>>>>>> mgr: fond-beagle(active, since 39s)
>>>>>>>
>>>>>>> Your manager seems to be crash looping; it has only been up for
>>>>>>> 39s. Looking at the mgr logs may help you identify why your
>>>>>>> cluster is not recovering. You may have hit some bug in the mgr.
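>>>>>>>
>>>>>>> For example, to get more detail out of the mgr before its next
>>>>>>> restart (standard debug settings; raise or lower the levels as
>>>>>>> needed):
>>>>>>>
>>>>>>>   ceph config set mgr debug_mgr 10
>>>>>>>   ceph config set mgr debug_ms 1
>>>>>>>   tail -f /var/log/ceph/ceph-mgr.fond-beagle.log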
>>>>>> Nope, I'm restarting the Ceph manager myself because it eats all
>>>>>> the server RAM. I have a script that restarts the manager whenever
>>>>>> only 1 GB of RAM is free (the server has 94 GB of RAM); a sketch of
>>>>>> it is after the log below. I don't know why this happens, and the
>>>>>> manager logs are:
>>>>>>
>>>>>> -----------------------------------
>>>>>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# tail -f /var/log/ceph/ceph-mgr.fond-beagle.log
>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v584: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v585: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v586: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v587: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v588: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v589: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>> ---------------
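>>>>>>
>>>>>> The watchdog I mentioned above is essentially of this shape (a
>>>>>> minimal sketch only, the real script differs; it assumes a
>>>>>> systemd-managed mgr unit named ceph-mgr@fond-beagle):
>>>>>>
>>>>>>   #!/bin/bash
>>>>>>   # restart the mgr whenever less than ~1 GB of RAM is available
>>>>>>   while true; do
>>>>>>       avail_mb=$(free -m | awk '/^Mem:/ {print $7}')
>>>>>>       if [ "$avail_mb" -lt 1024 ]; then
>>>>>>           systemctl restart ceph-mgr@fond-beagle
>>>>>>       fi
>>>>>>       sleep 30
>>>>>>   done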
>>>>
>>>> OK, I will do that... but the thing is that the cluster does not
>>>> show any recovery activity; it doesn't show that it is doing
>>>> anything (like the recovery info in the ceph -s output), so I don't
>>>> know whether it is recovering or what it is doing.
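>>>>
>>>> As far as I understand, the PG counters come from the mgr's pgmap,
>>>> so they won't update while the mgr is stopped; while a mgr is
>>>> briefly up I can at least sample them, for example:
>>>>
>>>>   ceph pg stat
>>>>   ceph health detail | head -30
>>>>   ceph pg dump_stuck unclean | head -30
>>>>
>>>> and compare the degraded/misplaced object counts over time.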