[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

26 Oct 2020

I had 3 mons, but I have 2 physical datacenters, and one of them failed with 
no short-term fix, so I removed all of its OSDs and its ceph mons (2 of them). 
Now I have only the OSDs of 1 datacenter, with its monitor. I had stopped 
the ceph manager, but I saw that when I restart a ceph manager, 
ceph -s shows recovery info for about 20 minutes, 
and then all the info disappears.

The thing is that the cluster does not seem to be recovering on its own, and the ceph 
monitor is "eating" all of the HDD.
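
To put a number on how much disk the monitor store is using, a minimal Python sketch along these lines could be run on the MON host from time to time. It assumes the store lives at the path visible in the shell prompt quoted further down (/var/lib/ceph/mon/ceph-fond-beagle/store.db); the function name dir_size_bytes is only illustrative and not part of any Ceph tooling.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file can disappear while the store is being compacted.
                pass
    return total


if __name__ == "__main__":
    # Assumed location, taken from the shell prompt in the quoted log excerpt.
    store = "/var/lib/ceph/mon/ceph-fond-beagle/store.db"
    print(f"{store}: {dir_size_bytes(store) / 2**30:.2f} GiB")
```

Logging this value periodically would show whether the store is still growing or has levelled off.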

On 2020-10-26 15:57, Eugen Block wrote:
> The recovery process (ceph -s) is independent of the MGR service and
> depends only on the MON service. It seems you only have the one MON;
> if the MGR is overloading it (not clear why), it could help to leave the
> MGR off and see if the MON service then has enough RAM to proceed with
> the recovery. Do you have any chance to add two more MONs? A single
> MON is of course a single point of failure.
> 
> 
> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
> 
>> On 2020-10-26 15:16, Eugen Block wrote:
>>> You could stop the MGRs and wait for the recovery to finish; MGRs are
>>> not a critical component. You won't have a dashboard or metrics
>>> during that time, but it would prevent the high RAM usage.
>>> 
>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>>> 
>>>> On 2020-10-26 12:23, 胡 玮文 wrote:
>>>>>> On 26 Oct 2020, at 23:29, Ing. Luis Felipe Domínguez Vega
>>>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>>> 
>>>>>> mgr: fond-beagle(active, since 39s)
>>>>> 
>>>>> Your manager seems to be crash looping; it has only been up for 39s.
>>>>> Looking at the mgr logs may help you identify why your cluster is
>>>>> not recovering. You may have hit a bug in the mgr.
>>>> No, I'm restarting the ceph manager myself because it eats all of the
>>>> server's RAM. I have a script that restarts the manager when free RAM
>>>> drops to 1 GB (the server has 94 GB of RAM). I don't know why this
>>>> happens. The manager logs are:
>>>> 
>>>> -----------------------------------
>>>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# tail   
>>>> -f /var/log/ceph/ceph-mgr.fond-beagle.log
>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> log [DBG] : pgmap v584: 2305 pgs: 4   
>>>> active+undersized+degraded+remapped, 4   
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104   
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154   
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;   
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects   
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> log [DBG] : pgmap v585: 2305 pgs: 4   
>>>> active+undersized+degraded+remapped, 4   
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104   
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154   
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;   
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects   
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> log [DBG] : pgmap v586: 2305 pgs: 4   
>>>> active+undersized+degraded+remapped, 4   
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104   
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154   
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;   
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects   
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> log [DBG] : pgmap v587: 2305 pgs: 4   
>>>> active+undersized+degraded+remapped, 4   
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104   
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154   
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;   
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects   
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> log [DBG] : pgmap v588: 2305 pgs: 4   
>>>> active+undersized+degraded+remapped, 4   
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104   
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154   
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;   
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects   
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> log [DBG] : pgmap v589: 2305 pgs: 4   
>>>> active+undersized+degraded+remapped, 4   
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104   
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154   
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;   
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects   
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700  0 log_channel(cluster)   
>>>> do_log log to syslog
>>>> ---------------
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>> 
>> 
>> OK, I will do that... but the thing is that the cluster does not show
>> that it is recovering, or that it is doing anything at all. No
>> recovery info appears in the ceph -s output, so I don't know whether
>> it is recovering or doing something else.
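
Since the mgr log quoted above prints a pgmap line every two seconds, one rough way to tell whether recovery is moving is to compare the degraded, misplaced and unfound counters between two pgmap versions. The Python sketch below does only that. It assumes each pgmap entry is on a single line, as it would be in the log file itself (the mail client wrapped the lines in the quote), and the names parse_pgmap and progressed are made up for illustration.

```python
import re

# Matches counters such as "347248/2606900 objects degraded (13.320%)".
_COUNTER = re.compile(r"(\d+)/(\d+) objects (degraded|misplaced|unfound)")
_VERSION = re.compile(r"pgmap v(\d+):")


def parse_pgmap(line):
    """Return (version, {kind: (count, total)}) for one unwrapped pgmap log line."""
    m = _VERSION.search(line)
    version = int(m.group(1)) if m else None
    counters = {
        kind: (int(count), int(total))
        for count, total, kind in _COUNTER.findall(line)
    }
    return version, counters


def progressed(older, newer):
    """True if any counter present in both entries is lower in the newer one."""
    _, before = older
    _, after = newer
    return any(after[k][0] < before[k][0] for k in before if k in after)


if __name__ == "__main__":
    # Tail of the v584 and v589 entries from the quoted log, joined to one line.
    tail = ("347248/2606900 objects degraded (13.320%); "
            "107570/2606900 objects misplaced (4.126%); "
            "19/404328 objects unfound (0.005%)")
    first = parse_pgmap("log [DBG] : pgmap v584: 2305 pgs: " + tail)
    last = parse_pgmap("log [DBG] : pgmap v589: 2305 pgs: " + tail)
    print(first[0], last[0], progressed(first, last))
```

In the quoted excerpt the counters are identical from v584 to v589, so this check reports no progress over that ten-second window; a longer window would be needed before drawing any conclusion.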
