The ceph mon logs... many of these keep appearing non-stop in my log:
------------------------------------------------------
2020-10-26T15:40:28.875729-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9023356 5.56 5.1cd5a6d6 (undecoded)
ondisk+retry+write+known_if_redirected e159644) initiated
2020-10-26T15:57:51.597394+0000 currently queued for pg
2020-10-26T15:40:28.875745-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9071950 5.56 5.1cd5a6d6 (undecoded)
ondisk+retry+write+known_if_redirected e159644) initiated
2020-10-26T15:57:51.599033+0000 currently queued for pg
2020-10-26T15:40:28.875761-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9078184 5.56 5.1cd5a6d6 (undecoded)
ondisk+retry+write+known_if_redirected e159644) initiated
2020-10-26T15:57:51.600244+0000 currently queued for pg
2020-10-26T15:40:28.875781-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9130749 5.56 5.1cd5a6d6 (undecoded)
ondisk+write+known_if_redirected e159652) initiated
2020-10-26T15:58:36.457562+0000 currently queued for pg
2020-10-26T15:40:28.878905-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9130780 5.56 5.1cd5a6d6 (undecoded)
ondisk+write+known_if_redirected e159653) initiated
2020-10-26T16:01:11.470983+0000 currently queued for pg
2020-10-26T15:40:28.878936-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9130812 5.56 5.1cd5a6d6 (undecoded)
ondisk+write+known_if_redirected e159653) initiated
2020-10-26T16:03:51.480523+0000 currently queued for pg
------------------------------------------------------------
On 2020-10-26 15:57, Eugen Block wrote:
> The recovery process (ceph -s) is independent of the MGR service and
> depends only on the MON service. It seems you only have the one MON;
> if the MGR is overloading it (it's not clear why), it could help to
> leave the MGR off and see whether the MON service then has enough RAM
> to proceed with the recovery. Do you have any chance to add two more
> MONs? A single MON is of course a single point of failure.
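To keep an eye on the recovery while the MGR stays off, a small poll of
`ceph -s -f json` might be enough. A minimal sketch follows; the pgmap key
names and the 30-second interval are assumptions and may differ between
Ceph releases:

------------------------------------------------------
#!/usr/bin/env python3
# Minimal sketch: poll cluster status while the MGR is stopped and print
# recovery-related counters. Key names are taken from a recent-era
# `ceph -s -f json` pgmap and may differ on other releases; adjust as needed.
import json
import subprocess
import time

def cluster_status():
    out = subprocess.check_output(["ceph", "-s", "-f", "json"])
    return json.loads(out)

while True:
    pgmap = cluster_status().get("pgmap", {})
    states = {s["state_name"]: s["count"] for s in pgmap.get("pgs_by_state", [])}
    print(
        "degraded:", pgmap.get("degraded_objects", 0),
        "misplaced:", pgmap.get("misplaced_objects", 0),
        "unfound:", pgmap.get("unfound_objects", 0),
        "| active+clean:", states.get("active+clean", 0),
        "/", pgmap.get("num_pgs", 0), "PGs",
    )
    # If these counters keep shrinking, recovery is making progress.
    time.sleep(30)
------------------------------------------------------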
>
>
> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>
>> On 2020-10-26 15:16, Eugen Block wrote:
>>> You could stop the MGRs and wait for the recovery to finish; MGRs are
>>> not a critical component. You won’t have a dashboard or metrics
>>> during that time, but it would prevent the high RAM usage.
>>>
>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>>>
>>>> On 2020-10-26 12:23, 胡 玮文 wrote:
>>>>>> On 2020-10-26 at 23:29, Ing. Luis Felipe Domínguez Vega
>>>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>>>
>>>>>> mgr: fond-beagle(active, since 39s)
>>>>>
>>>>> Your manager seems to be crash looping; it only started 39s ago.
>>>>> Looking at the mgr logs may help you identify why your cluster is
>>>>> not recovering. You may have hit some bug in the mgr.
>>>> Nope, I'm restarting the ceph manager myself because it eats all of the
>>>> server's RAM. I have a script that restarts the manager whenever free RAM
>>>> drops to 1 GB (the server has 94 GB of RAM); a rough sketch of that
>>>> watchdog follows the log excerpt below. I don't know why this happens,
>>>> and the manager logs are:
>>>>
>>>> -----------------------------------
>>>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# tail
>>>> -f /var/log/ceph/ceph-mgr.fond-beagle.log
>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v584: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v585: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v586: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v587: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v588: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v589: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> ---------------
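For reference, the restart watchdog mentioned above is roughly like the
following. This is a simplified sketch, not the exact script: the 1 GB
threshold, the polling interval, and the systemd unit name
ceph-mgr@fond-beagle are assumptions/placeholders.

------------------------------------------------------
#!/usr/bin/env python3
# Rough sketch of a "restart the MGR when free RAM gets low" watchdog.
# Assumes a systemd-managed MGR (the unit name below is a placeholder)
# and reads MemAvailable from /proc/meminfo (values are in kB).
import subprocess
import time

MGR_UNIT = "ceph-mgr@fond-beagle"   # placeholder: adjust to the local MGR unit
THRESHOLD_KB = 1 * 1024 * 1024      # ~1 GB, expressed in kB

def mem_available_kb():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return 0

while True:
    if mem_available_kb() < THRESHOLD_KB:
        # The MGR has eaten almost all RAM again; bounce it before the OOM
        # killer does it less gracefully.
        subprocess.run(["systemctl", "restart", MGR_UNIT], check=False)
    time.sleep(60)
------------------------------------------------------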
>>
>> OK, I will do that... but the thing is that the cluster does not show any
>> recovery activity; it doesn't show that it is doing anything, such as the
>> recovery info in the ceph -s output, so I don't know whether it is
>> recovering or what it is doing.