The ceph mon logs... many of these keep appearing non-stop in my log:
------------------------------------------------------
2020-10-26T15:40:28.875729-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9023356 5.56 5.1cd5a6d6 (undecoded)
ondisk+retry+write+known_if_redirected e159644) initiated
2020-10-26T15:57:51.597394+0000 currently queued for pg
2020-10-26T15:40:28.875745-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9071950 5.56 5.1cd5a6d6 (undecoded)
ondisk+retry+write+known_if_redirected e159644) initiated
2020-10-26T15:57:51.599033+0000 currently queued for pg
2020-10-26T15:40:28.875761-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9078184 5.56 5.1cd5a6d6 (undecoded)
ondisk+retry+write+known_if_redirected e159644) initiated
2020-10-26T15:57:51.600244+0000 currently queued for pg
2020-10-26T15:40:28.875781-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9130749 5.56 5.1cd5a6d6 (undecoded)
ondisk+write+known_if_redirected e159652) initiated
2020-10-26T15:58:36.457562+0000 currently queued for pg
2020-10-26T15:40:28.878905-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9130780 5.56 5.1cd5a6d6 (undecoded)
ondisk+write+known_if_redirected e159653) initiated
2020-10-26T16:01:11.470983+0000 currently queued for pg
2020-10-26T15:40:28.878936-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9130812 5.56 5.1cd5a6d6 (undecoded)
ondisk+write+known_if_redirected e159653) initiated
2020-10-26T16:03:51.480523+0000 currently queued for pg
------------------------------------------------------------
On 2020-10-26 15:57, Eugen Block wrote:
> The recovery process (ceph -s) is independent of the MGR service and
> depends only on the MON service. It seems you only have the one MON;
> if the MGR is overloading it (it's not clear why), it could help to
> leave the MGR off and see whether the MON service then has enough RAM
> to proceed with the recovery. Do you have any chance to add two more
> MONs? A single MON is of course a single point of failure.
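To keep an eye on the recovery while the MGR stays off, a small poll of
`ceph -s -f json` might be enough. A minimal sketch follows; the pgmap key
names and the 30-second interval are assumptions and may differ between
Ceph releases:

------------------------------------------------------
#!/usr/bin/env python3
# Minimal sketch: poll cluster status while the MGR is stopped and print
# recovery-related counters. Key names are taken from a recent-era
# `ceph -s -f json` pgmap and may differ on other releases; adjust as needed.
import json
import subprocess
import time

def cluster_status():
    out = subprocess.check_output(["ceph", "-s", "-f", "json"])
    return json.loads(out)

while True:
    pgmap = cluster_status().get("pgmap", {})
    states = {s["state_name"]: s["count"] for s in pgmap.get("pgs_by_state", [])}
    print(
        "degraded:", pgmap.get("degraded_objects", 0),
        "misplaced:", pgmap.get("misplaced_objects", 0),
        "unfound:", pgmap.get("unfound_objects", 0),
        "| active+clean:", states.get("active+clean", 0),
        "/", pgmap.get("num_pgs", 0), "PGs",
    )
    # If these counters keep shrinking, recovery is making progress.
    time.sleep(30)
------------------------------------------------------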
>
>
> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>
>> On 2020-10-26 15:16, Eugen Block wrote:
>>> You could stop the MGRs and wait for the recovery to finish; MGRs are
>>> not a critical component. You won’t have a dashboard or metrics
>>> during that time, but it would prevent the high RAM usage.
>>>
>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>>>
>>>> On 2020-10-26 12:23, 胡 玮文 wrote:
>>>>>> On 2020-10-26 at 23:29, Ing. Luis Felipe Domínguez Vega
>>>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>>>
>>>>>> mgr: fond-beagle(active, since 39s)
>>>>>
>>>>> Your manager seems to be crash looping; it only started 39s ago.
>>>>> Looking at the mgr logs may help you identify why your cluster is
>>>>> not recovering. You may have hit some bug in the mgr.
>>>> Nope, I'm restarting the ceph manager myself because it eats all of the
>>>> server's RAM. I have a script that restarts the manager whenever free RAM
>>>> drops to 1 GB (the server has 94 GB of RAM); a rough sketch of that
>>>> watchdog follows the log excerpt below. I don't know why this happens,
>>>> and the manager logs are:
>>>>
>>>> -----------------------------------
>>>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# tail
>>>> -f /var/log/ceph/ceph-mgr.fond-beagle.log
>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v584: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v585: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v586: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v587: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v588: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster)
>>>> log [DBG] : pgmap v589: 2305 pgs: 4
>>>> active+undersized+degraded+remapped, 4
>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>> active+clean, 5 active+undersized+degraded, 34 incomplete, 154
>>>> unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail;
>>>> 347248/2606900 objects degraded (13.320%); 107570/2606900 objects
>>>> misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster)
>>>> do_log log to syslog
>>>> ---------------
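For reference, the restart watchdog mentioned above is roughly like the
following. This is a simplified sketch, not the exact script: the 1 GB
threshold, the polling interval, and the systemd unit name
ceph-mgr@fond-beagle are assumptions/placeholders.

------------------------------------------------------
#!/usr/bin/env python3
# Rough sketch of a "restart the MGR when free RAM gets low" watchdog.
# Assumes a systemd-managed MGR (the unit name below is a placeholder)
# and reads MemAvailable from /proc/meminfo (values are in kB).
import subprocess
import time

MGR_UNIT = "ceph-mgr@fond-beagle"   # placeholder: adjust to the local MGR unit
THRESHOLD_KB = 1 * 1024 * 1024      # ~1 GB, expressed in kB

def mem_available_kb():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return 0

while True:
    if mem_available_kb() < THRESHOLD_KB:
        # The MGR has eaten almost all RAM again; bounce it before the OOM
        # killer does it less gracefully.
        subprocess.run(["systemctl", "restart", MGR_UNIT], check=False)
    time.sleep(60)
------------------------------------------------------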
>>
>> OK, I will do that... but the thing is that the cluster does not show any
>> recovery activity; it doesn't show that it is doing anything, such as the
>> recovery info in the ceph -s output, so I don't know whether it is
>> recovering or what it is doing.