If you have that many spare hosts I would recommend deploying two more
MONs on them, and probably also additional MGRs so they can fail over.
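For example, on a cephadm-managed cluster that could look roughly like
this (just a sketch, adjust it to your deployment method; the hostnames
are placeholders):

  ceph orch host add <new-host-1>
  ceph orch host add <new-host-2>
  ceph orch apply mon 3
  ceph orch apply mgr 2

For ceph-deploy or manual deployments the usual add-a-monitor procedure
from the docs applies instead.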
What is the EC profile for the data_storage pool?
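You can dump it with (the profile name 'desoft' is taken from your pool
listing below):

  ceph osd erasure-code-profile get desoft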
Can you also share
ceph pg dump pgs | grep -v "active+clean"
to see which PGs are affected.
The remaining issue with unfound objects and unknown PGs could be
because you removed OSDs. That could mean data loss, but maybe there's
a chance to recover anyway.
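To narrow that down you could check, for example:

  ceph health detail
  ceph pg <pgid> query    # for one of the affected PGs; shows why it is stuck

Marking the unfound objects lost (ceph pg <pgid> mark_unfound_lost
revert|delete) would be the very last resort once everything else has
been tried, so don't run that yet.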
Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
Well, recovery is not working yet... I started 6 more servers and
the cluster has still not recovered.
Ceph status does not show any recovery progress.
ceph -s :
https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree :
https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df :
https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail :
https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/
On 2020-10-27 09:59, Eugen Block wrote:
> Your pool 'data_storage' has a size of 7 (or 7 chunks, since it's
> erasure-coded) and the rule requires each chunk to be on a different host,
> but you currently have only 5 hosts available; that's why the recovery
> is not progressing. It's waiting for two more hosts. Unfortunately,
> you can't change the EC profile or the rule of that pool. I'm not sure
> if it would work in the current cluster state, but if you can't add
> two more hosts (which would be your best option for recovery) it might
> be possible to create a new replicated pool (you seem to have enough
> free space) and copy the contents from that EC pool. But as I said,
> I'm not sure whether that would work in a degraded state; I've never
> tried that.
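> Roughly like this (untested in a degraded cluster; the pool name and PG
> count are just examples, and 'rados cppool' has known limitations, e.g.
> with snapshots, so for an RBD pool migrating the images would be the
> cleaner route):
>
>    ceph osd pool create data_storage_repl 32 32 replicated
>    ceph osd pool application enable data_storage_repl rbd
>    rados cppool data_storage data_storage_repl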
>
> So your best bet is to get two more hosts somehow.
>
>
>> pool 4 'data_storage' erasure profile desoft size 7 min_size 5
>> crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32
>> autoscale_mode off last_change 154384 lfor 0/121016/121014 flags
>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>> application rbd
>
>
> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>
>> Needed data:
>>
>> ceph -s :
>> https://pastebin.ubuntu.com/p/S9gKjyZtdK/
>> ceph osd tree :
>> https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
>> ceph osd df : (later; I've been waiting for 10
>> minutes and there is no output yet)
>> ceph osd pool ls detail :
>> https://pastebin.ubuntu.com/p/GRdPjxhv3D/
>> crush rules : (ceph osd crush rule dump)
>> https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/
>>
>> On 2020-10-27 07:14, Eugen Block wrote:
>>>> I understand, but I deleted the OSDs from the CRUSH map, so Ceph
>>>> won't wait for those OSDs, am I right?
>>>
>>> It depends on your actual crush tree and rules. Can you share (maybe
>>> you already did)
>>>
>>> ceph osd tree
>>> ceph osd df
>>> ceph osd pool ls detail
>>>
>>> and a dump of your crush rules?
>>>
>>> As I already said, if you have rules in place that distribute data
>>> across 2 DCs and one of them is down, the PGs will never recover even
>>> if you delete the OSDs from the failed DC.
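>>> A rule like that would contain steps similar to these in 'ceph osd
>>> crush rule dump' (hypothetical, abridged example, not your actual rule):
>>>
>>>    { "op": "take", "item_name": "default" },
>>>    { "op": "chooseleaf_firstn", "num": 0, "type": "datacenter" },
>>>    { "op": "emit" }
>>>
>>> i.e. a chooseleaf step over the 'datacenter' bucket type.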
>>>
>>>
>>>
>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>>>
>>>> I understand, but I deleted the OSDs from the CRUSH map, so Ceph
>>>> won't wait for those OSDs, am I right?
>>>>
>>>> On 2020-10-27 04:06, Eugen Block wrote:
>>>>> Hi,
>>>>>
>>>>> just to clarify so I don't miss anything: you have two DCs and one of
>>>>> them is down. And two of the MONs were in that failed DC? Now you
>>>>> removed all OSDs and two MONs from the failed DC hoping that your
>>>>> cluster will recover? If you have reasonable crush rules in place
>>>>> (e.g. to recover from a failed DC) your cluster will never recover in
>>>>> the current state unless you bring OSDs back up on the second DC.
>>>>> That's why you don't see progress in the recovery process: the PGs are
>>>>> waiting for their peers in the other DC so they can follow the crush
>>>>> rules.
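>>>>> You can see that directly in the peering information of such a PG,
>>>>> e.g. (replace <pgid> with one of the stuck PGs):
>>>>>
>>>>>    ceph pg <pgid> query
>>>>>
>>>>> The 'recovery_state' section shows which OSDs the PG is still
>>>>> waiting for.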
>>>>>
>>>>> Regards,
>>>>> Eugen
>>>>>
>>>>>
>>>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>>>>>
>>>>>> I had 3 MONs, but I have 2 physical datacenters and one of them
>>>>>> broke with no short-term fix, so I removed all OSDs and the Ceph
>>>>>> MONs (2 of them) from it, and now I only have the OSDs of 1
>>>>>> datacenter together with the remaining monitor. I had stopped the
>>>>>> Ceph manager, but I saw that when I restart a Ceph manager,
>>>>>> ceph -s shows recovery info for roughly 20
>>>>>> minutes and then all the info disappears.
>>>>>>
>>>>>> The thing is that it seems the cluster is not recovering on its own
>>>>>> and the Ceph monitor is "eating" all of the HDD.
>>>>>>
>>>>>> On 2020-10-26 15:57, Eugen Block wrote:
>>>>>>> The recovery process (ceph -s) is independent of the MGR service and
>>>>>>> only depends on the MON service. It seems you only have the one MON;
>>>>>>> if the MGR is overloading it (not clear why) it could help to leave
>>>>>>> the MGR off and see if the MON service then has enough RAM to proceed
>>>>>>> with the recovery. Do you have any chance to add two more MONs? A
>>>>>>> single MON is of course a single point of failure.
>>>>>>>
>>>>>>>
>>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>>>>>>>
>>>>>>>> On 2020-10-26 15:16, Eugen Block wrote:
>>>>>>>>> You could stop the MGRs and wait for the recovery to
>>>>>>>>> finish; MGRs are
>>>>>>>>> not a critical component. You won't have a dashboard or metrics
>>>>>>>>> during that time, but it would prevent the high RAM usage.
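>>>>>>>>> On a package-based (non-containerized) install that would be
>>>>>>>>> something like the following; the daemon name is taken from your
>>>>>>>>> ceph -s output, adjust it if yours differs:
>>>>>>>>>
>>>>>>>>>    systemctl stop ceph-mgr@fond-beagle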
>>>>>>>>>
>>>>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>>>>>>>>>
>>>>>>>>>> On 2020-10-26 12:23, 胡 玮文 wrote:
>>>>>>>>>>> On 2020-10-26, at 23:29, Ing. Luis Felipe Domínguez Vega
>>>>>>>>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> mgr: fond-beagle(active, since 39s)
>>>>>>>>>>>
>>>>>>>>>>> Your manager seems to be crash-looping; it only started 39s ago.
>>>>>>>>>>> Looking at the mgr logs may help you identify why your cluster
>>>>>>>>>>> is not recovering.
>>>>>>>>>>> You may be hitting some bug in the mgr.
>>>>>>>>>> Nope, I'm restarting the Ceph manager because it eats all of the
>>>>>>>>>> server's RAM, so I have a script that restarts the manager when
>>>>>>>>>> only 1 GB of free RAM is left (the server has 94 GB of RAM).
>>>>>>>>>> I don't know why, and the manager logs are:
>>>>>>>>>>
>>>>>>>>>> -----------------------------------
>>>>>>>>>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# tail -f /var/log/ceph/ceph-mgr.fond-beagle.log
>>>>>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v584: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>>>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v585: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>>>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v586: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>>>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v587: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>>>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v588: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster) log [DBG] : pgmap v589: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0 log_channel(cluster) do_log log to syslog
>>>>>>>>>> ---------------
>>>>>>>>
>>>>>>>> OK, I will do that... but the thing is that the cluster does not
>>>>>>>> show that it is recovering, or that it is doing anything at all,
>>>>>>>> like showing the recovery info in the ceph -s output, so I
>>>>>>>> don't know whether it is recovering or what it is doing.