Increasing pg_num (PG splitting) is an expensive operation. You want to slow it down, not push the OSDs harder.
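For reference, a more conservative approach is to leave the mgr throttle at its default and let it pace the splits itself (a sketch; 0.05 is the Nautilus default value, and <pool-name> is a placeholder):

```shell
# Throttle rebalancing: allow at most 5% of objects to be misplaced
# at any one time (0.05 is the Nautilus default).
ceph config set mgr target_max_misplaced_ratio 0.05

# Then raise pg_num; the mgr steps pgp_num up gradually, keeping the
# amount of misplaced data under the ratio above.
ceph osd pool set <pool-name> pg_num 256
```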
> On Mar 21, 2020, at 5:46 AM, Jan Pekař - Imatic <jan.pekar(a)imatic.cz> wrote:
>
> Each node has 64GB RAM so it should be enough (12 OSD's = 48GB used).
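As a sanity check, the arithmetic with the default 4 GiB osd_memory_target looks like this (a sketch; the actual target should be verified on the cluster with `ceph config get osd osd_memory_target`, and this ignores OS and other daemon overhead):

```python
# Rough per-node memory budget for BlueStore OSDs.
# Assumes the default osd_memory_target of 4 GiB per OSD.
osds_per_node = 12
osd_memory_target_gib = 4
node_ram_gib = 64

budget_gib = osds_per_node * osd_memory_target_gib
headroom_gib = node_ram_gib - budget_gib
print(budget_gib, headroom_gib)  # 48 16
```

Note that osd_memory_target is a target, not a hard limit, so OSDs can spike above it during recovery; 16 GiB of headroom is workable but not generous.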
>
>> On 21/03/2020 13.14, XuYun wrote:
>> Bluestore requires more than 4G memory per OSD, do you have enough memory?
>>
>>> 2020年3月21日 下午8:09,Jan Pekař - Imatic <jan.pekar(a)imatic.cz> 写道:
>>>
>>> Hello,
>>>
>>> I have a ceph cluster, version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable).
>>>
>>> 4 nodes - each node 11 HDD, 1 SSD, 10Gbit network
>>>
>>> The cluster was empty, a fresh install. We filled it with data (small blocks) using RGW.
>>>
>>> The cluster is now used for testing, so no client was using it during the admin operations mentioned below.
>>>
>>> After a while (7 TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to spread the data better. To speed up this operation, I set
>>>
>>> ceph config set mgr target_max_misplaced_ratio 1
>>>
>>> so that the whole cluster rebalances as quickly as it can.
>>>
>>> I have 3 issues/questions below:
>>>
>>> 1)
>>>
>>> I noticed that the manual increase from 128 to 256 caused approx. 6 OSDs to restart, logging
>>>
>>> heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide timed out after 150
>>>
>>> After a while the OSDs were back, so I continued with my tests.
>>>
>>> My question: was increasing the number of PGs with the maximal target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it this way? I had no problem with this increase before, but the cluster configuration was slightly different and it was a luminous version.
>>>
>>> 2)
>>>
>>> Rebuilding was still slow, so I increased the number of backfills
>>>
>>> ceph tell osd.* injectargs "--osd-max-backfills 10"
>>>
>>> and reduced recovery sleep time
>>>
>>> ceph tell osd.* injectargs "--osd-recovery-sleep-hdd 0.01"
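For comparison, the Nautilus defaults these commands override are osd_max_backfills=1 and osd_recovery_sleep_hdd=0.1; backing off to them again (a sketch) would look like:

```shell
# Revert to the defaults if OSDs start flapping under recovery load.
ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-sleep-hdd 0.1'
```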
>>>
>>> and after a few hours I noticed that some of my OSDs were restarted during recovery; in the log I can see
>>>
>>> ...
>>>
>>> 2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
>>> 2020-03-21 06:41:28.343 7fe1f8bee700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
>>> 2020-03-21 06:41:36.780 7fe1da154700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
>>> 2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running
>>> 2020-03-21 06:41:36.888 7fe1e7769700  0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573
>>> 2020-03-21 06:41:36.888 7fe1e7769700  1 osd.7 3574 start_waiting_for_healthy
>>>
>>> I observed the network usage graph and network utilization was low during recovery (the 10Gbit link was not saturated).
>>>
>>> So a lot of IOPS on an OSD causes the heartbeat operation to time out as well? I thought that the OSD uses separate threads and that HDD timeouts do not influence heartbeats to other OSDs and the MON. It looks like that is not true.
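If the disks are simply too busy to answer heartbeats in time, one workaround (a sketch that widens the timeouts rather than fixing the underlying load) is to increase the grace periods; osd_heartbeat_grace defaults to 20 s and the op-thread suicide timeout to 150 s:

```shell
# Give heavily loaded OSDs more time before peers report them down.
ceph config set osd osd_heartbeat_grace 60
# Give op worker threads longer before the internal watchdog
# aborts the OSD (default 150 s).
ceph config set osd osd_op_thread_suicide_timeout 300
```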
>>>
>>> 3)
>>>
>>> After the OSD was wrongly marked down, I can see that the cluster has degraded objects. There were no degraded objects before that.
>>>
>>> Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs degraded, 8 pgs undersized
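To see exactly which PGs (and which acting OSDs) are affected, something like the following should work on Nautilus:

```shell
# Show the health warning with the affected PG IDs.
ceph health detail
# List PGs by state, including their acting sets.
ceph pg ls degraded
ceph pg ls undersized
```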
>>>
>>> Does it mean that this OSD disconnection caused degraded data? How is that possible, when no OSD was lost? The data should still be on that OSD, and after peering everything should be OK. With luminous I had no such problem: after the OSD came back up, degraded objects were recovered/found within a few seconds and the cluster was healthy again within seconds.
>>>
>>> Thank you very much for any additional info. I can perform any additional tests you recommend, because the cluster is used for testing purposes now.
>>>
>>> With regards
>>> Jan Pekar
>>>
>>> --
>>> ============
>>> Ing. Jan Pekař
>>> jan.pekar(a)imatic.cz
>>> ----
>>> Imatic | Jagellonská 14 | Praha 3 | 130 00
>>>
>>> http://www.imatic.cz | +420326555326
>>> ============
>>> --
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>