[ceph-users] Re: Random heartbeat_map timed out

23 Dec 2020

Hi Seena,

one of the frequent causes of such a timeout is slow RocksDB
operation, which in turn might be caused by bluefs_buffered_io being set to
false and/or DB "fragmentation" after massive data removal.

Hence the potential workarounds are adjusting bluefs_buffered_io and 
manual RocksDB compaction.
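
A rough sketch of those two workarounds, under some assumptions: the option is set for all OSDs through the central config database, compaction is triggered online via "ceph tell", and the RUN variable and the two function names are made up for illustration. By default the commands are only printed, not executed. Depending on the release, bluefs_buffered_io may only take effect after an OSD restart.

```shell
#!/bin/sh
# Dry run by default: RUN=echo only prints each command.
# Set RUN to an empty string (RUN=) to execute the commands for real.
RUN="${RUN-echo}"

# Workaround 1: enable buffered I/O for BlueFS on all OSDs.
# Assumption: a cluster-wide setting is wanted; it can also be set per OSD.
enable_buffered_io() {
    $RUN ceph config set osd bluefs_buffered_io true
}

# Workaround 2: trigger an online RocksDB compaction on one OSD.
# $1 is the numeric OSD id (a placeholder chosen by the caller).
compact_osd() {
    $RUN ceph tell "osd.$1" compact
}
```

For example, "compact_osd 12" prints "ceph tell osd.12 compact" in dry-run mode. Compaction adds load of its own, so it is probably safer to go through the OSDs one at a time.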

This topic has been discussed on this mailing list and in the relevant
tickets multiple times.

Thanks,

Igor

On 12/23/2020 3:24 PM, Seena Fallah wrote:
> Hi,
>
> All my OSD nodes in the SSD tier are getting heartbeat_map timed out
> randomly and I can't find out why!
>
> 7ff2ed3f2700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread
> 0x7ff2c8943700' had timed out after 15
>
> It occurs many times in a day and causes my cluster to be down.
>
> Is there any way to find out why the OSDs time out? I don't think the
> heartbeat itself is the problem; rather, some issue in the OSDs leads to
> the heartbeat timeout, because the OSDs don't suicide but become too slow
> and cause downtime on RBD and the S3 gateway because the queue is full!
>
> Thanks.
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
