After some more digging: when restarted, all three MDSs enter state
up:rejoin but never progress past it.
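In case it's useful, this is how I'm watching the rank states (run from the
rook toolbox pod; myfs-a etc. are our MDS names, so adjust for your cluster):

    ceph fs status myfs
    ceph mds stat
    # per-daemon state over the admin socket, inside the MDS pod:
    ceph daemon mds.myfs-a status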
Also, mds.0 (not the one with the trimming problem) consistently logs
mds.0.cache failed to open ino 0x101 err -116/0
mds.0.cache failed to open ino 0x102 err -116/0
on every restart.
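Decoding that error: errno 116 is ESTALE ("Stale file handle") on Linux,
and if I'm reading the MDS source right, inode 0x100+rank is the per-rank
MDS directory, so 0x101/0x102 would be the mdsdirs of ranks 1 and 2. Quick
errno check (plain Python, nothing cluster-specific):

    python3 -c "import os; print(os.strerror(116))"
    # Stale file handle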
On Thu, Jul 8, 2021 at 6:29 AM Zachary Ulissi <zulissi(a)gmail.com> wrote:
We're running a rook-ceph cluster that has gotten stuck in "1 MDSs behind
on trimming".
* 1 filesystem, three active MDS servers, each with a standby
* Quite a few files (20M objects) and daily snapshots. This might be part
of the problem?
* Ceph pacific 16.2.4
* `ceph health detail` doesn't provide much help (see below)
* num_segments is very slowly increasing over time
* Restarting all of the MDSs returns to the same point.
* moderate CPU usage for each MDS server (~30% for the stuck one, ~80% of
a core for the others)
* logs for the stuck MDS look clean; it hits rejoin_joint_start and then
just shows the standard "updating MDS map to version XXX" messages
* `ceph daemon mds.x ops` shows no active ops on each of the MDS servers
* `mds_log_max_segments` is set to 128; setting it higher makes the warning
go away, but the filesystem remains degraded, and setting it back to 128
shows num_segments has not changed.
* I've tried playing around with other MDS settings based on various posts
on this list and elsewhere, to no avail
* `cephfs-journal-tool journal inspect` for each rank says journal
integrity is fine (exact invocations below).
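For reference, the exact invocations (myfs is our filesystem name; 256 is
just the value I happened to try):

    # journal integrity, one rank at a time
    cephfs-journal-tool --rank myfs:0 journal inspect
    cephfs-journal-tool --rank myfs:1 journal inspect
    cephfs-journal-tool --rank myfs:2 journal inspect

    # raising the segment limit silences the warning, but nothing trims
    ceph config set mds mds_log_max_segments 256
    ceph config set mds mds_log_max_segments 128

    # journal/segment counters on the stuck MDS, over the admin socket
    ceph daemon mds.myfs-d perf dump mds_log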
Something similar happened last week, and (probably by accident, while
removing/adding nodes?) I got the MDSs to start recovering and the
filesystem went back to healthy.
I'm at a bit of a loss for what else to try.
Thanks!
Zack
`ceph health detail`
HEALTH_WARN mons are allowing insecure global_id reclaim; 1 filesystem is
degraded; 1 MDSs behind on trimming; mon x is low on available space
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure
global_id reclaim
mon.x has auth_allow_insecure_global_id_reclaim set to true
mon.ad has auth_allow_insecure_global_id_reclaim set to true
mon.af has auth_allow_insecure_global_id_reclaim set to true
[WRN] FS_DEGRADED: 1 filesystem is degraded
fs myfs is degraded
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.myfs-d(mds.2): Behind on trimming (340/128) max_segments: 128,
num_segments: 340
[WRN] MON_DISK_LOW: mon x is low on available space
mon.x has 22% avail
`ceph config get mds`
WHO     MASK  LEVEL     OPTION                              VALUE        RO
global        basic     log_file                                         *
global        basic     log_to_file                         false
mds           basic     mds_cache_memory_limit              17179869184
mds           advanced  mds_cache_trim_decay_rate           1.000000
mds           advanced  mds_cache_trim_threshold            1048576
mds           advanced  mds_log_max_segments                128
mds           advanced  mds_recall_max_caps                 5000
mds           advanced  mds_recall_max_decay_rate           2.500000
global        advanced  mon_allow_pool_delete               true
global        advanced  mon_allow_pool_size_one             true
global        advanced  mon_cluster_log_file
global        advanced  mon_pg_warn_min_per_osd             0
global        advanced  osd_pool_default_pg_autoscale_mode  on
global        advanced  osd_scrub_auto_repair               true
global        advanced  rbd_default_features                3
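For completeness, the non-default trim/recall values in that dump map to
plain `ceph config set` calls of this shape (values copied from the dump
above):

    ceph config set mds mds_log_max_segments 128
    ceph config set mds mds_cache_trim_threshold 1048576
    ceph config set mds mds_cache_trim_decay_rate 1.0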