We have a Ceph cluster (Proxmox-based) which is HDD-based. We’ve had
some performance and “slow MDS” issues while doing VM/CT backups from
the Proxmox cluster, especially when rebalancing is going on at the same
time.
I also had to increase the MDS cache quite a lot to get rid of the 'slow' issues,
mds_cache_memory_limit =
But after some Luminous(?) update I started using CephFS less, because of issues with
the kernel mount.
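For reference, a minimal sketch of how the MDS cache limit could be raised at runtime. It assumes a release with the centralized config store (`ceph config set`, Mimic or later; on Luminous the option would go into ceph.conf instead). The 8 GiB figure is purely hypothetical, since the value actually used is not given above, and the helper name is made up for illustration.

```python
from typing import List

GIB = 1024 ** 3  # mds_cache_memory_limit is expressed in bytes


def build_cache_limit_cmd(limit_gib: int) -> List[str]:
    """Build the argv that sets mds_cache_memory_limit for all MDS daemons."""
    if limit_gib <= 0:
        raise ValueError("cache limit must be a positive number of GiB")
    return [
        "ceph", "config", "set", "mds",
        "mds_cache_memory_limit", str(limit_gib * GIB),
    ]


# Hypothetical value; this only prints the command, it does not run it.
print(" ".join(build_cache_limit_cmd(8)))
```

The function only builds the command so it can be reviewed before being run on a node with admin access.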
My thought is that one of the following is going to improve performance /
response:
1. Add an M.2 drive for DB store on each node
2. Migrate the cephfs metadata pool to SSDs
We have ~25 nodes with ~3 OSDs per node.
(1) is a lot of work and will cost more.
(2) seems more risky (to me) since the metadata pool would have to be
migrated (potential loss in transit?)
I can't remember running into any issues when I made this switch quite a while ago. I still
have only 7GB on the SSD metadata pool vs 45TB / 13M objects.
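A rough sketch of what option (2) usually comes down to: create a replicated CRUSH rule restricted to the `ssd` device class, then point the metadata pool at it. Ceph then moves the data to the SSD OSDs through normal backfill while the pool stays online. The pool name `cephfs_metadata`, the rule name `replicated_ssd`, the `default` root and the `host` failure domain are all assumptions, and so is the helper name; check them against `ceph osd pool ls` and `ceph osd crush tree` before running anything.

```python
from typing import List


def metadata_pool_to_ssd_cmds(
    pool: str,
    rule: str = "replicated_ssd",   # hypothetical rule name
    root: str = "default",          # assumed CRUSH root
    failure_domain: str = "host",   # assumed failure domain
) -> List[List[str]]:
    """Return the two ceph commands that move a pool onto SSD-class OSDs."""
    return [
        # 1. Replicated rule that only selects OSDs with device class "ssd".
        ["ceph", "osd", "crush", "rule", "create-replicated",
         rule, root, failure_domain, "ssd"],
        # 2. Switch the pool to that rule; this triggers the backfill.
        ["ceph", "osd", "pool", "set", pool, "crush_rule", rule],
    ]


# Print the commands for review instead of executing them.
for cmd in metadata_pool_to_ssd_cmds("cephfs_metadata"):
    print(" ".join(cmd))
```

With a metadata pool of about 7GB against 45TB of data, the amount to move is well under 0.1% of the cluster's contents, so the backfill itself should be small.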