I should have explicitly stated that during the recovery it was still
quite bumpy for customers. Some snaptrims were very quick, some took
what felt like a really long time. This was, however, a cluster with a
very large number of volumes and a long, long history of snapshots, so
I'm not sure how our case compares to a single large volume with a big
snapshot.
On 2023-01-28 20:45, Victor Rodriguez wrote:
> On 1/29/23 00:50, Matt Vandermeulen wrote:
>> I've observed a similar horror when upgrading a cluster from Luminous
>> to Nautilus, which had the same effect of an overwhelming amount of
>> snaptrim making the cluster unusable.
>>
>> In our case, we held its hand by setting all OSDs to have zero max
>> trimming PGs, unsetting nosnaptrim, and then slowly enabling snaptrim
>> a few OSDs at a time. It was painful to babysit but it allowed the
>> cluster to catch up without falling over.
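>>
>> Roughly, from memory, the sequence looked something like this (a
>> sketch only; we used osd_max_trimming_pgs as the throttle, and the
>> exact commands may differ on your release):
>>
>>   # keep every OSD from trimming before re-enabling snaptrim
>>   ceph config set osd osd_max_trimming_pgs 0
>>   ceph osd unset nosnaptrim
>>
>>   # then let a few OSDs at a time catch up, watching client latency
>>   ceph config set osd.0 osd_max_trimming_pgs 2
>>   ceph config set osd.1 osd_max_trimming_pgs 2
>>   # ...and so on, a few OSDs at a time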
>
>
> That's an interesting approach! Thanks!
>
> On preliminary tests, it seems that just running snaptrim on a single
> PG of a single OSD still makes the cluster barely usable. I have to
> increase osd_snap_trim_sleep_ssd to ~1 for the cluster to remain
> usable, at about a third of its normal performance. After a while a
> few PGs got trimmed, and it feels like some are harder to trim than
> others, as some need a higher osd_snap_trim_sleep_ssd value to keep
> the cluster performing.
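>
> For reference, the throttling I'm testing is simply something along
> the lines of (exact value still being tuned):
>
>   ceph config set osd osd_snap_trim_sleep_ssd 1.0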
>
> I don't know how long this is going to take... Maybe recreating the
> OSDs and dealing with the rebalance is a better option?
>
> There's something ugly going on here... I would really like to put my
> finger on it.
>
>
>> On 2023-01-28 19:43, Victor Rodriguez wrote:
>>> After some investigation this is what I'm seeing:
>>>
>>> - OSD processes get stuck at 100% CPU (or more) if I run ceph osd
>>> unset nosnaptrim. They stay at 100% CPU even if I then run ceph osd
>>> set nosnaptrim, and have stayed like that for at least 26 hours.
>>> Some quick benchmarks don't show any reduction in cluster
>>> performance.
>>>
>>> - Restarting an OSD lowers its CPU usage to typical levels, as
>>> expected, but it also usually brings some other OSD on a different
>>> host back to typical levels.
>>>
>>> - All OSDs in this cluster take quite a while to start: between 35
>>> and 70 seconds depending on the OSD, clearly much longer than any
>>> OSD in any of my other clusters.
>>>
>>> - I believe the size of the rocksdb database is dumped in the OSD
>>> log when an automatic compaction is triggered. The "sum" sizes for
>>> these OSDs range between 2.5 and 5.1 GB. That's way bigger than in
>>> any other cluster I have.
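>>>
>>> (A quicker way to watch those sizes, I think, is "ceph osd df",
>>> which reports per-OSD OMAP and META usage, or something like:
>>>
>>>   ceph daemon osd.N perf dump bluefs
>>>
>>> on the OSD's host, looking at db_used_bytes.)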
>>>
>>> - ceph daemon osd.* calc_objectstore_db_histogram is giving values
>>> for num_pgmeta_omap (I don't know what it is) that are, for some
>>> OSDs, way bigger than on any of my other clusters. Also, the values
>>> are not similar among the OSDs which hold the same PGs.
>>>
>>> osd.0: "num_pgmeta_omap": 17526766,
>>> osd.1: "num_pgmeta_omap": 2653379,
>>> osd.2: "num_pgmeta_omap": 12358703,
>>> osd.3: "num_pgmeta_omap": 6404975,
>>> osd.6: "num_pgmeta_omap": 19845318,
>>> osd.7: "num_pgmeta_omap": 6043083,
>>> osd.12: "num_pgmeta_omap": 18666776,
>>> osd.13: "num_pgmeta_omap": 615846,
>>> osd.14: "num_pgmeta_omap": 13190188,
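>>>
>>> (Those numbers came from running the admin socket command on each
>>> OSD's host, roughly:
>>>
>>>   for i in 0 1 2 3 6 7 12 13 14; do
>>>     ceph daemon osd.$i calc_objectstore_db_histogram | grep num_pgmeta_omap
>>>   done
>>>
>>> adjusted to whichever OSDs actually live on each host.)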
>>>
>>> - Compacting the OSDs barely reduces the rocksdb size and does not
>>> reduce num_pgmeta_omap at all.
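>>>
>>> (By compacting I mean something like an online
>>> "ceph daemon osd.N compact" on each OSD's host.)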
>>>
>>> - This is the only cluster I have where there are some RBD images
>>> that I mount directly from some clients, that is, they are not disks
>>> for QEMU/Proxmox VMs. Maybe I have something misconfigured related
>>> to this? This cluster is at least two and a half years old and never
>>> had this issue with snaptrims.
>>>
>>> Thanks in advance!