[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10

5 Aug 2020

Hello Vladimir,

I just tested this on a single-node test cluster with 60 HDDs (3 of
them with bluestore without separate wal and db).

With 14.2.10, I see a lot of read IOPS on the bluestore OSDs while
snaptrimming. With 14.2.9 this was not an issue.
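
For anyone who wants to compare releases the same way, here is a minimal
sketch (not the tooling used for the observations in this thread) that
derives read IOPS, merge rates and queue depth for a device from two
samples of /proc/diskstats. The field positions follow the Linux kernel's
iostats documentation; the function names and the device name "sdb" are
hypothetical and only serve as an illustration.

```python
import time


def parse_diskstats(text):
    """Parse /proc/diskstats content into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or truncated lines
        stats[parts[2]] = {
            "reads": int(parts[3]),          # reads completed
            "reads_merged": int(parts[4]),   # reads merged
            "writes": int(parts[7]),         # writes completed
            "writes_merged": int(parts[8]),  # writes merged
            "in_flight": int(parts[11]),     # I/Os currently in progress
        }
    return stats


def rates(before, after, interval_s):
    """Turn two counter snapshots into per-second rates per device."""
    result = {}
    for dev, b in after.items():
        a = before.get(dev)
        if a is None:
            continue  # device appeared between the two samples
        result[dev] = {
            "read_iops": (b["reads"] - a["reads"]) / interval_s,
            "read_merges_per_s": (b["reads_merged"] - a["reads_merged"]) / interval_s,
            "write_iops": (b["writes"] - a["writes"]) / interval_s,
            "write_merges_per_s": (b["writes_merged"] - a["writes_merged"]) / interval_s,
            "in_flight": b["in_flight"],  # instantaneous value, not a rate
        }
    return result


def watch(device="sdb", interval_s=5, path="/proc/diskstats"):
    """Sample one device twice and return its rates (Linux only).

    "sdb" is a placeholder; pass the actual block.db/wal device name.
    """
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_diskstats(f.read())
    return rates(first, second, interval_s).get(device)
```

Running watch() on an OSD host during snaptrim, once per release, should
give numbers comparable to what iostat reports for reads, merges and
queue depth.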

I wonder if this would explain the huge number of slow ops on my big
test cluster (44 nodes, 1056 OSDs) while snaptrimming. I
cannot test a downgrade there, because there are no packages of older
releases available for CentOS 8.

Regards
Manuel

On Tue, 4 Aug 2020 13:22:34 +0300
Vladimir Prokofev <v(a)prokofev.me> wrote:

...
  Here's some more insight into the issue.
 Looks like the load is triggered because of a snaptrim operation. We
 have a backup pool that serves as OpenStack cinder-backup storage,
 performing snapshot backups every night. Old backups are also deleted
 every night, so snaptrim is initiated.
 This snaptrim increased load on the block.db devices after upgrade,
 and just kills one SSD's performance in particular. It serves as a
 block.db/wal device for one of the fatter backup pool OSDs which has
 more PGs placed there.
 This is a Kingston SSD, and we see this issue on other Kingston SSD
 journals too. Intel SSD journals are not affected as much, though they
 too experience increased load.
 Nevertheless, there are now a lot of read IOPS on block.db devices
 after the upgrade that were not there before.
 I wonder how 600 IOPS can hurt an SSD's performance that badly.

 Tue, 4 Aug 2020 at 12:54, Vladimir Prokofev <v(a)prokofev.me>:

  Good day, cephers!

 We've recently upgraded our cluster from the 14.2.8 to the 14.2.10
 release, also performing a full system package upgrade (Ubuntu 18.04
 LTS). After that, performance dropped significantly, the main reason
 being that the journal SSDs now have no merges, huge queues, and
 increased latency. There are a few screenshots in the attachments. This
 is for an SSD journal that supports block.db/block.wal for 3
 spinning OSDs, and it looks like this for all our SSD block.db/wal
 devices across all nodes. Any ideas what may cause that? Maybe I've
 missed something important in the release notes?
    _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io 
