* bluestore: common/options.cc: disable bluefs_preextend_wal_files <--
from the 15.2.3 changelog. There was a bug which led to issues on OSD
restart, and I believe disabling this option was the mitigation until a
proper bugfix could be put in place. I suspect this might be the cause of
the symptoms you're seeing.
https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293
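If you want to verify on your cluster, you can check the value a running
OSD is using and pin it off explicitly (a sketch; the first command runs
on the OSD's host via the admin socket, and osd.0 is just an example):

  # check the runtime value on one OSD
  ceph daemon osd.0 config get bluefs_preextend_wal_files
  # explicitly disable it for all OSDs
  ceph config set osd bluefs_preextend_wal_files false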
On Thu, Jun 4, 2020 at 8:07 AM Thomas Gradisnik <tg(a)relaxt.at> wrote:
We have deployed a small test cluster consisting of three nodes. Each node
runs a mon/mgr and two OSDs (a Samsung PM983 3.84TB NVMe split into two
partitions), so six OSDs in total. We started with Ceph 14.2.7 some weeks
ago (later upgraded to 14.2.9) and ran various tests using fio against
some rbd volumes to get an overview of what performance we could expect.
The configuration is unchanged from the defaults; we only set several
debugging options to 0/0.
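(For illustration only, since we don't list the exact options here;
debug_osd stands in for whichever debug_* settings were changed:)

  # example only: silence one debug subsystem's log and memory levels
  ceph config set osd debug_osd 0/0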
Yesterday we upgraded the whole cluster to Ceph 15.2.3, following the
upgrade guidelines, which has worked without any problems so far.
Nevertheless, when rerunning the tests we had previously run under Ceph
14.2.9, we are seeing clear degradations in write performance (alongside
some performance improvements, which should also be mentioned).
Here are the results of concern (each with the relevant fio settings used):

Test "read-latency-max" (rw=randread, iodepth=64, bs=4k)
  read_iops: 32500 -> 87000

Test "write-latency-max" (rw=randwrite, iodepth=64, bs=4k)
  write_iops: 22500 -> 11500

Test "write-throughput-iops-max" (rw=write, iodepth=64, bs=4k)
  write_iops: 7000 -> 14000

Test "usecase1" (rw=randrw, rwmixread=1, rate_process=poisson, iodepth=64,
  bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/)
  write_iops: 21000 -> 8500

Test "usecase1-readonly" (rw=randread, rate_process=poisson, iodepth=64,
  bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/)
  read_iops: 28000 -> 58000
The last two tests represent a typical use case on our systems, so we are
especially concerned by the drop from 21000 to 8500 write IOPS (about 60%)
after upgrading to Ceph 15.2.3.
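For reference, here is the "usecase1" job above reconstructed as a single
fio command line. This is only a sketch: it assumes the test ran against a
kernel-mapped volume via libaio, and /dev/rbd0 is a placeholder.

  # hypothetical reconstruction of "usecase1"; point --filename at the device under test
  fio --name=usecase1 --filename=/dev/rbd0 --direct=1 --ioengine=libaio \
      --rw=randrw --rwmixread=1 --rate_process=poisson --iodepth=64 \
      --bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/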
We ran all tests several times; the values are averaged over all
iterations and are fairly consistent and reproducible. We even tried
wiping the whole cluster, downgrading to Ceph 14.2.9, setting up a new
cluster/pool, running the tests, and then upgrading to Ceph 15.2.3 again.
The tests were performed on one of the three cluster nodes using a 50G rbd
volume, which had been prefilled with random data before each test run.
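(One way to do such a prefill, sketched with fio, which writes
pseudo-random buffers by default; /dev/rbd0 is again a placeholder:)

  # sketch: sequential prefill of the whole volume
  fio --name=prefill --filename=/dev/rbd0 --rw=write --bs=4M --direct=1 --ioengine=libaio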
Have any changes been introduced with Octopus that could explain the
observed changes in performance?
What we already tried:
- Disabling the rbd cache (see the sketch after this list)
- Reverting the rbd cache policy to writeback (the default in 14.2)
- Setting the rbd device's I/O scheduler to none
- Deploying a fresh cluster starting with Ceph 15.2.3
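The two cache-related items correspond to client-side options along these
lines (a sketch using the monitor config database; the same values can
also go into the [client] section of ceph.conf):

  # disable the librbd cache entirely
  ceph config set client rbd_cache false
  # or revert the cache policy to writeback (the 14.2 default, as noted above)
  ceph config set client rbd_cache_policy writeback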
The kernel is 5.4.38 … I don't know whether any other system specs would
be helpful beyond those already mentioned (since we are talking about a
relative change in performance after upgrading Ceph, with no other changes
to the systems) - if so, please let us know.