* bluestore: common/options.cc: disable bluefs_preextend_wal_files <--
from the 15.2.3 changelog. There was a bug which led to issues on OSD
restart, and I believe disabling this option was the mitigation until a
proper bugfix could be put in place. I suspect this might be the cause of
the symptoms you're seeing.
https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293
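If you want to verify on your cluster, you can check the value a running
OSD is using and pin it off explicitly (a sketch; the first command runs
on the OSD's host via the admin socket, and osd.0 is just an example):

  # check the runtime value on one OSD
  ceph daemon osd.0 config get bluefs_preextend_wal_files
  # explicitly disable it for all OSDs
  ceph config set osd bluefs_preextend_wal_files false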
On Thu, Jun 4, 2020 at 8:07 AM Thomas Gradisnik <tg(a)relaxt.at> wrote:
We have deployed a small test cluster consisting of three nodes. Each node
runs a mon/mgr and two OSDs (a Samsung PM983 3.84TB NVMe split into two
partitions), so six OSDs in total. We started with Ceph 14.2.7 some weeks
ago (later upgraded to 14.2.9) and ran various tests using fio against
some rbd volumes to get an overview of what performance we could expect.
The configuration is unchanged from the defaults; we only set several
debugging options to 0/0.
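(For illustration only, since we don't list the exact options here;
debug_osd stands in for whichever debug_* settings were changed:)

  # example only: silence one debug subsystem's log and memory levels
  ceph config set osd debug_osd 0/0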
Yesterday we upgraded the whole cluster to Ceph 15.2.3, following the
upgrade guidelines, which has worked without any problems so far.
Nevertheless, when rerunning the tests we had previously run under Ceph
14.2.9, we are seeing clear degradations in write performance (alongside
some performance improvements, which should also be mentioned).
Here are the results of concern (each with the relevant fio settings used):

Test "read-latency-max" (rw=randread, iodepth=64, bs=4k)
  read_iops: 32500 -> 87000

Test "write-latency-max" (rw=randwrite, iodepth=64, bs=4k)
  write_iops: 22500 -> 11500

Test "write-throughput-iops-max" (rw=write, iodepth=64, bs=4k)
  write_iops: 7000 -> 14000

Test "usecase1" (rw=randrw, rwmixread=1, rate_process=poisson, iodepth=64,
  bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/)
  write_iops: 21000 -> 8500

Test "usecase1-readonly" (rw=randread, rate_process=poisson, iodepth=64,
  bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/)
  read_iops: 28000 -> 58000
The last two tests represent a typical use case on our systems, so we are
especially concerned by the drop from 21000 to 8500 write IOPS (about 60%)
after upgrading to Ceph 15.2.3.
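For reference, here is the "usecase1" job above reconstructed as a single
fio command line. This is only a sketch: it assumes the test ran against a
kernel-mapped volume via libaio, and /dev/rbd0 is a placeholder.

  # hypothetical reconstruction of "usecase1"; point --filename at the device under test
  fio --name=usecase1 --filename=/dev/rbd0 --direct=1 --ioengine=libaio \
      --rw=randrw --rwmixread=1 --rate_process=poisson --iodepth=64 \
      --bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/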
We ran all tests several times; the values are averaged over all
iterations and are fairly consistent and reproducible. We even tried
wiping the whole cluster, downgrading to Ceph 14.2.9, setting up a new
cluster/pool, running the tests, and then upgrading to Ceph 15.2.3 again.
The tests were performed on one of the three cluster nodes using a 50G rbd
volume, which had been prefilled with random data before each test run.
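(One way to do such a prefill, sketched with fio, which writes
pseudo-random buffers by default; /dev/rbd0 is again a placeholder:)

  # sketch: sequential prefill of the whole volume
  fio --name=prefill --filename=/dev/rbd0 --rw=write --bs=4M --direct=1 --ioengine=libaio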
Have any changes been introduced with Octopus that could explain the
observed changes in performance?
What we already tried:
- Disabling the rbd cache (see the sketch after this list)
- Reverting the rbd cache policy to writeback (the default in 14.2)
- Setting the rbd device's I/O scheduler to none
- Deploying a fresh cluster starting with Ceph 15.2.3
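The two cache-related items correspond to client-side options along these
lines (a sketch using the monitor config database; the same values can
also go into the [client] section of ceph.conf):

  # disable the librbd cache entirely
  ceph config set client rbd_cache false
  # or revert the cache policy to writeback (the 14.2 default, as noted above)
  ceph config set client rbd_cache_policy writeback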
The kernel is 5.4.38 … I don't know whether any other system specs would
be helpful beyond those already mentioned (since we are talking about a
relative change in performance after upgrading Ceph, with no other changes
to the systems) - if so, please let us know.