I've found that these setting make Ceph clients have much more consistent latency than the default scheduler, it also reduces the impact of backfills and recoveries. It may not give you better performance (although I have seen it allow all disks to be utilized to 100% rather than only as fast as the slowest disk), but could help with the VM with the SCSI resets.

Put this in your ceph.conf and restart your OSDs.
osd op queue = wpq
osd op queue cut off = high

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Aug 13, 2019 at 6:12 PM Richard Bade <hitrich@gmail.com> wrote:
Hi Everyone,
There's been a few threads around about small HDD (spinning disk)
clusters and performance on Bluestore.
One recently from Christian
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036385.html)
was particularly interesting to us as we have a very similar setup to
what Christian has and we see similar performance.

We have a 6 node cluster each with 12x 4TB SATA HDD, IT mode LSI 3008, wal/db on
33GB NVMe partitions. Each node has a single Xeon Gold 6132 CPU @
2.60GHz and dual 10GB network.
We also use bcache with 1 180GB NVMe partition shared between 6 osd's.
Workload is via KVM (Proxmox)

I did the same benchmark fio tests as Christian. Here's my results (M
for me, C for Christian)
direct=0
========
M -- read : io=6008.0MB, bw=203264KB/s, iops=49, runt= 30267msec
C -- read: IOPS=40, BW=163MiB/s (171MB/s)(7556MiB/46320msec)

direct=1
========
M -- read : io=32768MB, bw=1991.4MB/s, iops=497, runt= 16455ms
C -- read: IOPS=314, BW=1257MiB/s (1318MB/s)(32.0GiB/26063msec)

direct=0
========
M -- write: io=32768MB, bw=471105KB/s, iops=115, runt= 71225msec
C -- write: IOPS=119, BW=479MiB/s (503MB/s)(32.0GiB/68348msec

direct=1
========
M -- write: io=32768MB, bw=479829KB/s, iops=117, runt= 69930msec
C -- write: IOPS=139, BW=560MiB/s (587MB/s)(32.0GiB/58519msec)

I should probably mention that there was some active workload on the
cluster at that time also, around 500iops write and 100MB/s
throughput.
The main problem that we're having with this cluster is how easy it is
for it to hit slow requests and we have one particular vm that ends up
doing scsi resets because of the latency.

So we're considering switching these osd's to filestore.
We have two other clusters using filestore/bcache/ssd journal and the
performance seems to be much better on those - taking into account the
different sizes.
What are peoples thoughts on this size cluster? Is it just not a good
fit with bluestore and our type of workload?
Also, does anyone have any knowledge on future support for filestore?
I'm concerned that we may have to migrate our other clusters off
filestore sometime in the future and that'll hurt us with the current
performance.

Rich
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io