Hi.
Ceph isn't to blame here!
Linux does not support cached (buffered) asynchronous I/O, except via
the new io_uring interface! That is, the kernel accepts Linux AIO
(libaio) calls, but io_submit() simply blocks when the FD was opened
without O_DIRECT.
So what actually happens when you benchmark with -ioengine=libaio
-direct=0 is that the test degrades into single-threaded, synchronous
I/O, no matter what iodepth you request.
Of course single-threaded performance is worse.
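You can see this for yourself with something like the following (a
sketch; the filename, sizes and iodepth are placeholders to adjust for
your setup, and the io_uring run needs kernel 5.1+ and a recent fio):

  # buffered + libaio: io_submit() blocks, so the effective queue
  # depth collapses to 1 regardless of the requested iodepth
  fio --name=buffered --ioengine=libaio --iodepth=16 --direct=0 \
      --rw=read --bs=4M --size=8G --filename=fio.test

  # O_DIRECT + libaio: submissions are genuinely asynchronous and
  # the requested iodepth actually takes effect
  fio --name=direct --ioengine=libaio --iodepth=16 --direct=1 \
      --rw=read --bs=4M --size=8G --filename=fio.test

  # buffered async I/O that actually works: io_uring
  fio --name=uring --ioengine=io_uring --iodepth=16 --direct=0 \
      --rw=read --bs=4M --size=8G --filename=fio.test

  # to get parallel buffered I/O without io_uring, use a synchronous
  # engine with multiple jobs instead
  fio --name=buffered-par --ioengine=psync --numjobs=16 --direct=0 \
      --rw=read --bs=4M --size=8G --filename=fio.test --group_reporting

In the first run, fio's "IO depths" summary line should show ~100% of
I/Os at depth 1, even though 16 was requested.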
> Hi Everyone,
> There have been a few threads around about small HDD (spinning disk)
> clusters and performance on Bluestore.
> One recently from Christian
> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036385.html)
> was particularly interesting to us as we have a very similar setup to
> what Christian has and we see similar performance.
>
> We have a 6 node cluster each with 12x 4TB SATA HDD, IT mode LSI 3008,
> wal/db on
> 33GB NVMe partitions. Each node has a single Xeon Gold 6132 CPU @
> 2.60GHz and dual 10GbE networking.
> We also use bcache, with one 180GB NVMe partition shared between 6 OSDs.
> Workload is via KVM (Proxmox).
>
> I did the same benchmark fio tests as Christian. Here's my results (M
> for me, C for Christian)
> direct=0 (read)
> ========
> M -- read : io=6008.0MB, bw=203264KB/s, iops=49, runt= 30267msec
> C -- read: IOPS=40, BW=163MiB/s (171MB/s)(7556MiB/46320msec)
>
> direct=1 (read)
> ========
> M -- read : io=32768MB, bw=1991.4MB/s, iops=497, runt= 16455ms
> C -- read: IOPS=314, BW=1257MiB/s (1318MB/s)(32.0GiB/26063msec)
>
> direct=0 (write)
> ========
> M -- write: io=32768MB, bw=471105KB/s, iops=115, runt= 71225msec
> C -- write: IOPS=119, BW=479MiB/s (503MB/s)(32.0GiB/68348msec)
>
> direct=1 (write)
> ========
> M -- write: io=32768MB, bw=479829KB/s, iops=117, runt= 69930msec
> C -- write: IOPS=139, BW=560MiB/s (587MB/s)(32.0GiB/58519msec)
>
> I should probably mention that there was also some active workload on
> the cluster at the time, around 500 IOPS of writes and 100MB/s of
> throughput.
> The main problem we're having with this cluster is how easily it hits
> slow requests, and we have one particular VM that ends up doing SCSI
> resets because of the latency.
>
> So we're considering switching these OSDs to filestore.
> We have two other clusters using filestore/bcache/SSD journals, and
> the performance seems to be much better on those, taking into account
> the different sizes.
> What are people's thoughts on a cluster of this size? Is it just not a
> good fit for bluestore and our type of workload?
> Also, does anyone have any knowledge of future support for filestore?
> I'm concerned that we may have to migrate our other clusters off
> filestore sometime in the future, and that'll hurt us given the
> performance we're currently seeing.
>
> Rich
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-leave@ceph.io