Hi Wido,
No. My results with Ceph (yeah, I still use it) are the same, and I use Threadrippers,
which run at almost 4 GHz.
Network isn't the main problem. The main problem is the large amount of program
logic written in a complex way, which leads to high CPU usage.
See https://yourcmc.ru/wiki/Ceph_performance if you haven't already.
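If you want to see where the time goes yourself, profiling a busy OSD under
load is enough (replace <osd-pid> with a real ceph-osd process id):

$ perf top -p <osd-pid>

In my experience most of the samples land in Ceph's own code rather than in
the kernel network stack.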
I achieve ~7000 QD=1 IOPS with Vitastor simply because it's much simpler. And I'm
gradually progressing feature-wise... :-)
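For reference, I measure this with Vitastor's own fio engine. A sketch (the
etcd address and the image name are placeholders; see the Vitastor README for
the exact options):

$ fio -thread -ioengine=libfio_vitastor.so -name=test \
    -bs=4k -direct=1 -iodepth=1 -rw=randwrite \
    -etcd=10.0.0.1:2379/v3 -image=testimg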
Regards, Vitaliy
> (Sending it to the dev list as people there might know more)
>
> Hi,
>
> There are many talks and presentations out there about Ceph's
> performance. Ceph is great when it comes to parallel I/O, large queue
> depths and many applications sending I/O towards Ceph.
>
> One thing where Ceph isn't the fastest is 4k blocks written at Queue
> Depth 1.
>
> Some applications benefit very much from high-performance, low-latency
> I/O at qd=1, for example single-threaded applications writing small
> files inside a VM running on RBD.
>
> With some tuning you can get to ~700us latency for a 4k write at
> qd=1 (replication, size=3).
>
> I benchmark this using fio:
>
> $ fio --ioengine=rbd --bs=4k --iodepth=1 --direct=1 .. .. .. ..
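A fuller invocation for anyone who wants to reproduce this; the pool and
image names are placeholders, and note that fio registers the librbd engine
as "rbd":

$ fio --name=rbd-qd1 --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=bench --rw=randwrite \
    --bs=4k --iodepth=1 --direct=1 --time_based --runtime=60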
>
> 700us latency means roughly 1400 IOps (1000 / 0.7 ≈ 1428)
>
> Compared to, let's say, a BSD machine running ZFS, that's on the low
> side. With ZFS+NVMe you'll be able to reach somewhere between 7,000
> and 10,000 IOps; the latency is simply much lower.
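For an apples-to-apples local baseline you could run something like this
(psync engine with an fsync after every write, since O_DIRECT support on ZFS
varies; /tank/test is a placeholder dataset):

$ fio --name=zfs-qd1 --directory=/tank/test --ioengine=psync \
    --rw=randwrite --bs=4k --size=1G --fsync=1 \
    --time_based --runtime=30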
>
> My benchmarking / test setup for this:
>
> - Ceph Nautilus/Octopus (doesn't make a big difference)
> - 3x SuperMicro 1U with:
> - AMD Epyc 7302P 16-core CPU
> - 128GB DDR4
> - 10x Samsung PM983 3.84TB
> - 10Gbit Base-T networking
>
> Things to configure/tune:
>
> - C-State pinning to 1
> - CPU governor to performance
> - Turn off all logging in Ceph (debug_osd, debug_ms,
>   debug_bluestore=0); example commands below
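A sketch of those three knobs on a recent Linux kernel with Nautilus or
newer; treat the exact values as a starting point, not gospel:

# keep the cores at full clock
$ cpupower frequency-set -g performance
# limit C-states: add this to the kernel command line and reboot
# (or hold /dev/cpu_dma_latency open with a low value instead)
processor.max_cstate=1
# silence the hot-path debug logging
$ ceph config set osd debug_osd 0/0
$ ceph config set osd debug_ms 0/0
$ ceph config set osd debug_bluestore 0/0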
>
> Higher clock speeds (new AMD Epyc coming in March!) help to reduce the
> latency, and going to 25Gbit/100Gbit networking might help as well.
>
> These are, however, only small increments and might reduce the
> latency by another 15% or so.
>
> It doesn't bring us anywhere near the 10k IOps other applications can do.
>
> And I totally understand that replication over a TCP/IP network takes
> time and thus increases latency.
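To put rough numbers on that (assuming ~50us per TCP round trip on
10GBase-T): a replicated write needs about two round trips on the critical
path, client to primary and then primary to both replicas in parallel, so
the network accounts for only ~100-150us of the ~700us total. The rest is
spent in software.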
>
> The Crimson project [0] aims to lower the latency with techniques like
> DPDK and SPDK, but it is far from finished and production-ready.
>
> In the meantime, am I overlooking something here? Can we further
> reduce the latency of the current OSDs?
>
> Reaching a ~500us latency would already be great!
>
> Thanks,
>
> Wido
>
> [0]: https://docs.ceph.com/en/latest/dev/crimson/crimson