Added a fifth OSD node. Cluster now looks something like:
3x mons (2x 10G, 2x E5-2690 V2, 256GB RAM)
5x OSD (2x 10G, 2x E5-2690 V2, 256GB-385GB RAM, 12x Samsung SM1625 SSDs)
Random write latency went up to 16ms average with the addition of the fifth node and
the move to k=3,m=2.
What kind of latencies are people seeing in their EC clusters?
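For reference, a k=3,m=2 RBD setup like the one described above would typically be created along these lines (a sketch only: the profile, pool, and image names are placeholders, and PG counts need sizing for the actual cluster; RBD on an EC pool needs overwrites enabled on the data pool, with image metadata kept in a replicated pool):

```
# Placeholder names; adjust PG counts and CRUSH settings to your cluster.
ceph osd erasure-code-profile set ec32 plugin=isa k=3 m=2 crush-failure-domain=host
ceph osd pool create rbd-data 128 128 erasure ec32
ceph osd pool set rbd-data allow_ec_overwrites true   # required for RBD on EC
ceph osd pool create rbd-meta 32 32 replicated
ceph osd pool application enable rbd-data rbd
ceph osd pool application enable rbd-meta rbd
rbd create rbd-meta/testimg --size 100G --data-pool rbd-data
```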
From: "Anthony Brandelli (abrandel)" <abrandel(a)cisco.com>
Date: Thursday, February 13, 2020 at 10:17 AM
To: Martin Verges <martin.verges(a)croit.io>
Cc: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
Subject: Re: [ceph-users] EC Pools w/ RBD - IOPs
I should mention this is solely meant as a test cluster, and unfortunately I only have
four OSD nodes in it. I guess I’ll go see if I can dig up another node so I can better
mirror what might eventually go to production.
I would imagine that latency is only going to increase as we increase k though, no?
From: Martin Verges <martin.verges(a)croit.io>
Date: Thursday, February 13, 2020 at 10:10 AM
To: "Anthony Brandelli (abrandel)" <abrandel(a)cisco.com>
Cc: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
Subject: Re: [ceph-users] EC Pools w/ RBD - IOPs
Hello,
Please do not even think about using an EC pool with k=2, m=1. See other posts on this
list; just don't.
EC works quite well, and we have a lot of users running EC-based VMs, often with Proxmox
(RBD) or VMware (iSCSI) hypervisors.
Performance depends on the hardware and is definitely slower than replicated pools, but it
is cost efficient and more than OK for most workloads. If you split generic VMs from
databases (or similar workloads), you can save a lot of money with EC.
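The cost argument is easy to quantify: the usable fraction of raw capacity is 1/size for a replicated pool and k/(k+m) for an EC pool. A quick sketch, using the profiles mentioned in this thread:

```shell
# Usable fraction of raw capacity: 1/size for replication, k/(k+m) for EC.
awk 'BEGIN {
  printf "3x replica usable fraction: %.2f\n", 1/3
  printf "EC k=2,m=1 usable fraction: %.2f\n", 2/3
  printf "EC k=3,m=2 usable fraction: %.2f\n", 3/5
}'
```

So even the conservative k=3,m=2 profile nearly doubles usable capacity versus 3x replication, while still tolerating two failures.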
--
Martin Verges
Managing director
Hint: Secure one of the last slots in the upcoming 4-day Ceph Intensive Training at
https://croit.io/training/4-days-ceph-in-depth-training.
Mobile: +49 174 9335695
E-Mail: martin.verges@croit.io<mailto:martin.verges@croit.io>
Chat: https://t.me/MartinVerges
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx
On Thu, Feb 13, 2020 at 17:52, Anthony Brandelli (abrandel)
<abrandel@cisco.com<mailto:abrandel@cisco.com>> wrote:
Hi Ceph Community,
Wondering what experiences, good or bad, you have with EC pools for IOPS-intensive
workloads (i.e., 4K-ish random I/O from things like VMware ESXi). I realize that EC pools
are a tradeoff between more usable capacity and higher latency/lower IOPS, but in my
testing the tradeoff for small I/O seems to be much worse than I had anticipated.
On an all-flash 3x replicated pool we're seeing 45k random read and 35k random write IOPS,
testing with fio on a client living on an iSCSI LUN presented to an ESXi host. Average
latencies for these ops are 4.2ms and 5.5ms, which is respectable at an I/O depth of 32.
Take this same setup with an EC pool (k=2, m=1, tested with both ISA and jerasure; ISA
does give better performance for our use case) and we see 30k random read and 16k random
write IOPS. Random reads see 6.5ms average, while random writes suffer with 12ms
average.
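For anyone wanting to reproduce the comparison, the workload above corresponds to an fio invocation along these lines (the job name and device path are placeholders; /dev/sdX stands for the iSCSI LUN as seen by the test client):

```
# 4K random writes at queue depth 32 against the iSCSI-backed device.
fio --name=ec-randwrite --filename=/dev/sdX \
    --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
    --runtime=60 --time_based --group_reporting
```

Swapping --rw=randwrite for --rw=randread gives the corresponding read test.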
Are others using EC pools seeing similar hits to random writes with small I/Os? Any way to
improve this?
Thanks,
Anthony
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io<mailto:ceph-users@ceph.io>
To unsubscribe send an email to ceph-users-leave@ceph.io<mailto:ceph-users-leave@ceph.io>