Hi all,


I am having trouble getting consistent RBD latencies on our cluster for KVM virtual machines that attach their images via the KRBD driver. When measuring with tools like rbd perf image iotop, we constantly see latency spike from around 1-2 ms to 100+ ms. This seems to kill SQL performance on our Windows VMs. I essentially have two questions:


1) Am I missing something in my configuration that should be applied to get consistently low latency to the VM guests?


2) When measuring the disks, sequential IO seems to result in higher latency than random IO. Is this expected, or is there a way to tune this with the KRBD driver?
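
For anyone who wants to reproduce the sequential-versus-random comparison from question 2, here is a minimal sketch of how it could be probed from inside a Linux guest, assuming Python 3 is available. This is not the tool the numbers above came from. The function names, file path, block size, and sample count are all placeholders chosen for illustration, and the probe file would need to live on an RBD-backed filesystem for the result to mean anything.

```python
import os
import random
import statistics
import tempfile
import time


def measure_write_latency(path, offsets, block_size=4096):
    """Write one block at each offset, fsync after every write, return latencies in ms."""
    payload = os.urandom(block_size)
    latencies_ms = []
    fd = os.open(path, os.O_WRONLY)
    try:
        for offset in offsets:
            start = time.perf_counter()
            os.pwrite(fd, payload, offset)
            os.fsync(fd)  # push the write down to the backing device
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies_ms


def summarize(latencies_ms):
    """Reduce a list of latencies to mean, approximate p99, and max."""
    ordered = sorted(latencies_ms)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "mean_ms": statistics.mean(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


def compare_patterns(path, file_size=64 * 1024 * 1024, block_size=4096,
                     samples=256, seed=0):
    """Run the same number of synced writes sequentially and at random offsets."""
    # Create a sparse probe file of the requested size (placeholder values).
    with open(path, "wb") as f:
        f.truncate(file_size)
    blocks = file_size // block_size
    sequential = [i * block_size for i in range(samples)]
    rng = random.Random(seed)
    random_offsets = [rng.randrange(blocks) * block_size for _ in range(samples)]
    return {
        "sequential": summarize(measure_write_latency(path, sequential, block_size)),
        "random": summarize(measure_write_latency(path, random_offsets, block_size)),
    }


if __name__ == "__main__":
    # The temporary directory is only a stand-in; point this at the disk under test.
    with tempfile.TemporaryDirectory() as tmp:
        results = compare_patterns(os.path.join(tmp, "latency_probe.bin"))
        for pattern, stats in results.items():
            print(pattern, {k: round(v, 3) for k, v in stats.items()})
```

Each write is followed by an fsync, so this only approximates queue-depth-1 synchronous writes. It says nothing about reads or deeper queues, and guest-side numbers will not line up exactly with what rbd perf image iotop reports.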


Configuration:


3 x MON/MGR nodes


12 x OSD nodes (24 x HDD, 2 x NVMe for DB and WAL)


KVM clients attaching the RBD images via KRBD


1 pool w/ 16384 PGs


Ceph version 14.2.1


Ceph.conf:


[global]

    mon host = 10.97.11.17,10.97.11.27,10.97.11.37

    public network  = 10.97.11.0/24
    cluster network = 10.97.12.0/24

    auth cluster required = cephx
    auth service required = cephx
    auth client required  = cephx

    osd journal size           = 30720
    osd pool default size      = 3
    osd pool default min size  = 2
    osd pool default pg num    = 4096
    osd pool default pgp num   = 4096
    osd crush chooseleaf type  = 1

[osd]

    bluestore_default_buffered_write  = false
    bluestore_default_buffered_read   = true


Thank you,