-----Original Message-----
From: Vladimir Prokofev <v(a)prokofev.me>
Sent: Monday, November 2, 2020 3:46 AM
Cc: ceph-users <ceph-users(a)ceph.io>
Subject: [ceph-users] Re: read latency
With sequential reads you get "read ahead" mechanics on top, which help
a lot.
So let's say you do 4KB seq reads with fio.
By default, Ubuntu, for example, has a 128KB read ahead size. That means
when you request that 4KB of data, the driver will actually request 128KB.
When your IO is served and you request the next sequential 4KB, it's
already in the VM's memory, so no new read IO is necessary.
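You can check or change that default from inside the VM, for example
(the device name is just an example; check yours with lsblk):

    # read ahead in 512-byte sectors: 256 sectors = 128KB
    blockdev --getra /dev/vda
    # the same value in KB via sysfs
    cat /sys/block/vda/queue/read_ahead_kb
    # example: raise it to 4MB (8192 sectors)
    blockdev --setra 8192 /dev/vda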
All those 128KB will likely reside on the same OSD, depending on your
Ceph object size.
When you reach the end of that 128KB of data and request the next chunk,
it will once again likely reside in the same RBD object as before,
assuming a 4MB object size. So, depending on internal mechanics I'm not
really familiar with, that data can be either in the host's memory, or at
least in the OSD node's memory, so no real physical IO will be necessary.
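You can confirm the object size of your image with rbd (pool/image
names are placeholders):

    # look for the "order" line; order 22 means 2^22 bytes = 4MB objects
    rbd info mypool/myimage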
What you're thinking about is the worst-case scenario, when that 128KB
is split between 2 objects residing on 2 different OSDs. In that case you
just get 2 real physical IOs for your 1 virtual IO, and at that moment
you'll have a slower request, but after that read ahead again helps a lot
of sequential IOs.
In the end, read ahead with sequential IOs leads to far fewer real
physical reads than random reads, hence the IOPS difference.
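A quick way to see the effect, sketched with fio (the test file path
and size are placeholders; buffered IO is intentional so that read
ahead stays in play):

    # 4KB sequential reads - read ahead serves most of them from memory
    fio --name=seq --rw=read --bs=4k --iodepth=1 --time_based \
        --runtime=30 --size=1g --filename=/mnt/testfile
    # 4KB random reads - nearly every IO is a real physical read
    fio --name=rand --rw=randread --bs=4k --iodepth=1 --time_based \
        --runtime=30 --size=1g --filename=/mnt/testfile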
On Mon, Nov 2, 2020 at 06:20, Tony Liu <tonyliu0592(a)hotmail.com> wrote:
Another confusion is about read vs. random read. My understanding is
that when fio does read, it reads from the test file sequentially,
and when it does random read, it reads from the test file randomly.
That file read inside the VM comes down to a volume read handled by the
RBD client, which distributes reads to PGs and eventually to OSDs. So a
sequential file read inside the VM won't be a sequential read on the
OSD disk.
Is that right?
Then what difference do seq. and rand. reads make on the OSD disk?
Is it rand. read on the OSD disk in both cases?
Then how to explain the performance difference between seq. and rand.
reads inside the VM? (Seq. read IOPS is 20x that of rand. read; Ceph
has 21 HDDs on 3 nodes, 7 on each.)
Thanks!
Tony
> -----Original Message-----
> From: Vladimir Prokofev <v(a)prokofev.me>
> Sent: Sunday, November 1, 2020 5:58 PM
> Cc: ceph-users <ceph-users(a)ceph.io>
> Subject: [ceph-users] Re: read latency
>
> Not exactly. You can also tune the network/software.
> Network - go for lower-latency interfaces. If you have 10G, go to 25G
> or 100G. 40G will not do though; AFAIK it's just 4x10G, so its latency
> is the same as 10G.
> Software - it's closely tied to your network card queues and
> processor cores. In short, tune affinity so that the packet receive
> queues and OSD processes run on the same corresponding cores.
> Disabling processor power-saving features helps a lot. Also watch out
> for NUMA interference. But overall, all these tricks will save you
> less than switching from HDD to SSD.
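> A rough sketch of that tuning (interface name, IRQ number, and core
> are just examples - take the real ones from your own /proc/interrupts):
>
>     # see which IRQs the NIC's receive queues use
>     grep eth0 /proc/interrupts
>     # pin one queue's IRQ to the core its OSD runs on (here: core 4)
>     echo 4 > /proc/irq/123/smp_affinity_list
>     # disable frequency scaling and deep idle states
>     cpupower frequency-set -g performance
>     cpupower idle-set -D 0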
On Mon, Nov 2, 2020 at 02:45, Tony Liu <tonyliu0592(a)hotmail.com> wrote:
Hi,
AFAIK, the read latency primarily depends on HW latency; not much
can be tuned in SW. Is that right?
I ran a fio random read with iodepth 1 within a VM backed by Ceph
with HDD OSDs, and here is what I got:
=================
read: IOPS=282, BW=1130KiB/s (1157kB/s)(33.1MiB/30001msec)
slat (usec): min=4, max=181, avg=14.04, stdev=10.16
clat (usec): min=178, max=393831, avg=3521.86, stdev=5771.35
lat (usec): min=188, max=393858, avg=3536.38, stdev=5771.51
=================
I checked the HDD average latency: 2.9 ms. Looks like the test result
makes perfect sense, doesn't it?
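Sanity check: at ~3.5 ms average latency, 1 / 0.0035 s comes out to
roughly 285 IOs per second, which matches the 282 IOPS above. The test
was of this shape (file name and size are placeholders; direct=1 is
assumed, to keep the guest page cache out of the measurement):

    fio --name=randread --rw=randread --bs=4k --iodepth=1 --direct=1 \
        --time_based --runtime=30 --size=1g --filename=/mnt/testfile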
If I want to get lower latency (more IOPS), I will have to go
for better disks, e.g. SSDs. Right?
Thanks!
Tony
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io