I found that bluefs_max_prefetch is set to 1048576, which equals 1 MiB! So
why is it reading at about 1 GiB/s?
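One thing worth noting: bluefs_max_prefetch caps the size of a single prefetch read, not the aggregate prefetch rate, so a 1 MiB cap and ~1 GiB/s of disk reads are not contradictory. A quick back-of-the-envelope sketch (the 1 GiB/s figure is taken from the observation above):

```shell
# bluefs_max_prefetch limits each individual prefetch read to 1 MiB;
# many such reads per second can still add up to ~1 GiB/s.
per_read=1048576                    # 1 MiB per prefetch read
observed=$((1024 * 1024 * 1024))    # ~1 GiB/s observed on disk
echo "$((observed / per_read)) prefetch reads/s would account for that throughput"
# -> 1024 prefetch reads/s would account for that throughput
```

So roughly a thousand max-sized prefetch reads per second, across all open BlueFS/RocksDB files, would be enough to explain the observed rate.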
On Thu, Dec 3, 2020 at 8:03 PM Seena Fallah <seenafallah(a)gmail.com> wrote:
My first question is about this metric:
ceph_bluefs_read_prefetch_bytes
Which operations feed into this metric?
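For what it's worth, the exporter metric ceph_bluefs_read_prefetch_bytes corresponds to the read_prefetch_bytes counter in the OSD's bluefs perf counters, which you can inspect via the admin socket (osd.0 below is a placeholder for your OSD id). The JSON here is an abridged, illustrative sample, not real output:

```shell
# On a live OSD you would run (osd.0 is a placeholder):
#   ceph daemon osd.0 perf dump bluefs
# Illustrative abridged output, and how to derive the average prefetch size:
cat > /tmp/bluefs_perf.json <<'EOF'
{"bluefs": {"read_prefetch_count": 120, "read_prefetch_bytes": 125829120}}
EOF
python3 - <<'EOF'
import json
bluefs = json.load(open("/tmp/bluefs_perf.json"))["bluefs"]
avg = bluefs["read_prefetch_bytes"] / bluefs["read_prefetch_count"]
print(f"avg prefetch read: {avg / 1048576:.1f} MiB")
EOF
# -> avg prefetch read: 1.0 MiB
```

Watching read_prefetch_bytes grow during recovery (versus idle) would show whether these prefetch reads are the source of the extra throughput.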
On Thu, Dec 3, 2020 at 7:49 PM Seena Fallah <seenafallah(a)gmail.com> wrote:
Hi all,
When my cluster gets into a recovery state (adding a new node) I see huge
read throughput on its disks, and it hurts latency! The disks are SSDs and
they don't have a separate WAL/DB.
I'm using Nautilus 14.2.14, and bluefs_buffered_io is false by default.
When this throughput hits my disks, latency spikes sharply.
After I turned on bluefs_buffered_io, another huge burst of throughput,
around 1.2 GB/s, came in and it again affected my latency, but much less
than the previous one! (Graphs are attached; bluefs_buffered_io was turned
on with ceph tell injectargs at 13:41, and I also restarted the OSD at
13:16 because things weren't improving on their own.)
I have four questions:
1. What are these reads? Recovery speed is 20 MB/s and client I/O on that
OSD is 10 MB/s, so what is this high throughput for?!
2. How can I control this throughput? My disks can't sustain this much!
3. I see a similar issue here:
https://tracker.ceph.com/issues/36482
It discusses read-ahead; should I change the read_ahead_kb setting of my
disks to handle this kind of request? I'm using the Ubuntu default (128).
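In case it helps anyone else checking the same thing: read_ahead_kb is a per-device block-layer setting under sysfs. A sketch, with sda as a placeholder device name (this is a tuning fragment; the right value depends on the workload, and it does not persist across reboots without a udev rule):

```shell
# Inspect the current block-layer read-ahead (value in KiB); Ubuntu default is 128.
cat /sys/block/sda/queue/read_ahead_kb

# Raise it, e.g. to 4 MiB, to see whether it changes the recovery read pattern.
echo 4096 | sudo tee /sys/block/sda/queue/read_ahead_kb
```

Note that kernel read-ahead only applies to buffered (page-cache) reads, so with bluefs_buffered_io=false it would not be expected to influence BlueFS's direct reads.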
4. Is there any tuning that would let me safely turn bluefs_buffered_io
back off?
Configs I used for recovery:
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
osd recovery priority = 1
osd recovery sleep ssd = 0.2
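For reference, the recovery throttles above can also be applied at runtime without restarting the OSDs; a sketch mirroring the values listed (osd.* targets every OSD):

```shell
# Apply the recovery throttles from the list above at runtime (restart-free).
ceph tell 'osd.*' injectargs \
  '--osd_max_backfills=1' \
  '--osd_recovery_max_active=1' \
  '--osd_recovery_op_priority=1' \
  '--osd_recovery_priority=1' \
  '--osd_recovery_sleep_ssd=0.2'
```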
My OSD memory target is around 6GB.
Thanks.