I will definitely follow your steps and apply bluefs_buffered_io=true via ceph.conf and
restart. My first attempt was to update it dynamically. I'll report back when it's done.
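For reference, a minimal sketch of what that change looks like (assuming a classic,
non-containerized deployment; the systemd target name is the usual one, adjust as needed):

  # /etc/ceph/ceph.conf, on each OSD host
  [osd]
  bluefs_buffered_io = true

  # then restart all OSDs on the host, e.g.:
  systemctl restart ceph-osd.target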
We monitor our clusters via Telegraf (Ceph input plugin), InfluxDB, and a custom Grafana
dashboard tailored to our needs.
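In case it helps: the Telegraf side is just the stock Ceph input plugin reading the admin
sockets. A minimal sketch of the telegraf.conf section (option names as documented for the
plugin; paths are examples, not necessarily our exact values):

  [[inputs.ceph]]
    interval = "1m"
    ceph_binary = "/usr/bin/ceph"
    socket_dir = "/var/run/ceph"
    mon_prefix = "ceph-mon"
    osd_prefix = "ceph-osd"
    gather_admin_socket_stats = true
    gather_cluster_stats = false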
Björn
On 13.02.2021 at 09:23, Frank Schilder <frans(a)dtu.dk> wrote:
Ahh, OK. I'm not sure it has that effect. What people observed was that RocksDB access
became faster due to system buffer cache hits. This has an indirect influence on data
access latency.
The typical case is "high IOPS on WAL/DB device after upgrade", and setting
bluefs_buffered_io=true got this back to normal, also improving client performance as a
result.
Your latency graphs actually look suspiciously like this should work for you. Are you sure
the OSD is using the value? I had problems with setting some parameters; I needed to
include them in the ceph.conf file and restart to force them through.
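A quick way to check what the daemon is actually running with, e.g. for osd.0 (just an
example ID):

  # on the OSD host, via the admin socket:
  ceph daemon osd.0 config get bluefs_buffered_io
  # or from an admin node (Nautilus and later):
  ceph config show osd.0 | grep bluefs_buffered_io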
A sign that bluefs_buffered_io=true is applied is rapidly increasing system buffer usage
reported by top or free. If the values reported are similar for all hosts,
bluefs_buffered_io is still disabled.
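For example, something like

  watch -n 5 free -h

on an OSD host should show the buff/cache column climbing steadily once the option is
really in effect.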
If I may ask, what framework are you using to pull these graphs? Is there a Grafana
dashboard one can download somewhere, or is it something you implemented yourself? I plan
to enable Prometheus on our cluster, but don't know of a good data sink providing a
pre-defined dashboard.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Björn Dolkemeier <b.dolkemeier(a)dbap.de>
Sent: 13 February 2021 08:51:11
To: Frank Schilder
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] Latency increase after upgrade 14.2.8 to 14.2.16
Thanks for the quick reply, Frank.
Sorry, the graphs/attachment were filtered. Here is an example of one latency:
https://drive.google.com/file/d/1qSWmSmZ6JXVweepcoY13ofhfWXrBi2uZ/view?usp=…
I’m aware that the overall performance depends on the slowest OSD.
What I expect is that bluefs_buffered_io=true set on one OSD is reflected in dropped
latencies for that particular OSD.
Best regards,
Björn
On 13.02.2021 at 07:39, Frank Schilder <frans@dtu.dk> wrote:
The graphs were forgotten or filtered out.
Changing the buffered_io value on one host will not change client IO performance, as it's
always the slowest OSD that's decisive. However, it should have an effect on the IOPS load
reported by iostat on the disks of that host.
Does setting bluefs_buffered_io=true on all hosts have an effect on client IO? Note that
it might need a restart even if the documentation says otherwise.
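For a quick runtime test you can try injecting the value, e.g.:

  ceph tell osd.* injectargs '--bluefs_buffered_io=true'

but if the latencies don't move, put it in ceph.conf under [osd] and restart the OSDs to
make sure it is really picked up.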
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Björn Dolkemeier <b.dolkemeier@dbap.de>
Sent: 13 February 2021 07:16:06
To: ceph-users@ceph.io
Subject: [ceph-users] Latency increase after upgrade 14.2.8 to 14.2.16
Hi,
after upgrading Ceph from 14.2.8 to 14.2.16 we experienced increased latencies. There
were no changes in hardware, configuration, workload or networking, just a rolling update
via ceph-ansible on a running production cluster. The cluster consists of 16 OSDs (all SSD)
across 4 nodes. The VMs served via RBD from this cluster currently suffer from I/O-wait CPU.
These are some latencies that are increased after the update (see the perf dump example
after the lists for how to read them):
- op_r_latency
- op_w_latency
- kv_final_lat
- state_kv_commiting_lat
- submit_lat
- subop_w_latency
Do these latencies point to KV/RocksDB?
These are some latencies which are NOT increased after the update:
- kv_sync_lat
- kv_flush_lat
- kv_commit_lat
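For reference, these counters can be read directly from the OSD admin socket, e.g. (osd.0
as an example; the section names are from memory and jq is assumed to be available):

  ceph daemon osd.0 perf dump | jq '{op_r: .osd.op_r_latency, kv_sync: .bluestore.kv_sync_lat}'
  # each counter is an object with avgcount/sum; sum divided by avgcount gives the
  # average latency in seconds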
I attached one graph showing the massive increase after the update.
I tried setting bluefs_buffered_io=true (as its default value was changed and it was
mentioned as performance-relevant) for all OSDs in one host, but this does not make a
difference.
The ceph.conf is fairly simple:
[global]
cluster network = xxx
fsid = xxx
mon host = xxx
public network = xxx
[osd]
osd memory target = 10141014425
Any ideas what to try? Help appreciated.
Björn
--
dbap GmbH
phone +49 251 609979-0 / fax +49 251 609979-99
Heinr.-von-Kleist-Str. 47, 48161 Muenster, Germany
http://www.dbap.de
dbap GmbH, Sitz: Muenster
HRB 5891, Amtsgericht Muenster
Geschaeftsfuehrer: Bjoern Dolkemeier
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io