Thank you all for your answers, this was really helpful!
Stefan Priebe wrote:
> yes we have the same issues and switched to seagate for those reasons.
> you can fix at least a big part of it by disabling the write cache of those drives - generally speaking it seems the toshiba firmware is broken. I was not able to find a newer one.
Good to know that we're not alone :) I also looked for a newer firmware, to no
avail.
Igor Fedotov wrote:
> Benoit, wondering what are the write cache settings in your case? And do you see any difference after disabling it if any?
Write cache is enabled on all our OSDs (including the HGST drives that don't
have a latency issue).
To see if disabling write cache on the Toshiba drives would help, I turned it
off on all 12 drives in one of our OSD nodes:
```
for disk in /dev/sd{a..l}; do hdparm -W0 "$disk"; done
```
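One caveat: `hdparm -W0` doesn't survive a reboot, so if we keep this setting we'll need to reapply it at boot. A udev rule is one common way to do that; the sketch below is untested on our nodes, the file name is just illustrative, and the `hdparm` path may differ per distro:

```
# /etc/udev/rules.d/99-disable-write-cache.rules (file name is illustrative)
# Disable the volatile write cache on rotational sdX disks as they appear
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/sbin/hdparm -W0 /dev/%k"
```

A systemd unit or rc.local script running the same `hdparm` loop would work just as well.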
and left it on in the remaining nodes. I used `rados bench write` to create
some load on the cluster, and looked at
```
avg by (hostname) (ceph_osd_commit_latency_ms * on (ceph_daemon) group_left (hostname)
ceph_osd_metadata)
```
in Prometheus. The hosts with write cache _enabled_ had a commit latency around
145ms, while the host with write cache _disabled_ had a commit latency around
25ms. So it definitely helps!
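For reference, the load generation was along these lines (the pool name and runtime here are placeholders, not my exact invocation):

```
# Write 4 MB objects for 60 seconds; keep them around for later read tests
rados bench -p bench 60 write --no-cleanup
# Remove the benchmark objects afterwards
rados -p bench cleanup
```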
Mark Nelson wrote:
> This isn't the first time I've seen drive cache cause problematic latency issues, and not always from the same manufacturer. Unfortunately it seems like you really have to test the drives you want to use before deploying them to make sure you don't run into issues.
That's very true! Data sheets and even public benchmarks can be quite
deceiving, and two hard drives that seem to have similar performance profiles
can perform very differently within a Ceph cluster. Lesson learned.
Cheers,
--
Ben