Just throwing my hat in here with a small bit of anecdotal experience.
In the early days of experimenting with Ceph, I had 24x 8TB disks, all behind RAID
controllers as RAID 0 virtual disks with no BBU (so the controller cache was
write-through, the default), and pdcache (the on-disk write cache) enabled (also the
default).
We had a lightning strike at our previous data center that killed power, and we ended up
losing the entire Ceph pool (not prod), due mostly to the pdcache setting.
We then ran exhaustive failure tests, which further isolated pdcache as the culprit
rather than the controller's write cache. The controllers now have BBUs to further
prevent issues, but write-back cache with the BBU did not cause problems; only pdcache did.
So, all of this to say, in my experience, the on-disk write cache was a huge liability for
losing writes.
This was also in the filestore days, and most of our issues were with XFS, but the point
remains.
Write cache can be a consistency killer, and I recommend disabling it where possible.
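
For drives that are directly visible to the OS, a minimal sketch of what disabling the
on-disk cache could look like is below. It only builds `hdparm -W 0` commands (hdparm's
write-caching switch) and prints them rather than running anything. The device names are
placeholders, and drives hidden behind a RAID controller, as in the setup described
above, would need the controller's own utility instead. Some drives may also revert the
setting after a power cycle, so treat this as something to reapply at boot.

```python
def write_cache_commands(devices, enable=False):
    """Build one hdparm command per device to set the drive's write cache.

    hdparm -W 0 turns the drive's write caching off, -W 1 turns it on.
    The commands are returned as argument lists and are NOT executed here.
    """
    flag = "1" if enable else "0"
    return [["hdparm", "-W", flag, dev] for dev in devices]


# Placeholder device names, for illustration only.
for cmd in write_cache_commands(["/dev/sda", "/dev/sdb"]):
    print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run` as root once the device list has
been checked by hand.
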
Reed
On Jun 24, 2020, at 10:30 AM, Paul Emmerich
<paul.emmerich(a)croit.io> wrote:
Has anyone ever encountered a drive with a write cache that actually
*helped*?
I haven't.
As in: would it be a good idea for the OSD to just disable the write cache
on startup? Worst case it doesn't do anything, best case it improves
latency.
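
A rough sketch of the check such a startup step might do first, assuming a Linux host
where SCSI/SATA disks expose their cache mode in `/sys/class/scsi_disk/<H:C:T:L>/cache_type`
(values such as "write back" or "write through"). This is not how the OSD behaves today;
the function names are made up for illustration, and the sketch only reports which disks
still have write-back caching enabled.

```python
from pathlib import Path


def is_write_back(cache_type):
    """True if a sysfs cache_type string reports a write-back cache."""
    return cache_type.strip().startswith("write back")


def disks_with_write_back(root="/sys/class/scsi_disk"):
    """Return the scsi_disk entry names under root that report write back."""
    found = []
    for entry in sorted(Path(root).glob("*/cache_type")):
        try:
            if is_write_back(entry.read_text()):
                found.append(entry.parent.name)
        except OSError:
            # Entry vanished or is unreadable; skip it in this sketch.
            continue
    return found
```

Writing "write through" to the same sysfs file is one way to switch the drive's cache
off, which is why a read-only check like this is a natural first step.
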
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Wed, Jun 24, 2020 at 3:49 PM Frank R <frankaritchie(a)gmail.com> wrote:
FYI, there is an interesting note on disabling the write cache here:
https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=t…
On Wed, Jun 24, 2020 at 9:45 AM Benoît Knecht <bknecht(a)protonmail.ch>
wrote:
Hi Igor,
Igor Fedotov wrote:
> for the sake of completeness one more experiment please if possible:
>
> turn off write cache for HGST drives and measure commit latency once
> again.
I just did the same experiment with HGST drives, and disabling the write cache
on those drives brought the latency down from about 7.5ms to about 4ms.
So it seems disabling the write cache across the board would be advisable in
our case. Is it recommended in general, or specifically when the DB+WAL is on
the same hard drive?
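
To put the two figures in proportion, here is a quick back-of-the-envelope calculation
using only the approximate numbers quoted above (7.5ms and 4ms); the function name is
just for illustration.

```python
def relative_reduction(before_ms, after_ms):
    """Fraction by which latency dropped, relative to the starting value."""
    return (before_ms - after_ms) / before_ms


# Approximate commit latencies with the write cache enabled vs. disabled.
print(f"{relative_reduction(7.5, 4.0):.0%}")  # prints 47%
```

So on these drives, turning the write cache off cut commit latency by a bit under half.
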
Stefan, Mark, are you disabling the write cache on your HDDs by default?
Cheers,
--
Ben
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io