Hi all,
I did a quick test with wcache off[1]. And have the impression the
simple rados bench of 2 minutes performed a bit worse on my slow hdd's.
This probably depends on whether or not the drive actually has a non-volatile write cache.
I noticed that from many vendors you can buy what seems to be the exact same drive for a
price difference of something like $20. My best bet is that the slightly more expensive
ones have functioning power-loss protection hardware that passed the quality test, and
that it is disabled in the cheaper drives (probably among other things). Always going for
the cheapest version can have its price.
For the disks we are using, my impression is that disabling volatile write cache actually
adds the volatile cache capacity to the non-volatile write cache. The disks start
consuming more power, but also perform better with ceph.
For our HDDs I have fortunately never seen a degradation - or one could say that maybe
they are so crappy that it couldn't get any worse :). In case our vendor reads this:
that was a practical joke :)
The main question here is: do you want to risk data loss on power loss? Ceph is extremely
sensitive to data that was acknowledged as "on disk" by the firmware
disappearing after a power outage. This is different from journaled file systems like
ext4, which manage to roll back to an earlier consistent version. One loses data, but the
fs is not damaged. XFS still has problems with that, though. With Ceph you can lose entire
pools without a viable recovery option, as was described earlier in this thread.
Couldn't we just set (uncomment)
write_cache = off
in /etc/hdparm.conf?
I was pondering that. The problem is that on CentOS systems it seems to be ignored, that
in general it does not apply to SAS drives, for example, and that it has no working way of
configuring which drives to exclude.
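For reference, a Debian-style stanza for this would look roughly as below (the device
name is an example, not a recommendation; as noted, this file is ignored on CentOS and
does nothing for SAS drives):

```
# /etc/hdparm.conf (Debian/Ubuntu syntax; sketch only)
/dev/sda {
    write_cache = off
}
```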
For example, while we have certain minimum requirements for Ceph data disks, like
functioning power-loss protection, for an OS boot drive I really don't care. Power
outages on cheap drives that lose writes have not been a problem since ext4. A few log
entries or contents of swap - who cares. Here, performance is more important than data
security on power loss.
I would require a configurable option that works in the same way for all types of
protocols - SATA, SAS, NVMe disks, you name it. At the time of writing, I don't know of
any.
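The closest approximation I could imagine is a udev rule; sketched below, untested, with
the device match and the excluded OS drive (sda) being pure assumptions - and it would
not cover NVMe. smartctl handles both SATA and SAS, which is why it is used here instead
of hdparm:

```
# /etc/udev/rules.d/99-wcache.rules (hypothetical sketch)
# Disable the volatile write cache on added SATA/SAS disks,
# skipping the OS drive (assumed here to be sda).
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[b-z]", \
  RUN+="/usr/sbin/smartctl -s wcache,off /dev/%k"
```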
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Marc Roos <M.Roos(a)f1-outsourcing.eu>
Sent: 25 June 2020 00:01:51
To: paul.emmerich; vitalif
Cc: bknecht; ceph-users; s.priebe
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs
I did a quick test with wcache off[1]. And have the impression the
simple rados bench of 2 minutes performed a bit worse on my slow hdd's.
[1]
# For each mounted OSD: stop it, disable the volatile write cache on its
# device with smartctl, then start it again. The sed strips the partition
# number from the device name and reduces the mount point to the OSD id.
IFS=$'\n' && for line in `mount | grep 'osd/ceph' | awk '{print $1" "$3}' | \
    sed -e 's/1 / /' -e 's#/var/lib/ceph/osd/ceph-##'`; do
  IFS=' '
  arr=($line)
  service ceph-osd@${arr[1]} stop && \
    smartctl -s wcache,off ${arr[0]} && \
    service ceph-osd@${arr[1]} start
done
-----Original Message-----
To: Paul Emmerich
Cc: Benoît Knecht; s.priebe(a)profihost.ag; ceph-users(a)ceph.io
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba
MG07ACA14TE HDDs
Hi,
https://yourcmc.ru/wiki/Ceph_performance author here %)
Disabling the write cache is REALLY bad for SSDs without capacitors
[consumer SSDs], and it's also bad for HDDs whose firmware doesn't have
this bug-o-feature. The bug is really common, though. I have no idea
where it comes from, but it's really common. When you "disable" the
write cache you actually "enable" the non-volatile write cache on those
drives. Seagate EXOS drives also behave like that... It seems most EXOS
drives have an SSD cache even though it's not mentioned in the specs, and
it gets enabled when you do hdparm -W 0. In theory, hdparm -W 0 may hurt
linear write performance even on those HDDs, though.
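To see what a given drive currently reports, the cache state can be queried per protocol
(device names below are examples; smartctl -g needs a reasonably recent smartmontools):

```shell
# Query the volatile write cache state without changing it.
hdparm -W /dev/sda            # SATA: prints "write-caching = 1 (on)" or 0 (off)
sdparm --get WCE /dev/sdb     # SAS: WCE bit from the caching mode page
smartctl -g wcache /dev/sda   # both SATA and SAS via smartmontools
```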
Well, what I was saying was: "does it hurt to unconditionally run
hdparm -W 0 on all disks?"
Which disk would suffer from this? I haven't seen any disk where this
would be a bad idea.
Paul
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
email to ceph-users-leave(a)ceph.io