Hi Robert and Paul,
sad news. I did a 5-second single-thread test after setting osd_op_queue_cut_off=high on
all OSDs and MDSs. Here are the current settings:
[root@ceph-01 ~]# ceph config show osd.0
NAME                                      VALUE               SOURCE    OVERRIDES  IGNORES
bluestore_compression_min_blob_size_hdd  262144              file
bluestore_compression_mode               aggressive          file
cluster_addr                              192.168.16.68:0/0   override
cluster_network                           192.168.16.0/20     file
crush_location                            host=c-04-A         file
daemonize                                 false               override
err_to_syslog                             true                file
keyring                                   $osd_data/keyring   default
leveldb_log                                                   default
mgr_initial_modules                       balancer dashboard  file
mon_allow_pool_delete                     false               file
mon_pool_quota_crit_threshold             90                  file
mon_pool_quota_warn_threshold             70                  file
osd_journal_size                          4096                file
osd_max_backfills                         3                   mon
osd_op_queue_cut_off                      high                mon
osd_pool_default_flag_nodelete            true                file
osd_recovery_max_active                   8                   mon
osd_recovery_sleep                        0.050000            mon
public_addr                               192.168.32.68:0/0   override
public_network                            192.168.32.0/19     file
rbd_default_features                      61                  default
setgroup                                  disk                cmdline
setuser                                   ceph                cmdline
[root@ceph-01 ~]# ceph config get osd.0 osd_op_queue
wpq
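For reference, both settings were pushed through the central config database (hence the
SOURCE "mon" above) and the daemons restarted afterwards; the two commands are simply:
ceph config set osd osd_op_queue_cut_off high
ceph config set mds osd_op_queue_cut_off high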
Unfortunately, the problem is not resolved. The fio job script is:
=====================
[global]
name=fio-rand-write
filename_format=fio-$jobname-${HOSTNAME}-$jobnum-$filenum
rw=randwrite
bs=4K
numjobs=1
time_based=1
runtime=5
[file1]
size=100G
ioengine=sync
=====================
That's a random write test on a 100G file with a 4K write size. Note that fio uses
"direct=0" (buffered I/O) by default; with "direct=1" the same test runs absolutely fine.
Running this short burst of load is already enough to make the cluster unhealthy:
cluster log:
2019-09-03 20:00:00.000160 [INF] overall HEALTH_OK
2019-09-03 20:08:36.450527 [WRN] Health check failed: 1 MDSs report slow metadata IOs
(MDS_SLOW_METADATA_IO)
2019-09-03 20:08:59.867124 [INF] MDS health message cleared (mds.0): 2 slow metadata IOs
are blocked > 30 secs, oldest blocked for 49 secs
2019-09-03 20:09:00.373050 [INF] Health check cleared: MDS_SLOW_METADATA_IO (was: 1 MDSs
report slow metadata IOs)
2019-09-03 20:09:00.373094 [INF] Cluster is now healthy
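(While fio is running, the health flap is easy to follow live, e.g. with:
ceph -w
ceph health detail)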
/var/log/messages: loads of these (all OSDs!)
Sep 3 20:08:39 ceph-09 journal: 2019-09-03 20:08:39.269 7f6a3d63c700 -1 osd.161 10411
get_health_metrics reporting 354 slow ops, oldest is osd_op(client.4497435.0:38244 5.f7s0
5:ef9f1be4:::100010ed9bd.0000390c:head [write 8192~4096,write 32768~4096,write
139264~4096,write 172032~4096,write 270336~4096,write 512000~4096,write 688128~4096,write
876544~4096,write 1048576~4096,write 1257472~4096,write 1425408~4096,write
1445888~4096,write 1503232~4096,write 1552384~4096,write 1716224~4096,write 1765376~4096]
snapc 12e=[] ondisk+write+known_if_redirected e10411)
It looks like the MDS is pushing waaaayyy too many requests onto the HDDs instead of
throttling the client.
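For anyone who wants to dig into what these ops are actually waiting for, they can be dumped
on an affected OSD through the admin socket, e.g. (osd.161 taken from the log line above):
ceph daemon osd.161 dump_ops_in_flight
ceph daemon osd.161 dump_historic_slow_ops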
An ordinary user should not have so much power in his hands. This makes it trivial to
destroy a ceph cluster.
This very short fio test is probably sufficient to reproduce the issue on any test
cluster. Should I open an issue?
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans(a)dtu.dk>
Sent: 30 August 2019 12:56
To: Robert LeBlanc; Paul Emmerich
Cc: ceph-users
Subject: [ceph-users] Re: ceph fs crashes on simple fio test
Hi Robert and Paul,
a quick update. I restarted all OSDs today to activate osd_op_queue_cut_off=high. I ran
into a serious problem right after that. The standby-replay MDS daemons started missing
mon beacons and were killed by the mons:
ceph-01 journal: debug [...] log [INF] Standby daemon mds.ceph-12 is not responding,
dropping it
Apparently, one also needs to set this on the MDSes:
ceph config set mds osd_op_queue_cut_off high
This also requires a restart to become active. After that, everything seems to work again.
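Whether the new value is actually in effect can be checked per daemon, for example:
ceph config show mds.ceph-12 osd_op_queue_cut_off
which should print "high" after the restart.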
The question that remains is:
Do I need to change this for any other daemon?
I will repeat the performance tests later and post results. One observation is that an MDS
fail-over was a factor of 5-10 faster with the cut-off set to high.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io