Hi.
How do I find out whether the MDS is "busy" - i.e. the component limiting
CephFS metadata throughput? (12.2.8)
$ time find . | wc -l
1918069
real 8m43.008s
user 0m2.689s
sys 0m7.818s
or ~0.273ms per file (~3,667 files/s).
In light of potential batching, and a network latency of ~0.20ms to the
MDS, I have a feeling that this could be improved significantly.
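As a sanity check on that number, the per-entry latency implied by the run above works out like this (a back-of-envelope sketch using only the figures from the timing output):

```shell
# Average per-entry metadata latency implied by the CephFS find run:
# 8m43.008s of wall clock over 1,918,069 directory entries.
awk 'BEGIN {
    total_s = 8 * 60 + 43.008    # wall-clock seconds
    entries = 1918069
    printf "%.3f ms per entry\n", total_s * 1000 / entries
}'
```

At ~0.273ms per entry against a ~0.20ms network RTT, the client is effectively paying close to one round trip per entry, which is what makes batching look attractive.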
I then additionally tried the same through the NFS-Ganesha gateway.
For reference:
Same - but on "local DAS - xfs".
$ time find . | wc -l
1918061
real 0m4.848s
user 0m2.360s
sys 0m2.816s
Same but "above local DAS over NFS":
$ time find . | wc -l
1918061
real 5m56.546s
user 0m2.903s
sys 0m34.381s
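For what it's worth, the NFS-over-DAS numbers tell a similar story: the overhead NFS adds on top of the 4.8s local run, spread over the entries, is in round-trip territory (same back-of-envelope approach; this assumes the NFS server sits at a comparable network distance):

```shell
# Extra wall-clock cost NFS adds over local xfs, per directory entry:
awk 'BEGIN {
    nfs_s   = 5 * 60 + 56.546    # find over NFS
    local_s = 4.848              # find on local xfs
    entries = 1918061
    printf "%.3f ms of NFS overhead per entry\n", (nfs_s - local_s) * 1000 / entries
}'
```

i.e. roughly one synchronous network round trip per entry even on plain NFS - so the pattern is not unique to CephFS, but CephFS is still ~1.5x slower on top of that.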
jk@ceph-mon1:~$ sudo ceph fs status
cephfs - 84 clients
======
+------+----------------+-----------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+----------------+-----------+---------------+-------+-------+
| 0 | active | ceph-mds2 | Reqs: 1369 /s | 11.3M | 11.3M |
| 0-s | standby-replay | ceph-mds1 | Evts: 0 /s | 0 | 0 |
+------+----------------+-----------+---------------+-------+-------+
+------------------+----------+-------+-------+
| Pool | type | used | avail |
+------------------+----------+-------+-------+
| cephfs_metadata | metadata | 226M | 16.4T |
| cephfs_data | data | 164T | 132T |
| cephfs_data_ec42 | data | 180T | 265T |
+------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.5-45redhat1xenial
(d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
How can we assess where the bottleneck is, and what can we do to speed it up?
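A few things I would look at first, as a sketch - these all go through the MDS admin socket, so they have to be run on the host where the active MDS (ceph-mds2 here) lives, and counter names can differ between releases:

```shell
# Live view of the active MDS's request rate and counters:
ceph daemonperf mds.ceph-mds2

# Full counter dump; the mds_server.* and objecter.* sections hint at
# where time goes (client request handling vs. RADOS round trips):
sudo ceph daemon mds.ceph-mds2 perf dump

# Requests currently in flight in the MDS - long-lived entries here
# point at lock contention or slow metadata-pool I/O:
sudo ceph daemon mds.ceph-mds2 dump_ops_in_flight

# Per-client session list, to see whether one of the 84 clients
# dominates the request load:
sudo ceph daemon mds.ceph-mds2 session ls
```

If the MDS itself is idle while the find crawls, the bottleneck is more likely the per-request round trip from the client than MDS CPU.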