Hi.
How do I find out whether the MDS is "busy" - i.e. the component limiting
CephFS metadata throughput? (12.2.8)
$ time find . | wc -l
1918069
real 8m43.008s
user 0m2.689s
sys 0m7.818s
or ~0.273ms per file (~3,667 files/s).
In light of potential batching, and a network latency of ~0.20ms to the
MDS, I have a feeling that this could be improved significantly.
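As a sanity check on that number, the per-entry latency implied by the run above works out like this (a back-of-envelope sketch using only the figures from the timing output):

```shell
# Average per-entry metadata latency implied by the CephFS find run:
# 8m43.008s of wall clock over 1,918,069 directory entries.
awk 'BEGIN {
    total_s = 8 * 60 + 43.008    # wall-clock seconds
    entries = 1918069
    printf "%.3f ms per entry\n", total_s * 1000 / entries
}'
```

At ~0.273ms per entry against a ~0.20ms network RTT, the client is effectively paying close to one round trip per entry, which is what makes batching look attractive.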
I then additionally tried the same through the NFS-Ganesha gateway.
For reference:
Same - but on "local DAS - xfs".
$ time find . | wc -l
1918061
real 0m4.848s
user 0m2.360s
sys 0m2.816s
Same but "above local DAS over NFS":
$ time find . | wc -l
1918061
real 5m56.546s
user 0m2.903s
sys 0m34.381s
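For what it's worth, the NFS-over-DAS numbers tell a similar story: the overhead NFS adds on top of the 4.8s local run, spread over the entries, is in round-trip territory (same back-of-envelope approach; this assumes the NFS server sits at a comparable network distance):

```shell
# Extra wall-clock cost NFS adds over local xfs, per directory entry:
awk 'BEGIN {
    nfs_s   = 5 * 60 + 56.546    # find over NFS
    local_s = 4.848              # find on local xfs
    entries = 1918061
    printf "%.3f ms of NFS overhead per entry\n", (nfs_s - local_s) * 1000 / entries
}'
```

i.e. roughly one synchronous network round trip per entry even on plain NFS - so the pattern is not unique to CephFS, but CephFS is still ~1.5x slower on top of that.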
jk@ceph-mon1:~$ sudo ceph fs status
cephfs - 84 clients
======
+------+----------------+-----------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+----------------+-----------+---------------+-------+-------+
| 0 | active | ceph-mds2 | Reqs: 1369 /s | 11.3M | 11.3M |
| 0-s | standby-replay | ceph-mds1 | Evts: 0 /s | 0 | 0 |
+------+----------------+-----------+---------------+-------+-------+
+------------------+----------+-------+-------+
| Pool | type | used | avail |
+------------------+----------+-------+-------+
| cephfs_metadata | metadata | 226M | 16.4T |
| cephfs_data | data | 164T | 132T |
| cephfs_data_ec42 | data | 180T | 265T |
+------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.5-45redhat1xenial
(d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
How can we assess where the bottleneck is, and what can we do to speed it up?
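A few things I would look at first, as a sketch - these all go through the MDS admin socket, so they have to be run on the host where the active MDS (ceph-mds2 here) lives, and counter names can differ between releases:

```shell
# Live view of the active MDS's request rate and counters:
ceph daemonperf mds.ceph-mds2

# Full counter dump; the mds_server.* and objecter.* sections hint at
# where time goes (client request handling vs. RADOS round trips):
sudo ceph daemon mds.ceph-mds2 perf dump

# Requests currently in flight in the MDS - long-lived entries here
# point at lock contention or slow metadata-pool I/O:
sudo ceph daemon mds.ceph-mds2 dump_ops_in_flight

# Per-client session list, to see whether one of the 84 clients
# dominates the request load:
sudo ceph daemon mds.ceph-mds2 session ls
```

If the MDS itself is idle while the find crawls, the bottleneck is more likely the per-request round trip from the client than MDS CPU.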