On 6 Feb 2020, at 11:23, Stefan Kooman
<stefan(a)bit.nl> wrote:
Hi!
I've confirmed that the write IO to the metadata pool is coming from active MDSes.
I'm experiencing very poor write performance on clients and I would like to see if
there's anything I can do to optimise the performance.
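(As an aside, a rough way to watch that pool IO, assuming the metadata pool here is simply named cephfs_metadata:
$ watch -n 5 'ceph osd pool stats cephfs_metadata'
It prints the client read/write throughput and op/s against that pool, and the MDSes are the "clients" of the metadata pool.)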
Right now, I'm specifically focussing on speeding up this use case:
In CephFS mounted dir:
$ time unzip -q wordpress-seo.12.9.1.zip
real 0m47.596s
user 0m0.218s
sys 0m0.157s
On RBD mount:
$ time unzip -q wordpress-seo.12.9.1.zip
real 0m0.176s
user 0m0.131s
sys 0m0.045s
The difference is just too big. I'm having real trouble finding a good reference to
check my setup for bad configuration etc.
I have network bandwidth, RAM and CPU to spare, but I'm unsure how to put them to
work to help my case.
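In case it helps the diagnosis, something like the following should summarise where unzip
spends its time on the CephFS mount (strace -c prints per-syscall counts and times; I would
expect mkdir/open/close to dominate if metadata round-trips are the bottleneck):
$ strace -c -f unzip -q wordpress-seo.12.9.1.zip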
Are there a lot of directories to be created from that zip file? I think
it boils down to the directory operations that need to be performed
synchronously. See
https://fosdem.org/2020/schedule/event/sds_ceph_async_directory_ops/
https://fosdem.org/2020/schedule/event/sds_ceph_async_directory_ops/attachm…
https://video.fosdem.org/2020/H.1308/sds_ceph_async_directory_ops.webm
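A quick way to check, assuming the zipinfo tool is installed (it lists one archive entry per
line; directory entries end with a slash):
$ zipinfo -1 wordpress-seo.12.9.1.zip | grep -c '/$'     # number of directories
$ zipinfo -1 wordpress-seo.12.9.1.zip | grep -vc '/$'    # number of files
With current clients, every one of those creates is a synchronous request to the MDS, so
the counts add up quickly.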
Hi!
Last Friday, I did a round of updates that were pending and planned for installation.
After the updates, all server and client components were running their latest versions, and
all systems were rebooted into the latest kernels.
Ceph: Mimic
Kernel: 5.3 (Ubuntu HWE)
The write IO to the metadata pool is down by a factor of 10 and performance seems much
improved.
Though this does not give me a lot of intel on what the problem was, I'm glad that it
is now pretty much resolved ;)
Before the updates, I was running different (minor) versions of Ceph and kernel clients.
This may not have been ideal, but I'm not sure about the details of possible issues with
that.
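For what it's worth, that kind of skew is easy to spot; both commands below have existed
since Luminous, so Mimic has them:
$ ceph versions
$ ceph features
The first shows which Ceph version each daemon group is running, the second the
release/feature level of the connected clients.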
Rebooting everything may have also eliminated some issues. I did not have the opportunity
to do much analysis on that, since I was working in a production environment.
Well, maybe some of you have extra insights. I'm happy to close this issue and will be
monitoring and recording related info in case this happens again.
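My monitoring will probably start with something as simple as logging the metadata pool IO
periodically (again assuming the pool is named cephfs_metadata):
$ while sleep 60; do date; ceph osd pool stats cephfs_metadata; done >> cephfs-metadata-io.log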
Thanks much for your inputs, and have a good week,
Samy