Hi all,
Have a question regarding CephFS and write performance. Possibly I am
overlooking a setting.
We recently started using Ceph, and we want to use CephFS as shared
storage for a Sync-and-Share solution.
We are still in a testing phase, mainly looking at the performance of the
system, and we are seeing some strange issues.
We are using Ceph Quincy release 17.2.6, with a replica 3 data policy
across 21 hosts spread across 3 locations.
When I write multiple 1 GiB files, the write performance drops from
400 MiB/s to 18 MiB/s, with multiple retries as well.
However, when I drop the page cache on the client every minute, the
performance remains good. But that's not really a solution, of course.
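The workaround is roughly this (run as root on the client; echo 1 only
drops the page cache, not dentries/inodes):

  # crude workaround: drop the client page cache once a minute
  while true; do sync; echo 1 > /proc/sys/vm/drop_caches; sleep 60; done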
I have already played a lot with the sysctl settings, like the vm.dirty_*
ones, but it makes no difference at all.
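For example, I tried variations along these lines (the values are just
examples, not recommendations):

  # make the kernel flush dirty pages earlier / more often
  sysctl -w vm.dirty_background_ratio=5
  sysctl -w vm.dirty_ratio=10
  sysctl -w vm.dirty_expire_centisecs=1000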
When I enable fuse_disable_pagecache, the write performance does stay
reasonable at 70 MiB/s,
but the read performance completely collapses from 600 MiB/s to 40 MiB/s.
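(For completeness, I set it like this; it can also go into the [client]
section of ceph.conf:)

  # disable the ceph-fuse page cache for all clients
  ceph config set client fuse_disable_pagecache true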
There is no difference in behavior between the kernel client and the FUSE
client.
I have already played around with client_oc_max_dirty, client_oc_max_objects,
client_oc_size, etc., but haven't found the right setting.
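As an illustration, these are the kinds of changes I experimented with
(the values are examples only):

  # grow the client object cacher and its dirty limit
  ceph config set client client_oc_size 419430400       # 400 MiB, default 200 MiB
  ceph config set client client_oc_max_dirty 209715200  # 200 MiB, default 100 MiB
  ceph config set client client_oc_max_objects 2000     # default 1000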
Anyone familiar with this who can give me some hints?
Thanks for your help! :-)
Kind regards, Tom
Dockerized Ceph 17.2.6 on Ubuntu 22.04
The CephFS filesystem has a size of 180 TB, of which only 66 TB are used.
When running `ls -lR`, the output stops and all accesses to the directory
stall. `ceph health` says:
# ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow requests
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
    mds.vol.ppc721.mvxstq(mds.0): Client dessert failing to respond to capability release client_id: 6899709
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.vol.ppc721.mvxstq(mds.0): 1 slow requests are blocked > 30 secs
`ceph -w` shows:
[WRN] slow request 31.421408 seconds old, received at 2023-10-01T09:53:44.634849+0000: client_request(client.7360117:2224947 getattr AsLsFs #0x4000012503f 2023-10-01T09:53:44.631148+0000 caller_uid=0, caller_gid=0{0,}) currently failed to rdlock, waiting
[WRN] client.6899709 isn't responding to mclientcaps(revoke), ino 0x4000012503f pending pAsLsXsFsc issued pAsLsXsFscb, sent 61.422148 seconds ago
The full output of `ceph daemon [mds] dump inode 0x4000012503f`,
`config show`, `dump_ops_in_flight`, and `ceph -w` with timestamps can be
found at
https://gist.github.com/test-erik/5de4a7bd632f62ab58c3115cfb876ae0
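In case it helps with diagnosis, this is how we inspect the blocked
session; evicting the client would be a last resort, since it drops the
client's caps and may disrupt it:

  # list the sessions (including client.6899709) on the active MDS
  ceph tell mds.vol.ppc721.mvxstq session ls
  # last resort: evict the unresponsive client
  ceph tell mds.vol.ppc721.mvxstq client evict id=6899709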
Do you have an idea what we can do about this?