On 25/02/2021 11:19, Dylan McCulloch wrote:
Simon Oosthoek wrote:
On 24/02/2021 22:28, Patrick Donnelly wrote:
> Hello Simon,
>
> On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek <s.oosthoek(a)science.ru.nl> wrote:
>
> On 24/02/2021 12:40, Simon Oosthoek wrote:
> Hi
>
> we've been running our Ceph cluster (Nautilus) for nearly 2 years now,
> and recently, due to a temporary situation, the cluster is at 80% full.
>
> We are only using CephFS on the cluster.
>
> Normally, I realize we should be adding OSD nodes, but this is a
> temporary situation, and I expect the cluster to go to <60% full
> quite soon.
> Anyway, we are noticing some really problematic slowdowns. There are
> some things that could be related, but we are unsure...
> - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
> but are not using more than 2GB; this looks either very inefficient or
> wrong ;-)
> After looking at our monitoring history, it seems the mds cache is
> actually used more fully, but most of our servers get a weekly reboot
> by default. This obviously clears the mds cache. I wonder if that's a
> smart idea for an MDS node...? ;-)
No, it's not. Can you also check that you do not have mds_cache_size
configured, perhaps in the MDS's local ceph.conf?
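For example, something along these lines should show any override and
the effective cache limit (the daemon name is a placeholder):

$ ceph daemon mds.<name> config get mds_cache_size
$ ceph config get mds mds_cache_memory_limit

mds_cache_size should normally be 0 (unlimited); the cache is really
bounded by mds_cache_memory_limit, which defaults to 1 GiB if I
remember correctly, and that alone would explain an MDS process
staying at around 2GB.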
Hi Patrick,
I've already changed the reboot period to 1 month.
The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
file, so I guess it's just the weekly reboot that cleared the memory of
cache data...
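To keep an eye on this in the future, I understand the live cache
usage can be read from the admin socket on the MDS host, e.g.:

$ ceph daemon mds.cephmds2 cache status

which should report how many bytes of the configured
mds_cache_memory_limit are actually in use.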
I'm starting to think that the nearly full cluster is probably the only
remaining explanation for the performance problems, though I don't know
why that would be.
Did the performance issue only arise when OSDs in the cluster reached
80% usage? What is your osd nearfull_ratio?
$ ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
Is the cluster in HEALTH_WARN with nearfull OSDs?
# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            2 pgs not deep-scrubbed in time
            957 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 7d)
    mgr: cephmon3(active, since 2M), standbys: cephmon1, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 168 up (since 11w), 168 in (since 9M); 43 remapped pgs

  task status:
    scrub status:
        mds.cephmds2: idle

  data:
    pools:   10 pools, 5280 pgs
    objects: 587.71M objects, 804 TiB
    usage:   1.4 PiB used, 396 TiB / 1.8 PiB avail
    pgs:     9634168/5101965463 objects misplaced (0.189%)
             5232 active+clean
             29   active+remapped+backfill_wait
             14   active+remapped+backfilling
             5    active+clean+scrubbing+deep+repair

  io:
    client:   136 MiB/s rd, 600 MiB/s wr, 386 op/s rd, 359 op/s wr
    recovery: 328 MiB/s, 169 objects/s
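(1.4 PiB used out of 1.8 PiB raw is roughly 78% on average, so with
some imbalance individual OSDs are presumably approaching the 0.85
nearfull_ratio, even though no nearfull warning shows above.)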
We noticed recently, when one of our clusters had nearfull OSDs, that
cephfs client performance was heavily impacted.
Our cluster is nautilus 14.2.15. Clients are kernel 4.19.154.
We determined that it was most likely due to the ceph client forcing
sync file writes when the nearfull flag is present.
https://github.com/ceph/ceph-client/commit/7614209736fbc4927584d4387faade4f…
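The gist of it, as I read the patch: when the client's osdmap carries
the NEARFULL flag, buffered writes get promoted to synchronous writes,
so every write waits on the OSDs. Here is a standalone sketch of that
behaviour (the constants and function names are mine, not the
kernel's; the real check lives in the kernel cephfs write path):

/* Sketch only: models "nearfull forces sync writes". */
#include <stdio.h>

#define CEPH_OSDMAP_NEARFULL 0x1  /* demo value, not the kernel's */
#define IOCB_DSYNC           0x2  /* demo value, not the kernel's */

static int effective_write_flags(int osdmap_flags, int iocb_flags)
{
    /* If the OSD map says nearfull, force the sync write path. */
    if (osdmap_flags & CEPH_OSDMAP_NEARFULL)
        iocb_flags |= IOCB_DSYNC;
    return iocb_flags;
}

int main(void)
{
    printf("no flag:  0x%x\n", effective_write_flags(0, 0));
    printf("nearfull: 0x%x\n", effective_write_flags(CEPH_OSDMAP_NEARFULL, 0));
    return 0;
}

That round trip per write is enough to explain the cliff in client
throughput as soon as the flag appears.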
Increasing and decreasing the nearfull ratio confirmed that performance
was only impacted while the nearfull flag was present.
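(We did that with something like the following; the value is only an
illustration:

$ ceph osd set-nearfull-ratio 0.87   # raise: flag clears, performance returns
$ ceph osd set-nearfull-ratio 0.85   # restore the default

Raising it only buys headroom, of course; the real fix is adding
capacity.)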
Not sure if that's relevant for your case.
I think this could be very similar in our cluster. Thanks for sharing
your insights!
Cheers
/Simon