Just shooting in the dark here, but you may be affected by a similar issue I
had a while back; it was discussed here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZOPBOY6XQOY…
In short, the default for bluefs_buffered_io was changed to false in a
recent Nautilus release, and I guess the same change was applied to newer
releases. That led to severe performance issues and similar symptoms, i.e.
lower memory usage on OSD nodes. Worth checking out.
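If you want to rule that out, it is easy to check what the OSDs are actually
running with and, if needed, flip the setting back. Roughly (this assumes you
use the central config database; on some releases a change to
bluefs_buffered_io only takes effect after an OSD restart):

  ceph config get osd bluefs_buffered_io            (cluster-wide default)
  ceph daemon osd.0 config get bluefs_buffered_io   (what a running OSD uses, run on its host)
  ceph config set osd bluefs_buffered_io true       (switch back to buffered IO)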
Of course, it may be something completely different. You should look into
monitoring all your OSDs separately, checking their utilization, await, and
other parameters, while comparing them to pre-upgrade values, to
find the root cause.
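For a quick per-OSD comparison, something like the following on the cluster
and on each OSD node should show whether a single disk or host is the outlier:

  ceph osd perf   (commit/apply latency per OSD)
  iostat -x 5     (utilization and await per device)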
Mon, 2 Nov 2020 at 11:55, Marc Roos <M.Roos(a)f1-outsourcing.eu>:
>
> I have been advocating for a long time for publishing test data from a
> basic test cluster against different ceph releases: just a basic ceph
> cluster that covers most configs and runs the same tests, so you can
> compare ceph performance alone. That would mean a lot for smaller
> companies that do not have access to a good test environment. I have
> also asked about this at a ceph seminar.
>
>
>
> -----Original Message-----
> From: Martin Rasmus Lundquist Hansen [mailto:hansen@imada.sdu.dk]
> Sent: Monday, November 02, 2020 7:53 AM
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Seriously degraded performance after update to
> Octopus
>
> Two weeks ago we updated our Ceph cluster from Nautilus (14.2.0) to
> Octopus (15.2.5), an update that was long overdue. We used the Ansible
> playbooks to perform a rolling update and, except for a few minor
> problems with the Ansible code, the update went well. The Ansible
> playbooks were also used for setting up the cluster in the first place.
> Before updating the Ceph software we also performed a full update of
> CentOS and the Linux kernel (this part of the update had already been
> tested on one of the OSD nodes the week before and we didn't notice any
> problems).
>
> However, after the update we are seeing a serious decrease in
> performance, more than a factor of 10x in some cases. I spent a week
> trying to come up with an explanation or solution, but I am completely
> blank. Independently of Ceph I tested the network performance and the
> performance of the OSD disks, and I am not really seeing any problems
> here.
>
> The specifications of the cluster are:
> - 3x Monitor nodes running mgr+mon+mds (Intel(R) Xeon(R) Silver 4108 CPU
> @ 1.80GHz, 16 cores, 196 GB RAM)
> - 14x OSD nodes, each with 18 HDDs and 1 NVME (Intel(R) Xeon(R) Gold
> 6126 CPU @ 2.60GHz, 24 cores, 384 GB RAM)
> - CentOS 7.8 and Kernel 5.4.51
> - 100 Gbps Infiniband
>
> We are collecting various metrics using Prometheus, and on the OSD nodes
> we are seeing some clear differences when it comes to CPU and Memory
> usage. I collected some graphs here: http://mitsted.dk/ceph . After the
> update the system load is highly reduced, there is almost no longer any
> iowait for the CPU, and the free memory is no longer used for Buffers (I
> can confirm that the changes in these metrics are not due to the update
> of CentOS or the Linux kernel). All in all, now the OSD nodes are almost
> completely idle all the time (and so are the monitors). On the linked
> page I also attached two RADOS benchmarks. The first benchmark was
> performed when the cluster was initially configured, and the second is
> the same benchmark after the update to Octopus. When comparing these
> two, it is clear that the performance has changed dramatically. For
> example, in the write test the bandwidth is reduced from 320 MB/s to 21
> MB/s and the number of IOPS has also dropped significantly.
>
> I temporarily tried to disable the firewall and SELinux on all nodes to
> see if it made any difference, but it didn't look like it (I did not
> restart any services during this test, I am not sure if that could be
> necessary).
>
> Any suggestions for finding the root cause of this performance decrease
> would be greatly appreciated.
Hi everyone
We have been facing RGW outages recently, with RGW returning HTTP 503: first for a few requests, then for most, then for all of them, over the course of 1-2 hours. This seems to have started after we updated from 15.2.4 to 15.2.5.
The line that accompanies these outages in the log is the following:
s3:list_bucket Scheduling request failed with -2218
It first pops up a few times here and there, until it eventually applies to all requests. It seems to indicate that the throttler has reached the limit of open connections.
As we run a pair of HAProxy instances in front of RGW, which limit the number of connections to the two RGW instances to 400, this limit should never be reached. We do use RGW metadata sync between the instances, which could account for some extra connections, but if I look at open TCP connections between the instances I can count no more than 20 at any given time.
I also noticed that some connections in the RGW log seem to never complete. That is, I can find a ‘starting new request’ line, but no associated ‘req done’ or ‘beast’ line.
I don’t think there are any hung connections around, as they are killed by HAProxy after a short timeout.
Looking at the code, it seems as if the throttler in use (SimpleThrottler) eventually reaches the maximum count of 1024 connections (outstanding_requests) and never recovers. I believe that the request_complete function is not called in all cases, but I am not familiar with the Ceph codebase, so I am not sure.
See https://github.com/ceph/ceph/blob/cc17681b478594aa39dd80437256a54e388432f0/…
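If the counter really is leaking, a possible stop-gap (just an assumption on
my side, not a proper fix) would be to raise the scheduler limit, which as far
as I can tell is rgw_max_concurrent_requests with a default of 1024:

  ceph config set client.rgw rgw_max_concurrent_requests 4096

(or set it in the exact client.rgw.<name> section). That obviously only buys
time if requests are never marked complete.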
Does anyone see the same phenomenon? Could this be a bug in the request handling of RGW, or am I wrong in my assumptions?
For now we’re just restarting our RGWs regularly, which seems to keep the problem at bay.
Thanks for any hints.
Denis
I've been struggling with this one for a few days now. We had an OSD report as near full a few days ago. Had this happen a couple of times before and a reweight-by-utilization has sorted it out in the past. Tried the same again but this time we ended up with a couple of pgs in a state of backfill_toofull and a handful of misplaced objects as a result.
Tried doing the reweight a few more times and it's been moving data around. We did have another osd trigger the near full alert but running the reweight a couple more times seems to have moved some of that data around a bit better. However, the original near_full osd doesn't seem to have changed much and the backfill_toofull pgs are still there. I'd keep doing the reweight-by-utilization but I'm not sure if I'm heading down the right path and if it will eventually sort it out.
We have 14 pools, but the vast majority of data resides in just one of those pools (pool 20). The pgs in the backfill state are in pool 2 (as far as I can tell). That particular pool is used for some cephfs stuff and has a handful of large files in there (not sure if this is significant to the problem).
Overall, our utilization is showing as 55.13%, but some of our OSDs are at around 76% in use, with the problem OSD sitting at 85.02%. Right now I'm just not sure what the proper corrective action is. The last couple of reweights I've run have been a bit more targeted, in that I've set them to only act on two OSDs at a time. If I run a test-reweight targeting only one OSD, it does say it will reweight OSD 9 (the one at 85.02%). I gather this will move data away from that OSD and potentially get it below the threshold. However, at one point in the past couple of days no OSDs were shown as near full, yet the two pgs in backfill_toofull didn't change, so I'm not sure continually reweighting is going to solve this issue.
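For reference, the reweight commands I've been running look roughly like this
(Jewel syntax, as far as I understand it: overload threshold 120%, max weight
change 0.05, at most 2 OSDs touched per run):

  ceph osd test-reweight-by-utilization 120 0.05 2
  ceph osd reweight-by-utilization 120 0.05 2

I'm also wondering whether the 85% figure is simply the default
osd_backfill_full_ratio (0.85), i.e. whether osd.9 just needs to drop below
that ratio before the two pgs can backfill.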
I'm a long way from knowledgeable on Ceph, so I'm not really sure what information is useful here. Here's a bit of what I'm seeing; I can provide anything else that might help.
Basically, we have a three node cluster, but only two have OSDs. The third is there simply to enable a quorum to be established. The OSDs are evenly spread across these two nodes and the configuration of each is identical. We are running Jewel and are not in a position to upgrade at this stage.
# ceph --version
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
# ceph health detail
HEALTH_WARN 2 pgs backfill_toofull; 2 pgs stuck unclean; recovery 33/62099566 objects misplaced (0.000%); 1 near full osd(s)
pg 2.52 is stuck unclean for 201822.031280, current state active+remapped+backfill_toofull, last acting [17,3]
pg 2.18 is stuck unclean for 202114.617682, current state active+remapped+backfill_toofull, last acting [18,2]
pg 2.18 is active+remapped+backfill_toofull, acting [18,2]
pg 2.52 is active+remapped+backfill_toofull, acting [17,3]
recovery 33/62099566 objects misplaced (0.000%)
osd.9 is near full at 85%
# ceph osd df
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
2 1.37790 1.00000 1410G 842G 496G 59.75 1.08 33
3 1.37790 0.45013 1410G 1079G 259G 76.49 1.39 21
4 1.37790 0.95001 1410G 1086G 253G 76.98 1.40 44
5 1.37790 1.00000 1410G 617G 722G 43.74 0.79 43
6 1.37790 0.65009 1410G 616G 722G 43.69 0.79 39
7 1.37790 0.95001 1410G 495G 844G 35.10 0.64 40
8 1.37790 1.00000 1410G 732G 606G 51.93 0.94 52
9 1.37790 0.70007 1410G 1199G 139G 85.02 1.54 37
10 1.37790 1.00000 1410G 611G 727G 43.35 0.79 41
11 1.37790 0.75006 1410G 495G 843G 35.11 0.64 32
0 1.37790 1.00000 1410G 731G 608G 51.82 0.94 43
12 1.37790 1.00000 1410G 851G 487G 60.36 1.09 44
13 1.37790 1.00000 1410G 378G 960G 26.82 0.49 38
14 1.37790 1.00000 1410G 969G 370G 68.68 1.25 37
15 1.37790 1.00000 1410G 724G 614G 51.35 0.93 35
16 1.37790 1.00000 1410G 491G 847G 34.84 0.63 43
17 1.37790 1.00000 1410G 862G 476G 61.16 1.11 50
18 1.37790 0.80005 1410G 1083G 255G 76.78 1.39 26
19 1.37790 0.65009 1410G 963G 375G 68.29 1.24 23
20 1.37790 1.00000 1410G 724G 614G 51.38 0.93 42
TOTAL 28219G 15557G 11227G 55.13
MIN/MAX VAR: 0.49/1.54 STDDEV: 15.57
# ceph pg ls backfill_toofull
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
2.18 9 0 0 18 0 0 3653 3653 active+remapped+backfill_toofull 2020-10-29 05:31:20.429912 610'549153 656:390372 [9,12] 9 [18,2] 18 594'547482 2020-10-25 20:28:39.680744 594'543841 2020-10-21 21:21:33.092868
2.52 15 0 0 15 0 0 4883 4883 active+remapped+backfill_toofull 2020-10-29 05:31:28.277898 652'502085 656:367288 [17,9] 17 [17,3] 17 594'499108 2020-10-26 11:06:48.417825 594'499108 2020-10-26 11:06:48.417825
pool : 17 18 19 11 20 21 12 13 0 14 1 15 2 16 | SUM
--------------------------------------------------------------------------------------------------------------------------------
osd.4 3 0 0 0 9 2 0 0 12 1 9 0 7 1 | 44
osd.17 1 0 0 0 7 3 1 0 8 1 17 1 11 0 | 50
osd.18 0 0 0 0 9 0 0 0 4 0 7 0 5 0 | 25
osd.5 0 0 0 2 5 1 1 0 5 0 16 0 11 2 | 43
osd.6 0 1 0 1 5 2 0 0 9 0 13 1 7 0 | 39
osd.19 0 0 1 0 8 2 0 1 2 0 6 0 3 0 | 23
osd.7 0 0 0 0 4 1 1 0 3 0 12 0 19 0 | 40
osd.8 0 1 0 0 6 3 0 2 10 1 13 1 15 0 | 52
osd.9 1 0 2 0 10 2 0 0 4 1 6 1 10 0 | 37
osd.10 0 0 1 1 5 2 0 1 7 0 12 0 11 1 | 41
osd.20 1 0 0 0 6 1 0 1 7 0 8 1 17 0 | 42
osd.11 0 0 0 0 4 1 1 1 5 0 11 0 9 0 | 32
osd.12 0 0 1 1 7 1 0 0 5 1 12 1 14 1 | 44
osd.13 0 2 0 0 3 1 0 0 10 1 11 0 10 0 | 38
osd.0 0 1 0 1 6 3 0 1 7 0 11 0 13 0 | 43
osd.14 1 0 0 0 8 1 1 0 4 1 12 0 9 0 | 37
osd.15 1 0 2 1 6 1 1 0 8 0 7 0 6 2 | 35
osd.2 0 2 1 0 7 2 1 0 7 1 4 1 6 0 | 32
osd.3 0 0 0 0 9 0 0 0 2 0 4 0 5 0 | 20
osd.16 0 1 0 1 4 3 1 1 9 0 9 1 12 1 | 43
--------------------------------------------------------------------------------------------------------------------------------
SUM : 8 8 8 8 128 32 8 8 128 8 200 8 200 8 |
Hi:
I have this ceph status:
-----------------------------------------------------------------------------
cluster:
id: 039bf268-b5a6-11e9-bbb7-d06726ca4a78
health: HEALTH_WARN
noout flag(s) set
1 osds down
Reduced data availability: 191 pgs inactive, 2 pgs down, 35
pgs incomplete, 290 pgs stale
5 pgs not deep-scrubbed in time
7 pgs not scrubbed in time
327 slow ops, oldest one blocked for 233398 sec, daemons
[osd.12,osd.36,osd.5] have slow ops.
services:
mon: 1 daemons, quorum fond-beagle (age 23h)
mgr: fond-beagle(active, since 7h)
osd: 48 osds: 45 up (since 95s), 46 in (since 8h); 4 remapped pgs
flags noout
data:
pools: 7 pools, 2305 pgs
objects: 350.37k objects, 1.5 TiB
usage: 3.0 TiB used, 38 TiB / 41 TiB avail
pgs: 6.681% pgs unknown
1.605% pgs not active
1835 active+clean
279 stale+active+clean
154 unknown
22 incomplete
10 stale+incomplete
2 down
2 remapped+incomplete
1 stale+remapped+incomplete
--------------------------------------------------------------------------------------------
How can I fix all of the unknown, incomplete, remapped+incomplete, etc. PGs? I
don't care if I need to remove PGs.
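If it helps, I can also post the output of the usual diagnostics, for example
(2.7 below is just a placeholder pg id):

  ceph health detail
  ceph pg ls incomplete
  ceph pg 2.7 query
  ceph osd tree

but what I'm mainly after is the right order of operations to get the cluster
healthy again, even at the cost of losing those PGs.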
Hi!
I have an interesting question regarding SSDs and I'll try to ask about it here.
During my testing of Ceph & Vitastor & Linstor on servers equipped with Intel D3-4510 SSDs I discovered a very funny problem with these SSDs:
They don't like overwrites of the same sector.
That is, if you overwrite the same sector over and over again you get very low iops:
$ fio -direct=1 -rw=write -bs=4k -size=4k -loops=100000 -iodepth=1
write: IOPS=3142, BW=12.3MiB/s (12.9MB/s)(97.9MiB/7977msec)
And if you overwrite at least ~128k of other sectors between overwriting the same sector you get normal results:
$ fio -direct=1 -rw=write -bs=4k -size=128k -loops=100000 -iodepth=1
write: IOPS=20.8k, BW=81.4MiB/s (85.3MB/s)(543MiB/6675msec)
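(The command lines above are abbreviated and omit the job name and target; a
complete invocation would look roughly like this, with /dev/sdX standing in
for the SSD under test:

$ fio -name=test -filename=/dev/sdX -direct=1 -rw=write -bs=4k -size=4k -loops=100000 -iodepth=1

and the same with -size=128k for the second case.)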
This slowdown almost doesn't hurt Ceph, slightly hurts Vitastor (the impact was greater before I added a fix), and MASSIVELY hurts Linstor/DRBD9 because of its "bitmap".
So far I've only seen it on this particular SSD model. For example, Intel P4500, Micron 9300 Pro, and Samsung PM983 don't have this issue.
Do you have any contacts among Intel's SSD firmware people to ask about this bug-o-feature? :-)
--
With best regards,
Vitaliy Filippov
Hi,
we're using rbd for VM disk images and want to make consistent backups of
groups of them.
I know I can create a group and make consistent snapshots of all of them:
# rbd --version
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus
(stable)
# rbd create test_foo --size 1M
# rbd create test_bar --size 1M
# rbd group create test
# rbd group image add test test_foo
# rbd group image add test test_bar
# rbd group snap create test@1
But how can I export the individual image snapshots? I tried different
ways of addressing them, but nothing worked:
# rbd export test_foo@1 -
error setting snapshot context: (2) No such file or directory
# rbd export test_foo@test/1 -
rbd: error opening pool 'test_foo@test': (2) No such file or directory
# rbd export rbd/test_foo@test/1 -
error setting snapshot context: (2) No such file or directory
# rbd export test@1/test_foo -
rbd: error opening pool 'test@1': (2) No such file or directory
# rbd export rbd/test@1/test_foo -
rbd: error opening image test: (2) No such file or directory
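My assumption is that the group snapshot is stored against each image under an
internal snapshot name in a separate (group) namespace rather than as "1", so
listing the snapshots with --all (if 14.2.9 supports that flag the way newer
releases do) might reveal the name that would have to be passed to rbd export:

# rbd snap ls test_foo --all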
Am I missing something?
Kind regards,
Timo Weingärtner
System Administrator
--
ITscope GmbH
Ludwig-Erhard-Allee 20
D-76131 Karlsruhe
Tel: +49 721 62737637
Fax: +49 721 66499175
https://www.itscope.com
Commercial register: AG Mannheim, HRB 232782; registered office: Karlsruhe
Managing directors: Alexander Münkel, Benjamin Mund, Stefan Reger
Hi,
I already submitted a ticket: https://tracker.ceph.com/issues/47951
Maybe other people noticed this as well.
Situation:
- Cluster is running IPv6
- mon_host is set to a DNS entry
- DNS entry is a Round Robin with three AAAA-records
root@wido-standard-benchmark:~# ceph -s
unable to parse addrs in 'mon.objects.xx.xxx.net'
[errno 22] error connecting to the cluster
root@wido-standard-benchmark:~#
The relevant part of the ceph.conf:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
mon_host = mon.objects.xxx.xxx.xxx
ms_bind_ipv6 = true
This works fine with 14.2.11 and breaks under 14.2.12
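As a workaround I would expect listing the monitor addresses explicitly to
still parse, e.g. (documentation-prefix addresses as placeholders):

mon_host = [2001:db8::11], [2001:db8::12], [2001:db8::13]

but that obviously defeats the purpose of having a single Round Robin record.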
Anybody else seeing this as well?
Wido