Hi,
I've added 4 NVMe hosts with 2 OSDs per NVMe drive to my cluster, and it made all the SSD OSDs flap; I don't understand why.
Everything is under the same root, but with 2 different device classes, nvme and ssd.
The pools are on the SSDs; there is nothing on the NVMe devices at the moment.
The only way to bring the SSD OSDs back up is to shut down the NVMe hosts.
The new NVMe servers have 25 Gbit NICs; the old servers and the mons have 10 Gbit NICs, but in an aggregated (bonded) setup.
This is the CRUSH rule dump:
[
    {
        "rule_id": 0,
        "rule_name": "replicated_ssd",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -21,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_nvme",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -10,
                "item_name": "default~nvme"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]
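For reference, both rules were created with the usual device-class syntax, something like the sketch below (not my exact commands); the shadow-tree check just shows how CRUSH splits the two classes:

    # replicated rule limited to one device class (root "default", failure domain "host")
    ceph osd crush rule create-replicated replicated_ssd default host ssd
    ceph osd crush rule create-replicated replicated_nvme default host nvme
    # inspect the per-class shadow hierarchy CRUSH actually uses
    ceph osd crush tree --show-shadow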
This is the OSD tree:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-19 561.15057 root default
-1 38.03099 host server-2001
0 ssd 2.00000 osd.0 up 1.00000 1.00000
10 ssd 6.98499 osd.10 up 1.00000 1.00000
11 ssd 6.98599 osd.11 up 1.00000 1.00000
12 ssd 2.29799 osd.12 up 1.00000 1.00000
13 ssd 2.29799 osd.13 up 1.00000 1.00000
14 ssd 3.49300 osd.14 up 1.00000 1.00000
41 ssd 6.98499 osd.41 up 1.00000 1.00000
42 ssd 6.98599 osd.42 up 1.00000 1.00000
-3 38.03099 host server-2002
1 ssd 2.00000 osd.1 up 1.00000 1.00000
24 ssd 6.98499 osd.24 up 1.00000 1.00000
25 ssd 6.98599 osd.25 up 1.00000 1.00000
27 ssd 2.29799 osd.27 up 1.00000 1.00000
28 ssd 2.29799 osd.28 up 1.00000 1.00000
29 ssd 3.49300 osd.29 up 1.00000 1.00000
43 ssd 6.98499 osd.43 up 1.00000 1.00000
44 ssd 6.98599 osd.44 up 1.00000 1.00000
-6 38.03000 host server-2003
2 ssd 2.00000 osd.2 up 1.00000 1.00000
26 ssd 6.98499 osd.26 up 1.00000 1.00000
38 ssd 2.29999 osd.38 up 1.00000 1.00000
39 ssd 2.29500 osd.39 up 1.00000 1.00000
40 ssd 3.49300 osd.40 up 1.00000 1.00000
45 ssd 6.98499 osd.45 up 1.00000 1.00000
46 ssd 6.98599 osd.46 up 1.00000 1.00000
47 ssd 6.98599 osd.47 up 1.00000 1.00000
-17 111.76465 host server-2004
5 nvme 6.98529 osd.5 down 0 1.00000
9 nvme 6.98529 osd.9 down 0 1.00000
18 nvme 6.98529 osd.18 down 0 1.00000
22 nvme 6.98529 osd.22 down 0 1.00000
32 nvme 6.98529 osd.32 down 0 1.00000
36 nvme 6.98529 osd.36 down 0 1.00000
50 nvme 6.98529 osd.50 down 0 1.00000
54 nvme 6.98529 osd.54 down 0 1.00000
58 nvme 6.98529 osd.58 down 0 1.00000
62 nvme 6.98529 osd.62 down 0 1.00000
66 nvme 6.98529 osd.66 down 0 1.00000
70 nvme 6.98529 osd.70 down 0 1.00000
74 nvme 6.98529 osd.74 down 0 1.00000
78 nvme 6.98529 osd.78 down 0 1.00000
82 nvme 6.98529 osd.82 down 0 1.00000
86 nvme 6.98529 osd.86 down 0 1.00000
-14 111.76465 host server-2005
4 nvme 6.98529 osd.4 down 0 1.00000
8 nvme 6.98529 osd.8 down 0 1.00000
17 nvme 6.98529 osd.17 down 0 1.00000
21 nvme 6.98529 osd.21 down 0 1.00000
31 nvme 6.98529 osd.31 down 0 1.00000
35 nvme 6.98529 osd.35 down 0 1.00000
49 nvme 6.98529 osd.49 down 0 1.00000
53 nvme 6.98529 osd.53 down 0 1.00000
57 nvme 6.98529 osd.57 down 0 1.00000
61 nvme 6.98529 osd.61 down 0 1.00000
65 nvme 6.98529 osd.65 down 0 1.00000
69 nvme 6.98529 osd.69 down 0 1.00000
73 nvme 6.98529 osd.73 down 0 1.00000
77 nvme 6.98529 osd.77 down 0 1.00000
81 nvme 6.98529 osd.81 down 0 1.00000
85 nvme 6.98529 osd.85 down 0 1.00000
-22 111.76465 host server-2006
6 nvme 6.98529 osd.6 down 0 1.00000
15 nvme 6.98529 osd.15 down 0 1.00000
19 nvme 6.98529 osd.19 down 0 1.00000
23 nvme 6.98529 osd.23 down 0 1.00000
33 nvme 6.98529 osd.33 down 0 1.00000
37 nvme 6.98529 osd.37 down 0 1.00000
51 nvme 6.98529 osd.51 down 0 1.00000
55 nvme 6.98529 osd.55 down 0 1.00000
59 nvme 6.98529 osd.59 down 0 1.00000
63 nvme 6.98529 osd.63 up 0 1.00000
67 nvme 6.98529 osd.67 down 0 1.00000
71 nvme 6.98529 osd.71 up 0 1.00000
75 nvme 6.98529 osd.75 down 0 1.00000
79 nvme 6.98529 osd.79 down 0 1.00000
83 nvme 6.98529 osd.83 down 0 1.00000
87 nvme 6.98529 osd.87 down 0 1.00000
-11 111.76465 host server-2007
3 nvme 6.98529 osd.3 down 0 1.00000
7 nvme 6.98529 osd.7 down 0 1.00000
16 nvme 6.98529 osd.16 down 0 1.00000
20 nvme 6.98529 osd.20 down 0 1.00000
30 nvme 6.98529 osd.30 down 0 1.00000
34 nvme 6.98529 osd.34 down 0 1.00000
48 nvme 6.98529 osd.48 down 0 1.00000
52 nvme 6.98529 osd.52 down 0 1.00000
56 nvme 6.98529 osd.56 down 0 1.00000
60 nvme 6.98529 osd.60 down 0 1.00000
64 nvme 6.98529 osd.64 down 0 1.00000
68 nvme 6.98529 osd.68 down 0 1.00000
72 nvme 6.98529 osd.72 down 0 1.00000
76 nvme 6.98529 osd.76 down 0 1.00000
80 nvme 6.98529 osd.80 down 0 1.00000
84 nvme 6.98529 osd.84 down 0 1.00000
Pool info:
pool 21 'dbs-realtime-staging-client' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 27611 lfor 0/27512/27510 flags hashpspool,selfmanaged_snaps max_bytes 9999757606912 stripe_width 0 application rbd
pool 24 'dbs-realtime-staging-w-financedb' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 27613 flags hashpspool,selfmanaged_snaps max_bytes 19999515213824 stripe_width 0 application rbd
pool 25 'dbs-realtime-staging-w-dstest' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 27813 lfor 0/0/23856 flags hashpspool,selfmanaged_snaps max_bytes 99857989632 stripe_width 0 application rbd
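To double-check that nothing is mapped onto the new NVMe OSDs, I can list PGs per OSD, something like this (a sketch; osd.5 is just one of the new NVMe OSDs from the tree above):

    # should return no PGs if the nvme class really carries no pools yet
    ceph pg ls-by-osd osd.5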
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
Thanks Marc.
That means we can upgrade from Luminous to Nautilus and later migrate the
OSDs from ceph-disk to ceph-volume.
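For that later migration, the adoption path on Nautilus should roughly be the ceph-volume "simple" mode (a sketch, not yet tested on our cluster):

    # record the existing running ceph-disk OSDs into /etc/ceph/osd/*.json
    ceph-volume simple scan
    # create systemd units so the OSDs start without ceph-disk/udev
    ceph-volume simple activate --all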
On Thu, Jul 8, 2021 at 5:45 PM Marc <Marc(a)f1-outsourcing.eu> wrote:
> I did the same upgrade from Luminous to Nautilus, and still have OSDs
> created with ceph-disk. I am slowly migrating to LVM and encryption.
>
> However, I did have some issues with OSDs not starting: you have to check
> the run levels and make sure the symlinks are still correct. I also had
> a case where I had to change the ownership of /dev/sdX before an OSD would start.
>
> I still have this in my rc.local
>
> chown ceph.ceph /dev/sdb2
> chown ceph.ceph /dev/sdc2
> chown ceph.ceph /dev/sdd2
> chown ceph.ceph /dev/sde2
> chown ceph.ceph /dev/sdf2
> chown ceph.ceph /dev/sdg2
> chown ceph.ceph /dev/sdh2
> chown ceph.ceph /dev/sdi2
> chown ceph.ceph /dev/sdj2
> chown ceph.ceph /dev/sdk2
>
>
> > -----Original Message-----
> > From: M Ranga Swami Reddy <swamireddy(a)gmail.com>
> > Sent: Thursday, 8 July 2021 11:49
> > To: ceph-devel <ceph-devel(a)vger.kernel.org>; ceph-users <ceph-
> > users(a)ceph.com>
> > Subject: [ceph-users] Fwd: ceph upgrade from luminous to nautils
> >
> > ---------- Forwarded message ---------
> > From: M Ranga Swami Reddy <swamireddy(a)gmail.com>
> > Date: Thu, Jul 8, 2021 at 2:30 PM
> > Subject: ceph upgrade from luminous to nautils
> > To: ceph-devel <ceph-devel(a)vger.kernel.org>
> >
> >
> > Dear All,
> > I am using Ceph Luminous with 2000+ OSDs.
> > Planning to upgrade from Luminous to Nautilus.
> > Currently, all OSDs are deployed via ceph-disk.
> > Can I proceed with this upgrade?
> > Will the ceph-disk OSDs work with ceph-volume (since ceph-disk was
> > deprecated
> > in the Mimic release)?
> >
> > Please advise.
> >
> > Thanks
> > Swami
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
Hi,
After upgrading from 15.2.8 to 15.2.13 with cephadm on CentOS 8
(containerised installation done by cephadm), Grafana no longer shows
new data. Additionally, when accessing the dashboard URL on a host
that is not currently hosting the dashboard, I am redirected to a wrong
hostname (as shown in ceph mgr services).
I assume that this is caused by the same reason which leads to this
output of `ceph mgr services`:
{
    "dashboard": "https://ceph-<cluster-id>-mgr.iceph-11.tsmsqs:8443/",
    "prometheus": "http://ceph-<cluster-id>-mgr.iceph-11.tsmsqs:9283/"
}
The correct hostname is iceph-11 (without the tsmsqs part), FQDN is
iceph-11.servernet. The hosts use DNS, the names (iceph-11 and
iceph-11.servernet) are resolvable both from the hosts as well as from
within the Podman containers.
I have determined that podman by default sets the container name as a
hostname alias (visible with `hostname -a` within the container), which
somehow leads to Ceph mgr picking it up as the primary name?
My workaround is to modify
/var/lib/ceph/<cluster-id>/mgr.<hostname>.<random-6-char-string>/unit.run,
adding --no-hosts as an additional argument to the "podman run" command.
I could probably use a system-wide containers.conf as well.
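For example, something like this in /etc/containers/containers.conf should be equivalent (untested on my side):

    [containers]
    # tell podman not to manage /etc/hosts for the container (same effect as --no-hosts)
    no_hosts = true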
With this workaround and after restarting the Ceph mgr container (via
systemctl) and then restarting Prometheus and Grafana (with ceph orch
redeploy), I once again get data in Grafana and the correct redirect for
the dashboard. `ceph mgr services` also shows expected and correct values.
I am wondering if this kind of issue is known or whether there is
something wrong with my setup. I expected Ceph mgr to use the primary
hostname and not some seemingly random hostname alias. Maybe this issue
can also be discussed in a troubleshooting section of the monitoring
stack documentation.
Cheers
Sebastian
We're running a rook-ceph cluster that has gotten stuck in "1 MDSs behind
on trimming".
* 1 filesystem, three active MDS servers each with standby
* Quite a few files (20M objects), daily snapshots. This might be a
problem?
* Ceph pacific 16.2.4
* `ceph health detail` doesn't provide much help (see below)
* num_segments is very slowly increasing over time
* Restarting all of the MDSs returns to the same point.
* moderate CPU usage for each MDS server (~30% for the stuck one, ~80% of a
core for the others)
* logs for the stuck MDS look clean; it hits rejoin_joint_start and then the
standard 'updating MDS map to version XXX' messages
* `ceph daemon mds.x ops` shows no active ops on each of the MDS servers
* `mds_log_max_segments` is set to 128; setting it to a higher number makes
the warning go away, but the filesystem remains degraded, and setting it
back to 128 shows num_segments has not changed.
* I've tried playing around with other MDS settings based on various posts
on this list and elsewhere, to no avail
* `cephfs-journal-tool journal inspect` for each rank says journal
integrity is fine.
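For reference, roughly the commands involved in the checks above (a sketch; the daemon name and rank are examples from this cluster):

    # per-MDS journal counters (num_segments, segment expiry, etc.)
    ceph daemon mds.myfs-d perf dump mds_log
    # temporary bump of the trim threshold (reverted afterwards)
    ceph config set mds mds_log_max_segments 256
    # journal check per rank
    cephfs-journal-tool --rank=myfs:2 journal inspect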
Something similar happened last week and (probably by accident by
removing/adding nodes?) I got the MDSs to start recovering and the
filesystem went back to healthy.
I'm at a bit of a loss for what else to try.
Thanks!
Zack
`ceph health detail`
HEALTH_WARN mons are allowing insecure global_id reclaim; 1 filesystem is
degraded; 1 MDSs behind on trimming; mon x is low on available space
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.x has auth_allow_insecure_global_id_reclaim set to true
    mon.ad has auth_allow_insecure_global_id_reclaim set to true
    mon.af has auth_allow_insecure_global_id_reclaim set to true
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs myfs is degraded
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.myfs-d(mds.2): Behind on trimming (340/128) max_segments: 128, num_segments: 340
[WRN] MON_DISK_LOW: mon x is low on available space
    mon.x has 22% avail
`ceph config get mds`
WHO MASK LEVEL OPTION VALUE RO
global basic log_file *
global basic log_to_file false
mds basic mds_cache_memory_limit 17179869184
mds advanced mds_cache_trim_decay_rate 1.000000
mds advanced mds_cache_trim_threshold 1048576
mds advanced mds_log_max_segments 128
mds advanced mds_recall_max_caps 5000
mds advanced mds_recall_max_decay_rate 2.500000
global advanced mon_allow_pool_delete true
global advanced mon_allow_pool_size_one true
global advanced mon_cluster_log_file
global advanced mon_pg_warn_min_per_osd 0
global advanced osd_pool_default_pg_autoscale_mode on
global advanced osd_scrub_auto_repair true
global advanced rbd_default_features 3
Hello
I have some experience with RBD clusters (for use with KVM/libvirt) but
now I'm building my first cluster to use with RGW.
The RGW cluster will be around 70 TB raw; the current RBD cluster(s)
are of similar (or smaller) size. I'll be deploying Octopus.
Since most of the tuning is quite different (large number of PGs,
bluestore_compression_*, bluestore_min_alloc_size_*), I wonder whether it
makes sense to run both workloads in the same cluster or whether it would
be better to have dedicated clusters.
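To make that concrete, the kind of per-workload settings I have in mind (just examples, not recommendations):

    # RGW-oriented cluster: compression and a smaller allocation unit for small objects
    ceph config set osd bluestore_compression_mode aggressive
    # note: min_alloc_size only applies to OSDs created after the change
    ceph config set osd bluestore_min_alloc_size_hdd 4096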
On the other hand, bigger clusters are (AFAIK) more stable. What are other
people doing: a single cluster for all workloads, or a cluster per workload?
thanks!
PS: Asking for the future, what about CephFS? Should it share a cluster?
--
IRC: gfa
GPG: 0x27263FA42553615F904A7EBE2A40A2ECB8DAD8D5
OLD GPG: 0x44BB1BA79F6C6333
---------- Forwarded message ---------
From: M Ranga Swami Reddy <swamireddy(a)gmail.com>
Date: Thu, Jul 8, 2021 at 2:30 PM
Subject: ceph upgrade from luminous to nautils
To: ceph-devel <ceph-devel(a)vger.kernel.org>
Dear All,
I am using Ceph Luminous with 2000+ OSDs.
Planning to upgrade from Luminous to Nautilus.
Currently, all OSDs are deployed via ceph-disk.
Can I proceed with this upgrade?
Will the ceph-disk OSDs work with ceph-volume (since ceph-disk was deprecated
in the Mimic release)?
Please advise.
Thanks
Swami
Hi,
We've done our fair share of Ceph cluster upgrades since Hammer, and
have not seen many problems with them. I'm now at the point where I have
to upgrade a rather large cluster running Luminous, and I would like to
hear from other users about issues I can expect, so that I can anticipate
them beforehand.
As said, the cluster is running Luminous (12.2.13) and has the following
services active:
  services:
    mon: 3 daemons, quorum osdnode01,osdnode02,osdnode04
    mgr: osdnode01(active), standbys: osdnode02, osdnode03
    mds: pmrb-3/3/3 up {0=osdnode06=up:active,1=osdnode08=up:active,2=osdnode07=up:active}, 1 up:standby
    osd: 116 osds: 116 up, 116 in;
    rgw: 3 daemons active
Of the OSDs, 11 are SSDs and 105 are HDDs. The capacity of the cluster
is 1.01 PiB.
We have 2 active CRUSH rules on 18 pools. All pools have a size of 3; there is a total of 5760 PGs.
{
    "rule_id": 1,
    "rule_name": "hdd-data",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -10,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
},
{
    "rule_id": 2,
    "rule_name": "ssd-data",
    "ruleset": 2,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -21,
            "item_name": "default~ssd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
rbd -> crush_rule: hdd-data
.rgw.root -> crush_rule: hdd-data
default.rgw.control -> crush_rule: hdd-data
default.rgw.data.root -> crush_rule: ssd-data
default.rgw.gc -> crush_rule: ssd-data
default.rgw.log -> crush_rule: ssd-data
default.rgw.users.uid -> crush_rule: hdd-data
default.rgw.usage -> crush_rule: ssd-data
default.rgw.users.email -> crush_rule: hdd-data
default.rgw.users.keys -> crush_rule: hdd-data
default.rgw.meta -> crush_rule: hdd-data
default.rgw.buckets.index -> crush_rule: ssd-data
default.rgw.buckets.data -> crush_rule: hdd-data
default.rgw.users.swift -> crush_rule: hdd-data
default.rgw.buckets.non-ec -> crush_rule: ssd-data
DB0475 -> crush_rule: hdd-data
cephfs_pmrb_data -> crush_rule: hdd-data
cephfs_pmrb_metadata -> crush_rule: ssd-data
All but four clients are running Luminous; those four are running Jewel
(and need upgrading before proceeding with this upgrade).
So, normally, I would 'just' upgrade all Ceph packages on the
monitor nodes and restart the mons and then the mgrs.
After that, I would upgrade all Ceph packages on the OSD nodes and
restart all the OSDs. Then, after that, the MDSes and RGWs. Restarting
the OSDs will probably take a while.
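For completeness, the cluster-wide commands I expect around those restarts, per the Nautilus upgrade notes (a sketch, not the full procedure):

    ceph osd set noout                      # avoid rebalancing during the restarts
    # ...upgrade packages and restart mons, mgrs, then OSDs host by host...
    ceph mon enable-msgr2                   # once all mons run Nautilus
    ceph osd require-osd-release nautilus   # once all OSDs run Nautilus
    ceph osd unset noout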
If anyone has a hint on what I should expect to cause some extra load or
waiting time, that would be great.
Obviously, we have read
https://ceph.com/releases/v14-2-0-nautilus-released/ , but I'm looking
for real world experiences.
Thanks!
--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info(a)tuxis.nl
Hi,
Does anybody know about the list-type=2 request?
GET /bucket?list-type=2&max-keys=2
Yesterday we had our second big object store cluster outage due to this request: one user took the whole cluster down. Normal read ops in ceph iostat are below 30k, but when this user deployed their release it jumped to 350k, which made the RADOS gateways die behind haproxy.
Why is Ceph so sensitive to this, and what is this request actually? I couldn't even find anything about it on Google.
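As far as I can tell, list-type=2 is the S3 ListObjectsV2 operation; with aws-cli the same request would look roughly like this (endpoint and bucket name are placeholders):

    # equivalent of GET /bucket?list-type=2&max-keys=2
    aws s3api list-objects-v2 --bucket bucket --max-keys 2 \
        --endpoint-url http://rgw.example.com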
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------