Hi all,
I need to list the contents of the stray buckets on one of our MDSes. The MDS reports 772674 stray entries. However, if I dump its cache and grep for stray I get only 216 hits.
How can I get to the contents of the stray buckets?
Please note that Octopus is still hit by https://tracker.ceph.com/issues/57059 so a "dump tree" will not work. In addition, I clearly don't just need the entries in cache, I need a listing of everything. How can I get that? I'm willing to run rados commands and pipe the output through ceph-dencoder if necessary.
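For example, would something like this be the right way to list them? I'm assuming here that the metadata pool is called cephfs_metadata and that the stray dirfrags of mds.0 live in objects 600.00000000 through 609.00000000 (inodes 0x600-0x609) - please correct me if those object names are wrong:

  for i in 0 1 2 3 4 5 6 7 8 9; do
      rados -p cephfs_metadata listomapkeys 60${i}.00000000
  done

and then getomapval + ceph-dencoder on individual entries if I need more than the dentry names.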
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi everyone,
I'd like to know how many pools I should create for multiple CephFS filesystems.
I have two classes of OSDs (HDD and SSD), and I need roughly 20-30 CephFS
filesystems currently, and that number will increase over time.
Should I create
one cephfs_metadata_replicated
one cephfs_data_replicated
a few cephfs_data_erasure_coding (depending on k/m)
and put all my CephFS filesystems inside two of them? Or should I create a
pair of metadata/data pools for each CephFS (as sketched below)?
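By "a pair of pools per CephFS" I mean something like this for each filesystem (names are just examples):

  ceph osd pool create fs01_metadata
  ceph osd pool create fs01_data
  ceph osd pool create fs01_data_ec erasure
  ceph osd pool set fs01_data_ec allow_ec_overwrites true
  ceph fs new fs01 fs01_metadata fs01_data
  ceph fs add_data_pool fs01 fs01_data_ec

repeated 20-30 times, versus only a handful of big shared pools.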
I will also need Ceph S3 storage. Same question: should I have a
dedicated pool for S3 storage, or can/should I use the same
cephfs_data_replicated/erasure pool?
Regards
--
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
Wed 24 Jan 2024 09:33:09 CET
Hi,
We are considering a BlueStore compression test in our cluster. For this we have created an RBD image on our EC pool.
When we run "ceph daemon osd.X perf dump | grep -E '(compress_.*_count|bluestore_compressed_)'", we cannot find the parameters below, even when we try the ceph tell command instead.
"bluestore_compressed_allocated"
"bluestore_compressed_original"
As a result, we are unable to determine the extent of compression for specific RBD images.
Are there any specific configurations required to expose these parameters, or is there an alternate method to assess BlueStore compression?
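By "specific configurations" I mean, for example, whether pool-level settings like the following are all that is needed before those counters appear (the pool name is just a placeholder):

  ceph osd pool set ec_pool compression_mode aggressive
  ceph osd pool set ec_pool compression_algorithm snappy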
Any guidance or insight would be greatly appreciated.
Thanks
Mohammad Saif
Ceph Enthusiast
Hi All,
In regards to the monitoring services on a Ceph Cluster (ie Prometheus,
Grafana, Alertmanager, Loki, Node-Exporter, Promtail, etc.) how many
instances should/can we run for fault tolerance purposes? I can't seem
to recall that advice being in the doco anywhere (but of course, I
probably missed it).
I'm concerned about HA on those services - will they continue to run if
the Ceph Node they're on fails?
At the moment we're running only 1 instance of each in the cluster, but
several Ceph Nodes are capable of running each - e.g. 3 nodes
configured but only count:1 (see the spec sketch below).
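For example, the Grafana spec currently looks roughly like this (hostnames are placeholders), and the other monitoring services are similar:

  service_type: grafana
  placement:
    count: 1
    hosts:
    - ceph-node-1
    - ceph-node-2
    - ceph-node-3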
This is on the latest version of Reef using cephadm (if it makes a
huge difference :-) ).
So any advice, etc, would be greatly appreciated, including whether we should
be running any services not mentioned (not Mgr, Mon, OSD, or iSCSI,
obviously :-) )
Cheers
Dulux-Oz
Am I doing something weird when I do the following on a ceph node (nautilus, el7)?
rbd snap ls vps-test -p rbd
rbd map vps-test@vps-test.snap1 -p rbd
mount -o ro /dev/mapper/VGnew-LVnew /mnt/disk <--- reset/reboot ceph node
Hello,
Could someone please explain how mclock works regarding reads and writes? Does mclock intervene on both read and write iops? Or only on reads or only on writes?
And what type of underlying hardware performance is calculated and considered by mclock? It seems to be only write performance.
The mclock documentation shows HDD- and SSD-specific configuration options (capacity and sequential bandwidth) but nothing regarding hybrid setups, and these configuration options do not distinguish reads from writes. But read and write performance are often not on par for a single drive, and even less so when using hybrid setups.
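For reference, the options I mean are (from the mclock configuration reference, if I'm reading it correctly):

  osd_mclock_max_capacity_iops_hdd
  osd_mclock_max_capacity_iops_ssd
  osd_mclock_max_sequential_bandwidth_hdd
  osd_mclock_max_sequential_bandwidth_ssd

None of them say whether the value is meant to reflect read or write capability.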
With hybrid setups (RocksDB+WAL on SSDs or NVMes and Data on HDD), if mclock only considers write performance, it may fail to properly schedule read iops (does mclock schedule read iops?) as the calculated iops capacity would be way too high for reads.
With HDD only setups (RocksDB+WAL+Data on HDD), if mclock only considers write performance, the OSD may not take advantage of higher read performance.
Can someone please shed some light on this?
Best regards,
Frédéric Nass
Sous-direction Infrastructures et Services
Direction du Numérique
Université de Lorraine
Tél : +33 3 72 74 11 35
Hello, Ceph users,
what is the correct location of keyring for ceph-crash?
I tried to follow this document:
https://docs.ceph.com/en/latest/mgr/crash/
# ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' > /etc/ceph/ceph.client.crash.keyring
and copied this file to all nodes. When I run ceph-crash.service on the node
where client.admin.keyring is available, it works. But on the rest of the nodes,
it tries to access the admin keyring anyway. Journalctl -u ceph-crash
says this:
Jan 18 18:03:35 my.node.name systemd[1]: Started Ceph crash dump collector.
Jan 18 18:03:35 my.node.name ceph-crash[2973164]: INFO:ceph-crash:pinging cluster to exercise our key
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.786+0100 7eff34016640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.786+0100 7eff34016640 -1 AuthRegistry(0x7eff2c063ce8) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.787+0100 7eff34016640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.787+0100 7eff34016640 -1 AuthRegistry(0x7eff2c067de0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.788+0100 7eff34016640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.788+0100 7eff34016640 -1 AuthRegistry(0x7eff340150c0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: [errno 2] RADOS object not found (error connecting to the cluster)
Jan 18 18:03:35 my.node.name ceph-crash[2973164]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s
Is the documentation outdated, or am I doing something wrong? Thanks for
any hint. This is non-containerized Ceph 18.2.1 on AlmaLinux 9.
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
I am considering indexless buckets to increase small-file performance
and the maximum number of objects using just HDDs.
When I checked the constraints of indexless buckets in the Ceph docs, they
indicate a possible problem with bucket listing, so I understand that it
also causes problems for versioning and sync.
I don't fully understand what that means. I guessed it was similar to the
DB index feature, but it's not.
Could you explain the constraints of indexless buckets?
Hi Ceph users and developers,
You are invited to join us at the User + Dev meeting this week Thursday,
January 18th at 10:00 AM Eastern Time! See below for more meeting details.
The focus topic, "Ceph Feature Request from the DKIST Data Center: Add a
service backed by tape that is analogous to AWS Glacier", will be presented
by Joel Davidow, a Ceph operator from the National Solar Observatory. In
his talk, he will propose a feature request to add support for tape as a
storage class with lifecycle management for object storage.
Feel free to add questions or additional topics under the "Open Discussion"
section on the agenda: https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
If you have an idea for a focus topic you'd like to present at a future
meeting, you are welcome to submit it to this Google Form:
https://docs.google.com/forms/d/e/1FAIpQLSdboBhxVoBZoaHm8xSmeBoemuXoV_rmh4v…
Any Ceph user or developer is eligible to submit!
Thanks,
Laura Flores
Meeting link: https://meet.jit.si/ceph-user-dev-monthly
Time Conversions:
UTC: Thursday, January 18, 15:00 UTC
Mountain View, CA, US: Thursday, January 18, 7:00 PST
Phoenix, AZ, US: Thursday, January 18, 8:00 MST
Denver, CO, US: Thursday, January 18, 8:00 MST
Huntsville, AL, US: Thursday, January 18, 9:00 CST
Raleigh, NC, US: Thursday, January 18, 10:00 EST
London, England: Thursday, January 18, 15:00 GMT
Paris, France: Thursday, January 18, 16:00 CET
Helsinki, Finland: Thursday, January 18, 17:00 EET
Tel Aviv, Israel: Thursday, January 18, 17:00 IST
Pune, India: Thursday, January 18, 20:30 IST
Brisbane, Australia: Friday, January 19, 1:00 AEST
Singapore, Asia: Thursday, January 18, 23:00 +08
Auckland, New Zealand: Friday, January 19, 4:00 NZDT
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com <lflores(a)redhat.com>
M: +17087388804
Hi
Our 17.2.7 cluster:
"
-33 886.00842 datacenter 714
-7 209.93135 host ceph-hdd1
-69 69.86389 host ceph-flash1
-6 188.09579 host ceph-hdd2
-3 233.57649 host ceph-hdd3
-12 184.54091 host ceph-hdd4
-34 824.47168 datacenter DCN
-73 69.86389 host ceph-flash2
-5 252.27127 host ceph-hdd14
-2 201.78067 host ceph-hdd5
-81 288.26501 host ceph-hdd6
-31 264.56207 host ceph-hdd7
-36 1284.48621 datacenter TBA
-77 69.86389 host ceph-flash3
-21 190.83224 host ceph-hdd8
-29 199.08838 host ceph-hdd9
-11 193.85382 host ceph-hdd10
-9 237.28154 host ceph-hdd11
-26 187.19536 host ceph-hdd12
-4 206.37102 host ceph-hdd13
"
We recently created an EC 4+5 pool with failure domain datacenter. The
DCN datacenter only had 2 hdd hosts so we added one more to make it
possible at all, since each DC needs 3 shards, as I understand it.
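(I.e. the EC rule picks 3 datacenters and then 3 hosts in each, something like:

  step take default
  step choose indep 3 type datacenter
  step chooseleaf indep 3 type host
  step emit

so each DC holds 3 of the 9 shards.)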
Backfill was really slow though, so we just added another host to the
DCN datacenter. Backfill looks like this:
"
data:
volumes: 1/1 healthy
pools: 13 pools, 11153 pgs
objects: 311.53M objects, 1000 TiB
usage: 1.6 PiB used, 1.6 PiB / 3.2 PiB avail
pgs: 60/1669775060 objects degraded (0.000%)
373356926/1669775060 objects misplaced (22.360%)
5944 active+clean
5177 active+remapped+backfill_wait
22 active+remapped+backfilling
4 active+recovery_wait+degraded+remapped
3 active+recovery_wait+remapped
2 active+recovery_wait+degraded
1 active+recovering+degraded+remapped
io:
client: 73 MiB/s rd, 339 MiB/s wr, 1.06k op/s rd, 561 op/s wr
recovery: 1.2 GiB/s, 313 objects/s
"
Given that the first host added had 19 OSDs, with none of them anywhere
near the target capacity, and the one we just added has 22 empty OSDs,
having just 22 PGs backfilling and 1 recovering seems somewhat
underwhelming.
Is this to be expected with such a pool? Mclock profile is
high_recovery_ops.
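(Set with something like "ceph config set osd osd_mclock_profile high_recovery_ops", for reference.)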
Mvh.
Torkil
--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: torkil(a)drcmr.dk