Hello,
I’m planning a Ceph multisite deployment with two clusters: a primary one that clients will be interacting with, and a second one whose zone tier type is set to archive. One-way sync from the primary zone is also configured - I would like to use this second cluster as a "backup zone".
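For context, the archive zone was created the way the documentation describes, roughly like this (zonegroup, zone name and endpoint below are placeholders):
```
# Sketch of creating the archive-tier zone in the secondary cluster,
# followed by committing the period so the change takes effect.
radosgw-admin zone create --rgw-zonegroup=myzonegroup --rgw-zone=archive \
  --endpoints=http://archive-gw:8080 --tier-type=archive
radosgw-admin period update --commit
```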
The setup works as expected for me, with one exception: when I delete a bucket on the primary zone, the bucket in the archive zone is also deleted. This is an issue, as I would like to be able to restore the bucket in case of unintentional deletion on the primary zone.
I know about object lock, but is there some other way? For example, I would imagine a setup where, when the bucket is removed on the primary zone, the bucket on the archive zone is only marked for removal after some grace period, like 1 day or 1 week.
Your thoughts are much appreciated.
Kind regards,
Ondrej
Dear all
I am going to update a Ceph cluster (where I am using only RBD and RGW,
i.e. I didn't deploy CephFS) from Octopus to Quincy.
Before doing that I would like to understand whether some old Nautilus clients
(that I can't update for several reasons) will still be able to connect.
In general, I am not able to find this information in the documentation of
any Ceph release.
Should I refer to get-require-min-compat-client?
Now in my Octopus cluster I see:
[root@ceph-mon-01 ~]# ceph osd get-require-min-compat-client
luminous
but I have the feeling that this value is simply the one I set a while ago
to support the upmap feature
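For what it's worth, one thing I can check is which client releases are currently connected, which (as far as I understand) ceph features reports:
```
# Lists the features/release of all currently connected clients and daemons;
# old (e.g. nautilus) clients should show up here while they are connected.
ceph features
```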
Thanks, Massimo
Hello,
Before we start: I'm fully aware that this kind of setup is not
recommended by any means, and I'm familiar with its implications. I'm
just trying to practice extreme situations, just in case...
I have a test cluster with:
3 nodes with Proxmox 7.3 + Ceph Quincy 17.2.5
3 monitors + 3 managers in server01, server02 and server03
4 OSDs: two in server01, two in server02, none in server03. All OSDs are
class "ssd".
1 pool with size=2, min_size=1. The CRUSH rule uses only ssd-class OSDs.
I do wait for ceph status to be fully OK between each test.
A.- If I shut down server01 in an orderly fashion, its OSDs get marked down as
expected. I/O on the pool works correctly before, during and after the
shutdown.
B.- If I power off server01, its OSDs do not get marked down. I/O on the
pool does not work at all, neither reads nor writes. A small number of
slow ops show in ceph status, something like 7 to 25. After 30 minutes,
server01's OSDs get marked down, I/O on the pool is restored and the
slow ops disappear.
C.- Now I create an OSD on server03 with class "noClass". This OSD won't
be used by the pool. If I now power off server01, its OSDs get marked
down as soon as some I/O is sent to the pool, and I/O works correctly.
Looks like I am in this exact situation:
https://tracker.ceph.com/issues/16910#note-2
Questions:
Why does Ceph behave this way in test B? Shouldn't it simply mark the
OSDs down as in tests A and C?
Which config setting(s) control that 30-minute wait before marking all the
OSDs down?
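(For completeness, these are the settings I have been looking at while trying to answer the second question myself; I am not sure which of them actually drives the ~30-minute delay, hence the question:)
```
# Settings I suspect are involved in marking OSDs down (defaults shown
# by these commands); unsure which one explains the ~30 minute delay.
ceph config get mon mon_osd_min_down_reporters
ceph config get mon mon_osd_reporter_subtree_level
ceph config get mon mon_osd_report_timeout
ceph config get osd osd_heartbeat_grace
```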
Many thanks in advance!
--
3/3/23 2:13:53 AM[WRN]unable to calc client keyring client.admin placement PlacementSpec(label='_admin'): Cannot place : No matching hosts for label _admin
I keep seeing this warning in the logs. I’m not really sure what action to take to resolve this issue.
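From what I can tell, cephadm wants at least one host carrying the _admin label; I assume checking and adding it would look something like the following (hostname is a placeholder), but I'd like to confirm before touching anything:
```
# Show current hosts and their labels
ceph orch host ls
# Add the _admin label to a host (hostname is a placeholder)
ceph orch host label add host01 _admin
```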
Thanks
-jeremy
(for archival purposes)
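(For anyone finding this thread later: the helper-less kernel mount being discussed below looks roughly like the sketch that follows. The monitor address, user name, key and filesystem name are placeholders, and which options the kernel accepts varies with the kernel version.)
```
# Sketch of mounting CephFS directly via the kernel module, bypassing
# mount.ceph. All values are placeholders; "secret" and "mds_namespace"
# are the options mentioned in the thread for older kernel clients.
mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs \
  -o name=myuser,secret=<base64-key>,mds_namespace=mycephfs
```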
On Thu, Mar 2, 2023 at 6:04 PM Milind Changire <mchangir(a)redhat.com> wrote:
> The docs for the ceph kernel module will be updated appropriately in the
> kernel documentation.
> Thanks for pointing out your pain point.
>
> --
> Milind
>
>
> On Thu, Mar 2, 2023 at 1:41 PM Shawn Weeks <sweeks(a)weeksconsulting.us>
> wrote:
>
>> I’m already able to mount ceph without a helper using the built-in ceph
>> kernel module support. My issue is that the documentation mixes what
>> parameters the module itself supports and what requires the helper.
>> Everything I’ve discovered so far is from reading the source code and
>> piecing things together from stack overflow and ceph forum posts. I was
>> hoping there was a better answer and I was just missing a different set of
>> documentation.
>>
>> Sent from my iPhone
>>
>> On Mar 2, 2023, at 1:48 AM, Milind Changire <mchangir(a)redhat.com> wrote:
>>
>>
>> I think the mount(8) man page section titled "EXTERNAL HELPERS" states it
>> clearly:
>> -----
>> EXTERNAL HELPERS
>> The syntax of external mount helpers is:
>>
>> /sbin/mount.suffix spec dir [-sfnv] [-N namespace] [-o options]
>> [-t type.subtype]
>>
>> where the suffix is the filesystem type and the -sfnvoN options
>> have the same meaning
>> as the normal mount options. The -t option is used for filesystems
>> with subtypes support (for
>> example /sbin/mount.fuse -t fuse.sshfs).
>>
>> The command mount does not pass the mount options unbindable,
>> runbindable, private, rprivate,
>> slave, rslave, shared, rshared, auto, noauto, comment, x-*, loop,
>> offset and sizelimit to
>> the mount.<suffix> helpers. All other options are used in a
>> comma-separated list as an argument
>> to the -o option.
>> -----
>>
>> So, if there are mount options other than the basic ones processed by the
>> main mount program, they are passed to the mount helper presuming they are
>> of interest to the mount helper. Since the main mount program will be
>> unable to make sense of ceph mount options like "secret", "secretfile",
>> "mon_addr", "conf", "name", "ms_mode", "fs", "nofallback", etc., a mount
>> helper will be required to mount a ceph file system.
>>
>> -----
>> 1. What other filesystem types does your Linux distro currently mount?
>> 2. What other filesystems of interest to you can your Linux distro (Rocky
>> 8/9) mount without a mount helper?
>>
>>
>>
>> On Thu, Mar 2, 2023 at 11:53 AM Shawn Weeks <sweeks(a)weeksconsulting.us>
>> wrote:
>>
>>> Rocky 8 and 9 don’t have the helper available in their repos and I have
>>> to work with what they include.
>>>
>>> Thanks
>>> Shawn
>>>
>>> Sent from my iPhone
>>>
>>> On Mar 1, 2023, at 11:41 PM, Milind Changire <mchangir(a)redhat.com>
>>> wrote:
>>>
>>>
>>> Why is it critical to mount the ceph filesystem without the ceph mount
>>> helper?
>>>
>>>
>>> On Thu, Mar 2, 2023 at 8:42 AM Shawn Weeks <sweeks(a)weeksconsulting.us>
>>> wrote:
>>>
>>>> That’s the documentation that assumes you’re going to have the helper.
>>>> It lists things like “secretfile” and “fs” that don’t work without the
>>>> helper. I’ve gone back several versions of that page and none of them spell
>>>> out what requires the helper and what’s supported natively by the kernel.
>>>>
>>>> Thanks
>>>> Shawn
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 1, 2023, at 8:53 PM, Milind Changire <mchangir(a)redhat.com>
>>>> wrote:
>>>>
>>>>
>>>> Check if this doc helps:
>>>> https://docs.ceph.com/en/quincy/cephfs/mount-using-kernel-driver/
>>>>
>>>>
>>>> On Tue, Feb 28, 2023 at 11:09 PM Shawn Weeks <sweeks(a)weeksconsulting.us>
>>>> wrote:
>>>>
>>>>> Even the documentation at
>>>>> https://www.kernel.org/doc/html/v5.14/filesystems/ceph.html#mount-options
>>>>> is incomplete and doesn’t list options like “secret” and “mds_namespace”
>>>>>
>>>>> Thanks
>>>>> Shawn
>>>>>
>>>>> > On Feb 28, 2023, at 11:03 AM, Shawn Weeks <sweeks(a)weeksconsulting.us>
>>>>> wrote:
>>>>> >
>>>>> > I’m trying to find documentation for which mount options are
>>>>> supported directly by the kernel module. For example in the kernel module
>>>>> included in Rocky Linux 8 and 9 the secretfile option isn’t supported even
>>>>> though the documentation seems to imply it is. It seems like the
>>>>> documentation assumes you’ll always be using the mount.ceph helper and I’m
>>>>> trying to find out what options are supported if you don’t have mount.ceph
>>>>> helper.
>>>>> >
>>>>> > Thanks
>>>>> > Shawn
>>>>>
>>>>
>>>>
>>>> --
>>>> Milind
>>>>
>>>>
>>>
>>> --
>>> Milind
>>>
>>>
>>
>> --
>> Milind
>>
>>
>
> --
> Milind
>
>
--
Milind
I have a Nautilus cluster with 7 nodes and 210 HDDs. I recently added the 7th node with 30 OSDs, which are currently rebalancing very slowly. I just noticed that the node's Ethernet interface only negotiated a 1Gb link, even though it is a 10Gb interface. I’m not sure why, but I would like to reboot the node to get the interface back to 10Gb.
Is it ok to do this? What should I do to prep the cluster for the reboot?
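For what it's worth, my current plan, based on the usual advice for short maintenance windows, is roughly the following; please correct me if this is wrong:
```
# Before the reboot: prevent OSDs from being marked out and data from moving
ceph osd set noout
ceph osd set norebalance
# ...reboot the node and wait for its OSDs to come back up...
ceph osd unset norebalance
ceph osd unset noout
```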
Jeffrey Turmelle
International Research Institute for Climate & Society <https://iri.columbia.edu/>
The Climate School <https://climate.columbia.edu/> at Columbia University <https://columbia.edu/>
845-652-3461
Hi,
I have many 'not {deep-}scrubbed in time' warnings and 1 PG remapped+backfilling,
and I don't understand why this backfilling is taking so long.
root@hbgt-ceph1-mon3:/# ceph -s
cluster:
id: c300532c-51fa-11ec-9a41-0050569c3b55
health: HEALTH_WARN
15 pgs not deep-scrubbed in time
13 pgs not scrubbed in time
services:
mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3
(age 36h)
mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys:
hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
rgw: 3 daemons active (3 hosts, 2 zones)
data:
pools: 13 pools, 289 pgs
objects: 67.74M objects, 127 TiB
usage: 272 TiB used, 769 TiB / 1.0 PiB avail
pgs: 288 active+clean
1 active+remapped+backfilling
io:
client: 3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
recovery: 790 KiB/s, 0 objects/s
What can I do to understand this slow recovery? (Is it the backfill itself that is slow?)
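In case it is relevant, these are the throttles I was planning to look at; I am not sure they are what limits recovery here:
```
# Current backfill/recovery throttle settings (defaults unless overridden)
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active
ceph config get osd osd_recovery_sleep_hdd
# Which PG is backfilling, and in what state
ceph pg dump pgs_brief | grep backfill
```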
Thank you
'Jof
Hi to all, and thanks for sharing your experience with Ceph!
We have a simple setup with 9 OSDs, all HDD, across 3 nodes (3 OSDs per node).
We started the cluster to test how it works with HDDs, using the default, simple bootstrap. Then we decided to add SSDs and create a pool that uses only SSDs.
In order to have HDD-only pools and SSD-only pools, we edited the crushmap to add the class hdd.
We have not configured anything for the SSDs yet, no disks or rules; we only added the class to the existing rules.
So here are the rules before introducing class hdd:
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
And here they are after the change:
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
Just doing this caused all the PGs belonging to the EC pools to become misplaced.
Is that expected, and why?
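If it helps to reproduce what we are seeing, something like this should show whether the computed placements differ between the old and new crushmaps (file names and rule id below are just examples):
```
# Compare computed placements for rule id 1 before and after adding the
# class; old.map / new.map are the two compiled crushmaps (examples).
crushtool -i old.map --test --rule 1 --num-rep 3 --show-mappings > before.txt
crushtool -i new.map --test --rule 1 --num-rep 3 --show-mappings > after.txt
diff before.txt after.txt | head
```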
Best regards
Alessandro Bolgia
I've set up RadosGW with STS on top of my Ceph cluster. It works fine, but I'm also trying to set up authentication with an OpenID Connect provider. I'm having a hard time troubleshooting issues because the radosgw log file doesn't have much information in it. For example, when I try to use the `sts:AssumeRoleWithWebIdentity` API it fails with `{'Code': 'AccessDenied', ...}`, and all I see is the beast log showing an HTTP 403.
Is there a way to enable more verbose logging so I can see what is failing and why I'm getting certain errors with the STS, S3, or IAM APIs?
My ceph.conf looks like this for each node (mildly redacted):
```
[client.radosgw.pve4]
host = pve4
keyring = /etc/pve/priv/ceph.client.radosgw.keyring
log file = /var/log/ceph/client.radosgw.$host.log
rgw_dns_name = s3.lab
rgw_frontends = beast endpoint=0.0.0.0:7480 ssl_endpoint=0.0.0.0:443 ssl_certificate=/etc/pve/priv/ceph/s3.lab.crt ssl_private_key=/etc/pve/priv/ceph/s3.lab.key
rgw_sts_key = 1111111111111111
rgw_s3_auth_use_sts = true
rgw_enable_apis = s3, s3website, admin, sts, iam
```
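For reference, the closest I have found so far is raising the RGW debug level, something like the following (assuming the config database accepts this client name; a daemon restart may be needed for it to take effect):
```
# Increase radosgw logging verbosity (values of 10-20 get quite chatty)
ceph config set client.radosgw.pve4 debug_rgw 20
ceph config set client.radosgw.pve4 debug_ms 1
```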
Hello
Looking to get some official guidance on PG and PGP sizing.
Is the goal to maintain approximately 100 PGs per OSD per pool, or per OSD
across the cluster in general?
Assume the following scenario:
Cluster with 80 OSDs across 8 nodes;
3 Pools:
- Pool1 = Replicated 3x
- Pool2 = Replicated 3x
- Pool3 = Erasure Coded 6-4
Assuming the well published formula:
Let (Target PGs / OSD) = 100
[ (Target PGs / OSD) * (# of OSDs) ] / (Replica Size)
- Pool1 = (100*80)/3 = 2666.67 => 4096
- Pool2 = (100*80)/3 = 2666.67 => 4096
- Pool3 = (100*80)/10 = 800 => 1024 (using k+m = 10 as the effective replica size for EC 6-4)
Total cluster would have 9216 PGs and PGPs.
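For reference, I was planning to sanity-check those numbers against what the PG autoscaler suggests, along these lines (pool name is just an example):
```
# Show per-pool PG counts and the autoscaler's suggested targets
ceph osd pool autoscale-status
# Set pg_num explicitly if not relying on the autoscaler
ceph osd pool set pool1 pg_num 4096
```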
Are there any implications (performance / monitor / MDS / RGW sizing) of
how many PGs are created on the cluster?
Looking for validation and / or clarification of the above.
Thank you.