Hello,
I’m planning a Ceph multisite deployment with two clusters: a primary one that clients will be interacting with, and a second one whose zone tier type is set to archive. One-way sync from the primary zone is also configured - I would like to use this second cluster as a "backup zone".
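For context, the archive zone was created the way the documentation describes, roughly like this (zonegroup, zone name and endpoint below are placeholders):
```
# Sketch of creating the archive-tier zone in the secondary cluster,
# followed by committing the period so the change takes effect.
radosgw-admin zone create --rgw-zonegroup=myzonegroup --rgw-zone=archive \
  --endpoints=http://archive-gw:8080 --tier-type=archive
radosgw-admin period update --commit
```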
The setup works as expected for me, with one exception: when I delete a bucket on the primary zone, the bucket in the archive zone is also deleted. This is an issue, as I would like to be able to restore the bucket in case of unintentional deletion on the primary zone.
I know about object lock, but is there some other way? For example, I would imagine a setup where, when the bucket is removed on the primary zone, the bucket on the archive zone is only marked for removal after some grace period, like 1 day or 1 week.
Your thoughts are much appreciated.
Kind regards,
Ondrej
Dear all
I am going to update a Ceph cluster (where I am using only RBD and RGW,
i.e. I didn't deploy CephFS) from Octopus to Quincy.
Before doing that I would like to understand whether some old Nautilus clients
(that I can't update for several reasons) will still be able to connect.
In general, I am not able to find this information in the documentation of
any Ceph release.
Should I refer to get-require-min-compat-client?
Now in my Octopus cluster I see:
[root@ceph-mon-01 ~]# ceph osd get-require-min-compat-client
luminous
but I have the feeling that this value is simply the one I set a while ago
to support the upmap feature
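For what it's worth, one thing I can check is which client releases are currently connected, which (as far as I understand) ceph features reports:
```
# Lists the features/release of all currently connected clients and daemons;
# old (e.g. nautilus) clients should show up here while they are connected.
ceph features
```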
Thanks, Massimo
Hello,
Before we start: I'm fully aware that this kind of setup is not
recommended by any means, and I'm familiar with its implications. I'm
just trying to practice extreme situations, just in case...
I have a test cluster with:
3 nodes with Proxmox 7.3 + Ceph Quincy 17.2.5
3 monitors + 3 managers in server01, server02 and server03
4 OSDs: two in server01, two in server02, none in server03. All OSDs are
class "ssd".
1 pool with size=2, min_size=1. The CRUSH rule uses only ssd-class OSDs.
I do wait for ceph status to be fully OK between each test.
A.- If I shut down server01 in an orderly fashion, its OSDs get marked down as
expected. I/O on the pool works correctly before, during and after the
shutdown.
B.- If I power off server01, its OSDs do not get marked down. I/O on the
pool does not work at all, neither reads nor writes. A small number of
slow ops show in ceph status, something like 7 to 25. After 30 minutes,
server01's OSDs get marked down, I/O on the pool is restored and the
slow ops disappear.
C.- Now I create an OSD on server03 with class "noClass". This OSD won't
be used by the pool. If I now power off server01, its OSDs get marked
down as soon as some I/O is sent to the pool, and I/O works correctly.
Looks like I am in this exact situation:
https://tracker.ceph.com/issues/16910#note-2
Questions:
Why does Ceph behave this way in test B? Shouldn't it simply mark the
OSDs down as in tests A and C?
Which config setting(s) control that 30-minute wait before marking all the
OSDs down?
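(For completeness, these are the settings I have been looking at while trying to answer the second question myself; I am not sure which of them actually drives the ~30-minute delay, hence the question:)
```
# Settings I suspect are involved in marking OSDs down (defaults shown
# by these commands); unsure which one explains the ~30 minute delay.
ceph config get mon mon_osd_min_down_reporters
ceph config get mon mon_osd_reporter_subtree_level
ceph config get mon mon_osd_report_timeout
ceph config get osd osd_heartbeat_grace
```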
Many thanks in advance!
--
3/3/23 2:13:53 AM[WRN]unable to calc client keyring client.admin placement PlacementSpec(label='_admin'): Cannot place : No matching hosts for label _admin
I keep seeing this warning in the logs. I’m not really sure what action to take to resolve this issue.
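From what I can tell, cephadm wants at least one host carrying the _admin label; I assume checking and adding it would look something like the following (hostname is a placeholder), but I'd like to confirm before touching anything:
```
# Show current hosts and their labels
ceph orch host ls
# Add the _admin label to a host (hostname is a placeholder)
ceph orch host label add host01 _admin
```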
Thanks
-jeremy
(for archival purposes)
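(For anyone finding this thread later: the helper-less kernel mount being discussed below looks roughly like the sketch that follows. The monitor address, user name, key and filesystem name are placeholders, and which options the kernel accepts varies with the kernel version.)
```
# Sketch of mounting CephFS directly via the kernel module, bypassing
# mount.ceph. All values are placeholders; "secret" and "mds_namespace"
# are the options mentioned in the thread for older kernel clients.
mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs \
  -o name=myuser,secret=<base64-key>,mds_namespace=mycephfs
```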
On Thu, Mar 2, 2023 at 6:04 PM Milind Changire <mchangir(a)redhat.com> wrote:
> The docs for the ceph kernel module will be updated appropriately in the
> kernel documentation.
> Thanks for pointing out your pain point.
>
> --
> Milind
>
>
> On Thu, Mar 2, 2023 at 1:41 PM Shawn Weeks <sweeks(a)weeksconsulting.us>
> wrote:
>
>> I’m already able to mount ceph without a helper using the built-in ceph
>> kernel module support. My issue is that the documentation mixes what
>> parameters the module itself supports and what requires the helper.
>> Everything I’ve discovered so far is from reading the source code and
>> piecing things together from stack overflow and ceph forum posts. I was
>> hoping there was a better answer and I was just missing a different set of
>> documentation.
>>
>> Sent from my iPhone
>>
>> On Mar 2, 2023, at 1:48 AM, Milind Changire <mchangir(a)redhat.com> wrote:
>>
>>
>> I think the mount(8) man page section titled "EXTERNAL HELPERS" states it
>> clearly:
>> -----
>> EXTERNAL HELPERS
>> The syntax of external mount helpers is:
>>
>> /sbin/mount.suffix spec dir [-sfnv] [-N namespace] [-o options]
>> [-t type.subtype]
>>
>> where the suffix is the filesystem type and the -sfnvoN options
>> have the same meaning
>> as the normal mount options. The -t option is used for filesystems
>> with subtypes support (for
>> example /sbin/mount.fuse -t fuse.sshfs).
>>
>> The command mount does not pass the mount options unbindable,
>> runbindable, private, rprivate,
>> slave, rslave, shared, rshared, auto, noauto, comment, x-*, loop,
>> offset and sizelimit to
>> the mount.<suffix> helpers. All other options are used in a
>> comma-separated list as an argument
>> to the -o option.
>> -----
>>
>> So, if there are mount options other than the basic ones processed by the
>> main mount program, they are passed to the mount helper presuming they are
>> of interest to the mount helper. Since the main mount program will be
>> unable to make sense of ceph mount options like "secret", "secretfile",
>> "mon_addr", "conf", "name", "ms_mode", "fs", "nofallback", etc., a mount
>> helper will be required to mount a ceph file system.
>>
>> -----
>> 1. What other filesystem types does your Linux distro currently mount?
>> 2. What other filesystems of interest to you can your Linux distro (Rocky
>> 8/9) mount without a mount helper?
>>
>>
>>
>> On Thu, Mar 2, 2023 at 11:53 AM Shawn Weeks <sweeks(a)weeksconsulting.us>
>> wrote:
>>
>>> Rocky 8 and 9 don’t have the helper available in their repos and I have
>>> to work with what they include.
>>>
>>> Thanks
>>> Shawn
>>>
>>> Sent from my iPhone
>>>
>>> On Mar 1, 2023, at 11:41 PM, Milind Changire <mchangir(a)redhat.com>
>>> wrote:
>>>
>>>
>>> Why is it critical to mount the ceph filesystem without the ceph mount
>>> helper?
>>>
>>>
>>> On Thu, Mar 2, 2023 at 8:42 AM Shawn Weeks <sweeks(a)weeksconsulting.us>
>>> wrote:
>>>
>>>> That’s the documentation that assumes you’re going to have the helper.
>>>> It lists things like “secretfile” and “fs” that don’t work without the
>>>> helper. I’ve gone back several versions of that page and none of them spell
>>>> out what requires the helper and what’s supported natively by the kernel.
>>>>
>>>> Thanks
>>>> Shawn
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 1, 2023, at 8:53 PM, Milind Changire <mchangir(a)redhat.com>
>>>> wrote:
>>>>
>>>>
>>>> Check if this doc helps:
>>>> https://docs.ceph.com/en/quincy/cephfs/mount-using-kernel-driver/
>>>>
>>>>
>>>> On Tue, Feb 28, 2023 at 11:09 PM Shawn Weeks <sweeks(a)weeksconsulting.us>
>>>> wrote:
>>>>
>>>>> Even the documentation at
>>>>> https://www.kernel.org/doc/html/v5.14/filesystems/ceph.html#mount-options
>>>>> is incomplete and doesn’t list options like “secret” and “mds_namespace”
>>>>>
>>>>> Thanks
>>>>> Shawn
>>>>>
>>>>> > On Feb 28, 2023, at 11:03 AM, Shawn Weeks <sweeks(a)weeksconsulting.us>
>>>>> wrote:
>>>>> >
>>>>> > I’m trying to find documentation for which mount options are
>>>>> supported directly by the kernel module. For example in the kernel module
>>>>> included in Rocky Linux 8 and 9 the secretfile option isn’t supported even
>>>>> though the documentation seems to imply it is. It seems like the
>>>>> documentation assumes you’ll always be using the mount.ceph helper and I’m
>>>>> trying to find out what options are supported if you don’t have mount.ceph
>>>>> helper.
>>>>> >
>>>>> > Thanks
>>>>> > Shawn
>>>>>
>>>>
>>>>
>>>> --
>>>> Milind
>>>>
>>>>
>>>
>>> --
>>> Milind
>>>
>>>
>>
>> --
>> Milind
>>
>>
>
> --
> Milind
>
>
--
Milind
I have a Nautilus cluster with 7 nodes and 210 HDDs. I recently added the 7th node with 30 OSDs, which are currently rebalancing very slowly. I just noticed that the node's Ethernet interface only negotiated a 1Gb link, even though it is a 10Gb interface. I’m not sure why, but I would like to reboot the node to get the interface back to 10Gb.
Is it ok to do this? What should I do to prep the cluster for the reboot?
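For what it's worth, my current plan, based on the usual advice for short maintenance windows, is roughly the following; please correct me if this is wrong:
```
# Before the reboot: prevent OSDs from being marked out and data from moving
ceph osd set noout
ceph osd set norebalance
# ...reboot the node and wait for its OSDs to come back up...
ceph osd unset norebalance
ceph osd unset noout
```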
Jeffrey Turmelle
International Research Institute for Climate & Society <https://iri.columbia.edu/>
The Climate School <https://climate.columbia.edu/> at Columbia University <https://columbia.edu/>
845-652-3461
Hi,
I have many 'not {deep-}scrubbed in time' warnings and 1 PG remapped+backfilling,
and I don't understand why this backfilling is taking so long.
root@hbgt-ceph1-mon3:/# ceph -s
cluster:
id: c300532c-51fa-11ec-9a41-0050569c3b55
health: HEALTH_WARN
15 pgs not deep-scrubbed in time
13 pgs not scrubbed in time
services:
mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3
(age 36h)
mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys:
hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
rgw: 3 daemons active (3 hosts, 2 zones)
data:
pools: 13 pools, 289 pgs
objects: 67.74M objects, 127 TiB
usage: 272 TiB used, 769 TiB / 1.0 PiB avail
pgs: 288 active+clean
1 active+remapped+backfilling
io:
client: 3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
recovery: 790 KiB/s, 0 objects/s
What can I do to understand this slow recovery? (Is it the backfill itself that is slow?)
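In case it is relevant, these are the throttles I was planning to look at; I am not sure they are what limits recovery here:
```
# Current backfill/recovery throttle settings (defaults unless overridden)
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active
ceph config get osd osd_recovery_sleep_hdd
# Which PG is backfilling, and in what state
ceph pg dump pgs_brief | grep backfill
```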
Thank you
'Jof
Hi to all, and thanks for sharing your experience with Ceph!
We have a simple setup with 9 OSDs, all HDD, across 3 nodes (3 OSDs per node).
We started the cluster to test how it works with HDDs, using the default, simple bootstrap. Then we decided to add SSDs and create a pool that uses only SSDs.
In order to have HDD-only pools and SSD-only pools, we edited the crushmap to add the class hdd.
We have not configured anything for the SSDs yet, no disks or rules; we only added the class to the existing rules.
So here are the rules before introducing class hdd:
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
And here they are after the change:
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
Just doing this caused all the PGs belonging to the EC pools to become misplaced.
Is that expected, and why?
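If it helps to reproduce what we are seeing, something like this should show whether the computed placements differ between the old and new crushmaps (file names and rule id below are just examples):
```
# Compare computed placements for rule id 1 before and after adding the
# class; old.map / new.map are the two compiled crushmaps (examples).
crushtool -i old.map --test --rule 1 --num-rep 3 --show-mappings > before.txt
crushtool -i new.map --test --rule 1 --num-rep 3 --show-mappings > after.txt
diff before.txt after.txt | head
```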
Best regards
Alessandro Bolgia
I've set up RadosGW with STS on top of my Ceph cluster. It works fine, but I'm also trying to set up authentication with an OpenID Connect provider. I'm having a hard time troubleshooting issues because the radosgw log file doesn't have much information in it. For example, when I try to use the `sts:AssumeRoleWithWebIdentity` API it fails with `{'Code': 'AccessDenied', ...}`, and all I see is the beast log showing an HTTP 403.
Is there a way to enable more verbose logging so I can see what is failing and why I'm getting certain errors with the STS, S3, or IAM APIs?
My ceph.conf looks like this for each node (mildly redacted):
```
[client.radosgw.pve4]
host = pve4
keyring = /etc/pve/priv/ceph.client.radosgw.keyring
log file = /var/log/ceph/client.radosgw.$host.log
rgw_dns_name = s3.lab
rgw_frontends = beast endpoint=0.0.0.0:7480 ssl_endpoint=0.0.0.0:443 ssl_certificate=/etc/pve/priv/ceph/s3.lab.crt ssl_private_key=/etc/pve/priv/ceph/s3.lab.key
rgw_sts_key = 1111111111111111
rgw_s3_auth_use_sts = true
rgw_enable_apis = s3, s3website, admin, sts, iam
```
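For reference, the closest I have found so far is raising the RGW debug level, something like the following (assuming the config database accepts this client name; a daemon restart may be needed for it to take effect):
```
# Increase radosgw logging verbosity (values of 10-20 get quite chatty)
ceph config set client.radosgw.pve4 debug_rgw 20
ceph config set client.radosgw.pve4 debug_ms 1
```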
Hello
Looking to get some official guidance on PG and PGP sizing.
Is the goal to maintain approximately 100 PGs per OSD per pool, or per OSD
across the cluster in general?
Assume the following scenario:
Cluster with 80 OSDs across 8 nodes;
3 Pools:
- Pool1 = Replicated 3x
- Pool2 = Replicated 3x
- Pool3 = Erasure Coded 6-4
Assuming the well published formula:
Let (Target PGs / OSD) = 100
[ (Target PGs / OSD) * (# of OSDs) ] / (Replica Size)
- Pool1 = (100*80)/3 = 2666.67 => 4096
- Pool2 = (100*80)/3 = 2666.67 => 4096
- Pool3 = (100*80)/10 = 800 => 1024 (using k+m = 10 as the effective replica size for EC 6-4)
Total cluster would have 9216 PGs and PGPs.
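For reference, I was planning to sanity-check those numbers against what the PG autoscaler suggests, along these lines (pool name is just an example):
```
# Show per-pool PG counts and the autoscaler's suggested targets
ceph osd pool autoscale-status
# Set pg_num explicitly if not relying on the autoscaler
ceph osd pool set pool1 pg_num 4096
```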
Are there any implications (performance / monitor / MDS / RGW sizing) of
how many PGs are created on the cluster?
Looking for validation and / or clarification of the above.
Thank you.