Hi all,
I have a problem exporting two different sub-folder CephFS kernel mounts via nfsd to the same IP address. The top-level structure on the CephFS is something like /A/S1 and /A/S2. On a file server I mount /A/S1 and /A/S2 as two different file systems under /mnt/S1 and /mnt/S2 using the CephFS kernel client. Then these two mounts are exported with lines like these in /etc/exports:
/mnt/S1 -options NET
/mnt/S2 -options IP
IP is an element of NET, meaning that the host at IP should be the only host able to access both /mnt/S1 and /mnt/S2. What we observe is that any attempt to mount the export /mnt/S1 on the host at IP results in /mnt/S2 being mounted instead.
My first guess was a clash of fsids: the CephFS simply reports the same fsid for both mounts to nfsd and, hence, nfsd thinks both mount points contain the same file system. So I modified the second export line to
/mnt/S2 -options,fsid=100 IP
to no avail. The two folders are completely disjoint, with neither symlinks nor hard links between them, so it should be safe to export them as two different file systems.
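For completeness, spelled out in the usual Linux /etc/exports syntax, what I have now looks roughly like this (NET, IP and the option list are placeholders here, and fsid=100 was picked arbitrarily):

```
/mnt/S1  NET(rw,sync,no_subtree_check)
/mnt/S2  IP(rw,sync,no_subtree_check,fsid=100)
```

followed by exportfs -ra on the file server - and the host at IP still gets /mnt/S2 when asking for /mnt/S1.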
Exporting such constructs to non-overlapping networks/IPs works as expected - even when exporting subdirectories of a directory (like exporting /A/B and /A/B/C from the same file server to strictly different IPs). It seems to be the same-IP configuration that breaks expectations.
Am I missing a magic -yes-i-really-know-what-i-am-doing hack here? The file server is on AlmaLinux release 8.7 (Stone Smilodon) and all ceph packages match the latest Octopus version running on our cluster.
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hello all,
I am seeing some weird behavior on my CephFS.
On May 29th I noticed a drop of 50 TB in my data pool. It has been followed by a decrease in space usage in the metadata pool since then.
Since May 29th, and still ongoing as I write, the metadata pool has lost 1 TB of its initial 1.8 TB.
Regarding the number of objects, it was 8.4 million and is now 7.8 million.
I assume that my users deleted a lot of files on that day (our CephFS consists of very small files of about 4 MB each).
But such a huge decrease in the metadata pool got me really concerned.
I thought that maybe it was MDS lazy deletion happening, but I am not sure.
Does anyone have any thoughts about this?
Do you know more about lazy deletion? There is not much documentation about it online.
Do you know of any way (a command, or a log file to search) to see the current lazy deletion operations?
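Is looking at the MDS perf counters the right place for this? Something like the following is what I had in mind (mds.<id> is a placeholder for the active MDS, run on its host, and I am not sure these are the right counters):

```
# purge queue counters on the active MDS
ceph daemon mds.<id> perf dump purge_queue
# stray-related counters in the MDS cache section
ceph daemon mds.<id> perf dump mds_cache
```

in the hope that the purge queue and stray counters reflect the ongoing deletions.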
I noticed that the read_ops, read_bytes, write_ops and write_bytes reported
by the command rados df detail are negative for the metadata pool.
My cluster is running Nautilus.
Any help would be appreciated,
Best regards,
Nate
Hi guys,
I have been awake for 36 hours trying to restore a broken Ceph pool (2 PGs incomplete).
My VMs are all broken; some boot, some don't...
I also have 5 removed disks with data from that pool "in my hands" - don't ask...
So my question: is it possible to restore the data from these removed disks and "add" it to the others for healing?
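Would something along these lines with ceph-objectstore-tool be the right direction? A sketch only (the paths and the PG id are placeholders, the source and target OSDs have to be stopped, and I have not verified these exact steps):

```
# export the incomplete PG from one of the removed disks (data path mounted read-only)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-OLD \
    --op export --pgid 2.1a --file /tmp/pg2.1a.export
# import it into a stopped OSD that is still part of the cluster
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NEW \
    --op import --file /tmp/pg2.1a.export
```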
Best regards
Ben
Hi,
a cluster has ms_bind_msgr1 set to false in the config database.
Newly created MONs still listen on port 6789 and add themselves to the monmap as providing messenger v1.
How do I change that?
Shouldn't the MONs use the config for ms_bind_msgr1?
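For context, roughly what I am looking at (output omitted):

```
# the setting is present in the config database
ceph config get mon ms_bind_msgr1
# but the monmap still lists v1 (port 6789) addresses for the newly created MONs
ceph mon dump
```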
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hi,
Is it possible to disable ACLs in favor of bucket policies (on a bucket or globally)?
The goal is to forbid users from using any bucket/object ACLs and to only allow bucket policies.
There seems to be no documentation on this that applies to Ceph RGW.
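The closest I can get to expressing the intent is a per-bucket policy that denies the ACL calls, something like the sketch below (standard S3 action names, aws CLI pointed at the RGW endpoint; the bucket name and endpoint are placeholders, and this is per bucket rather than the global switch I am really after):

```
cat > deny-acl.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": {"AWS": ["*"]},
    "Action": ["s3:PutBucketAcl", "s3:PutObjectAcl"],
    "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
  }]
}
EOF
aws --endpoint-url http://rgw.example.com:8080 s3api put-bucket-policy \
    --bucket mybucket --policy file://deny-acl.json
```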
Apologies if I am sending this to the wrong mailing list.
Regards,
Rasool
Hello.
I think I found a bug in cephadm/ceph orch:
Redeploying a container image (tested with alertmanager) after removing a custom `mgr/cephadm/container_image_alertmanager` value deploys the previous container image and not the default container image.
I'm running `cephadm` from ubuntu 22.04 pkg 17.2.5-0ubuntu0.22.04.3 and
`ceph` version 17.2.6.
Here is an example. Node clrz20-08 is the node alertmanager is running on, clrz20-01 is the node I'm controlling Ceph from:
* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
```
* Set alertmanager image
```
root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager
```
* Redeploy alertmanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```
* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:latest"
```
* Remove alertmanager image setting, revert to default:
```
root@clrz20-01:~# ceph config rm mgr mgr/cephadm/container_image_alertmanager
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
```
* Redeploy alertmanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```
* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:latest"
```
-> `mgr/cephadm/container_image_alertmanager` now reports the default `quay.io/prometheus/alertmanager:v0.23.0`, but redeploy still uses `quay.io/prometheus/alertmanager:latest`. This looks like a bug.
* Set alertmanager image explicitly to the default value
```
root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
```
* Redeploy alertmanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```
* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
```
-> Explicitly setting `mgr/cephadm/container_image_alertmanager` to the default value works around the issue.
Bests,
Daniel
Hi,
I usually install the SRPM and then build from ceph.spec like this:
rpmbuild -bb /root/rpmbuild/SPECS/ceph.spec --without ceph_test_package
But it takes a long time and produces many packages that I don't need. Is there a way to optimize this build process so that it only builds the packages I need, for example ceph-radosgw?
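For example, would building just the one target from the unpacked source tree be a reasonable substitute? A sketch of what I mean (I have not verified that radosgw is the exact target name in this release, and this obviously skips the RPM packaging step):

```
# inside the unpacked ceph source tree from the SRPM
./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo
cd build
make -j"$(nproc)" radosgw    # or: ninja radosgw
```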
Thanks.