May 2023 - ceph-users - lists.ceph.io

by Brad House

I'm in the process of exploring if it is worthwhile to add RadosGW to our existing ceph cluster. We've had a few internal requests for exposing the S3 API for some of our business units, right now we just use the ceph cluster for VM disk image storage via RBD. Everything looks pretty straight forward until we hit multitenancy. The page on multi-tenancy doesn't dive into permission delegation: https://docs.ceph.com/en/quincy/radosgw/multitenancy/ The end goal I want is to be able to create a single user per tenant (Business Unit) which will act as their 'administrator', where they can then do basically whatever they want under their tenant sandbox (though I don't think we need more advanced cases like creations of roles or policies, just create/delete their own users, buckets, objects). I was hopeful this would just work, and I asked on the ceph IRC channel on OFTC and was told once I grant a user caps="users=*", they would then be allowed to create users *outside* of their own tenant using the Rados Admin API and that I should explore IAM roles. I think it would make sense to add a feature, such as a flag that can be set on a user, to ensure they stay in their "sandbox". I'd assume this is probably a common use-case. Anyhow, if its possible to do today using iam roles/policies, then great, unfortunately this is my first time looking at this stuff and there are some things not immediately obvious. I saw this online about AWS itself and creating a permissions boundary, but that's for allowing creation of roles within a boundary: https://www.qloudx.com/delegate-aws-iam-user-and-role-creation-without-givi… I'm not sure what "Action" is associated with the Rados Admin API create user for applying a boundary that the user can only create users with the same tenant name. https://docs.ceph.com/en/quincy/radosgw/adminops/#create-user Any guidance on this would be extremely helpful. Thanks! -Brad

11 months

1
1
0 0

slow mds requests with random read test

by Ben

Hi, We are performing couple performance tests on CephFS using fio. fio is run in k8s pod and 3 pods will be up running mounting the same pvc to CephFS volume. Here is command line for random read: fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=5 -runtime=500 -group_reporting -directory=/tmp/cache -name=Rand_Read_Testing_$BUILD_TIMESTAMP The random read is performed very slow. Here is the cluster log from dashboard: 5/30/23 8:13:16 PM [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 5/30/23 8:13:16 PM [INF] Health check cleared: MDS_SLOW_METADATA_IO (was: 1 MDSs report slow metadata IOs) 5/30/23 8:13:16 PM [INF] MDS health message cleared (mds.?): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 33 secs 5/30/23 8:13:16 PM [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs 5/30/23 8:13:14 PM [WRN] Health check update: 2 MDSs report slow requests (MDS_SLOW_REQUEST) 5/30/23 8:13:13 PM [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs 5/30/23 8:13:08 PM [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST) 5/30/23 8:13:08 PM [WRN] Health check failed: 1 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO) 5/30/23 8:13:08 PM [WRN] slow request 34.213327 seconds old, received at 2023-05-30T12:12:33.951399+0000: client_request(client.270564:1406144 getattr pAsLsXsFs #0x700000103d0 2023-05-30T12:12:33.947323+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting 5/30/23 8:13:08 PM [WRN] 1 slow requests, 1 included below; oldest blocked for > 34.213328 secs 5/30/23 8:13:07 PM [WRN] slow request 33.169703 seconds old, received at 2023-05-30T12:12:33.952078+0000: peer_request:client.270564:1406144 currently dispatched 5/30/23 8:13:07 PM [WRN] 1 slow requests, 1 included below; oldest blocked for > 33.169704 secs 5/30/23 8:13:04 PM [INF] Cluster is now healthy 5/30/23 8:13:04 PM [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 5/30/23 8:13:04 PM [INF] Health check cleared: MDS_SLOW_METADATA_IO (was: 1 MDSs report slow metadata IOs) 5/30/23 8:13:04 PM [INF] MDS health message cleared (mds.?): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 45 secs 5/30/23 8:13:04 PM [INF] MDS health message cleared (mds.?): 2 slow requests are blocked > 30 secs 5/30/23 8:12:57 PM [WRN] 2 slow requests, 0 included below; oldest blocked for > 44.954377 secs 5/30/23 8:12:52 PM [WRN] 2 slow requests, 0 included below; oldest blocked for > 39.954313 secs 5/30/23 8:12:48 PM [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST) 5/30/23 8:12:47 PM [WRN] slow request 34.935921 seconds old, received at 2023-05-30T12:12:12.185614+0000: client_request(client.270564:1406139 create #0x7000001045b/atomic7966567911433736706tmp 2023-05-30T12:12:12.182999+0000 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply 5/30/23 8:12:47 PM [WRN] slow request 34.954254 seconds old, received at 2023-05-30T12:12:12.167281+0000: client_request(client.270564:1406138 rename #0x70000010457/build.xml #0x70000010457/atomic6590865221269854506tmp 2023-05-30T12:12:12.162999+0000 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply 5/30/23 8:12:47 PM [WRN] 2 slow requests, 2 included below; oldest blocked for > 34.954254 secs 5/30/23 8:12:44 PM [WRN] Health check failed: 1 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO) 5/30/23 8:12:41 PM [INF] Cluster is now healthy 5/30/23 8:12:41 PM [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) 5/30/23 8:12:41 PM [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs 5/30/23 8:12:40 PM [INF] Health check cleared: MDS_SLOW_METADATA_IO (was: 1 MDSs report slow metadata IOs) 5/30/23 8:12:40 PM [INF] MDS health message cleared (mds.?): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 38 secs However, random write test is performing very good. Any suggestions on the problem? Thanks, Ben

11 months

2
3
0 0

Orchestration seems not to work

by Thomas Widhalm

Hi, I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but the following problem existed when I was still everywhere on 17.2.5 . I had a major issue in my cluster which could be solved with a lot of your help and even more trial and error. Right now it seems that most is already fixed but I can't rule out that there's still some problem hidden. The very issue I'm asking about started during the repair. When I want to orchestrate the cluster, it logs the command but it doesn't do anything. No matter if I use ceph dashboard or "ceph orch" in "cephadm shell". I don't get any error message when I try to deploy new services, redeploy them etc. The log only says "scheduled" and that's it. Same when I change placement rules. Usually I use tags. But since they don't work anymore, too, I tried host and umanaged. No success. The only way I can actually start and stop containers is via systemctl from the host itself. When I run "ceph orch ls" or "ceph orch ps" I see services I deployed for testing being deleted (for weeks now). Ans especially a lot of old MDS are listed as "error" or "starting". The list doesn't match reality at all because I had to start them by hand. I tried "ceph mgr fail" and even a complete shutdown of the whole cluster with all nodes including all mgs, mds even osd - everything during a maintenance window. Didn't change anything. Could you help me? To be honest I'm still rather new to Ceph and since I didn't find anything in the logs that caught my eye I would be thankful for hints how to debug. Cheers, Thomas -- http://www.widhalm.or.at GnuPG : 6265BAE6 , A84CB603 Threema: H7AV7D33 Telegram, Signal: widhalmt(a)widhalm.or.at

11 months

3
19
0 0

Encryption per user Howto

by huxiaoyu＠horebdata.cn

Dear Ceph folks, Recently one of our clients approached us with a request on encrpytion per user, i.e. using individual encrytion key for each user and encryption files and object store. Does anyone know (or have experience) how to do with CephFS and Ceph RGW? Any suggestionns or comments are highly appreciated, best regards, Samuel huxiaoyu(a)horebdata.cn

11 months

8
27
0 0

reef v18.1.0 QE Validation status

by Yuri Weinstein

Details of this release are summarized here: https://tracker.ceph.com/issues/61515#note-1 Release Notes - TBD Seeking approvals/reviews for: rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to merge https://github.com/ceph/ceph/pull/51788 for the core) rgw - Casey fs - Venky orch - Adam King rbd - Ilya krbd - Ilya upgrade/octopus-x - deprecated upgrade/pacific-x - known issues, Ilya, Laura? upgrade/reef-p2p - N/A clients upgrades - not run yet powercycle - Brad ceph-volume - in progress Please reply to this email with approval and/or trackers of known issues/PRs to address them. gibba upgrade was done and will need to be done again this week. LRC upgrade TBD TIA

11 months

7
9
0 0

Unexpected behavior of directory mtime after being set explicitly

by Sandip Divekar

Hi Team, I'm writing to bring to your attention an issue we have encountered with the "mtime" (modification time) behavior for directories in the Ceph filesystem. Upon observation, we have noticed that when the mtime of a directory (let's say: dir1) is explicitly changed in CephFS, subsequent additions of files or directories within 'dir1' fail to update the directory's mtime as expected. This behavior appears to be specific to CephFS - we have reproduced this issue on both Quincy and Pacific. Similar steps work as expected in the ext4 filesystem amongst others. Reproduction steps: 1. Create a directory - mkdir dir1 2. Modify mtime using the touch command - touch dir1 3. Create a file or directory inside of 'dir1' - mkdir dir1/dir2 Expected result: mtime for dir1 should change to the time the file or directory was created in step 3 Actual result: there was no change to the mtime for 'dir1' Note : For more detail, kindly find the attached logs. Our queries are : 1. Is this expected behavior for CephFS? 2. If so, can you explain why the directory behavior is inconsistent depending on whether the mtime for the directory has previously been manually updated. Best Regards, Sandip Divekar Component QA Lead SDET.

11 months

7
14
0 0

Re: CEPH Version choice

by Frank Schilder

Hi Marc, I uploaded all scripts and a rudimentary readme to https://github.com/frans42/cephfs-bench . I hope it is sufficient to get started. I'm afraid its very much tailored to our deployment and I can't make it fully configurable anytime soon. I hope it serves a purpose though - at least I discovered a few bugs with it. We actually kept the benchmark running through an upgrade from mimic to octopus. Was quite interesting to see how certain performance properties change with that. This benchmark makes it possible to compare versions with live timings coming in. Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________________________________________ From: Marc <Marc(a)f1-outsourcing.eu> Sent: Monday, May 15, 2023 11:28 PM To: Frank Schilder Subject: RE: [ceph-users] Re: CEPH Version choice > I planned to put it on-line. The hold-back is that the main test is un- > taring a nasty archive and this archive might contain personal > information, so I can't just upload it as is. I can try to put together > a similar archive from public sources. Please give me a bit of time. I'm > also a bit under stress right now with our users being hit by an FS meta > data corruption. That's also why I'm a bit trigger happy. > Ok thanks, very nice, no hurry!!!

11 months

2
2
0 0

NFS export of 2 disjoint sub-dir mounts

by Frank Schilder

Hi all, I have a problem with exporting 2 different sub-folder ceph-fs kernel mounts via nfsd to the same IP address. The top-level structure on the ceph fs is something like /A/S1 and /A/S2. On a file server I mount /A/S1 and /A/S2 as two different file systems under /mnt/S1 and /mnt/S2 using the ceph fs kernel client. Then, these 2 mounts are exported with lines like these in /etc/exports: /mnt/S1 -options NET /mnt/S2 -options IP IP is an element of NET, meaning that the host at IP should be the only host being able to access /mnt/S1 and /mnt/S2. What we observe is that any attempt to mount the export /mnt/S1 on the host at IP results in /mnt/S2 being mounted instead. My first guess was that here we have a clash of fsids and the ceph fs is simply reporting the same fsid to nfsd and, hence, nfsd thinks both mountpoints contain the same. So I modified the second export line to /mnt/S2 -options,fsid=100 IP to no avail. The two folders are completely disjoint, neither symlinks nor hard-links between them. So it should be safe to export these as 2 different file systems. Exporting such constructs to non-overlapping networks/IPs works as expected - even when exporting subdirs of a dir (like exporting /A/B and /A/B/C from the same file server to strictly different IPs). It seems the same-IP config that breaks expectations. Am I missing here a magic -yes-i-really-know-what-i-am-doing hack? The file server is on AlmaLinux release 8.7 (Stone Smilodon) and all ceph packages match the ceph version octopus latest of our cluster. Thanks and best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

11 months

1
1
0 0

PGs incomplete - Data loss

by Benno Wulf

Hi guys, I'm awake since 36h and try to restore a broken ceph Pool (2 PGs incomplete) My vm are all broken. Some Boot, some Dont Boot... Also I have 5 removed disk with Data of that Pool "in my Hands" - Dont ask... So my question is it possible to restore Data of these other disks and "add" them thee others for healing? Best regards Ben

11 months, 1 week

2
1
0 0

cephadm does not honor container_image default value

by Daniel Krambrock

Hello. I think i found a bug in cephadm/ceph orch: Redeploying a container image (tested with alertmanager) after removing a custom `mgr/cephadm/container_image_alertmanager` value, deploys the previous container image and not the default container image. I'm running `cephadm` from ubuntu 22.04 pkg 17.2.5-0ubuntu0.22.04.3 and `ceph` version 17.2.6. Here is an example. Node clrz20-08 is the node altermanager is running on, clrz20-01 the node I'm controlling ceph from: * Get alertmanager version ``` root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name' "quay.io/prometheus/alertmanager:v0.23.0" ``` * Set alertmanager image ``` root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager ``` * redeploy altermanager ``` root@clrz20-01:~# ceph orch redeploy alertmanager Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08' ``` * Get alertmanager version ``` root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name' "quay.io/prometheus/alertmanager:latest" ``` * Remove alertmanager image setting, revert to default: ``` root@clrz20-01:~# ceph config rm mgr mgr/cephadm/container_image_alertmanager root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0 ``` * redeploy altermanager ``` root@clrz20-01:~# ceph orch redeploy alertmanager Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08' ``` * Get alertmanager version ``` root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name' "quay.io/prometheus/alertmanager:latest" ``` -> `mgr/cephadm/container_image_alertmanager` is set to `quay.io/prometheus/alertmanager:v0.23.0`, but redeploy uses `quay.io/prometheus/alertmanager:latest`. This looks like a bug. * Set alertmanager image explicitly to the default value ``` root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0 root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0 ``` * redeploy altermanager ``` root@clrz20-01:~# ceph orch redeploy alertmanager Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08' ``` * Get alertmanager version ``` root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager")| .container_image_name' "quay.io/prometheus/alertmanager:v0.23.0" ``` -> Setting `mgr/cephadm/container_image_alertmanager` to the default setting fixes the issue. Bests, Daniel

11 months, 1 week

3
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users May 2023