Hi All,
I've been battling this for a while and I'm not sure where to go from
here. I have a Ceph health warning as such:
# ceph -s
  cluster:
    id:     58bde08a-d7ed-11ee-9098-506b4b4da440
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 5 daemons, quorum pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
    mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
    mds: 1/1 daemons up, 2 standby
    osd: 46 osds: 46 up (since 9h), 46 in (since 2w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1313 pgs
    objects: 260.72M objects, 466 TiB
    usage:   704 TiB used, 424 TiB / 1.1 PiB avail
    pgs:     1306 active+clean
             4    active+clean+scrubbing+deep
             3    active+clean+scrubbing

  io:
    client: 123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
And the specifics are:
# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250)
    max_segments: 250, num_segments: 13884
That "num_segments" number slowly keeps increasing. I suspect I just
need to tell the MDS servers to trim faster but after hours of googling
around I just can't figure out the best way to do it. The best I could
come up with was to decrease "mds_cache_trim_decay_rate" from 1.0 to .8
(to start), based on this page:
https://www.suse.com/support/kb/doc/?id=000019740
But it doesn't seem to help, maybe I should decrease it further? I am
guessing this must be a common issue...? I am running Reef on the MDS
servers, but most clients are on Quincy.
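In case it helps, here is exactly how I applied the change, plus another
knob I'm only considering (raising mds_log_max_segments is my own guess,
not something the KB article recommends):
```
# lower the trim decay rate so trimming reacts faster (what I tried)
ceph config set mds mds_cache_trim_decay_rate 0.8

# verify the running value on the active MDS
ceph config show mds.slugfs.pr-md-01.xdtppo mds_cache_trim_decay_rate

# my own guess, not from the KB article: allow more journal segments
# before the trim warning triggers
ceph config set mds mds_log_max_segments 500
```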
Thanks for any advice!
cheers,
erich
Hello Ceph List,
I'd like to formally let the wider community know of some work I've been
involved with for a while now: adding Managed SMB Protocol Support to Ceph.
SMB is the well-known network file protocol native to Windows systems and
supported by macOS (and Linux). The other key word, "managed", means
integrating with Ceph management tooling - in this particular case cephadm for
orchestration and, eventually, a new MGR module for managing SMB shares.
The effort is still in its very early stages. We have a PR adding initial
support for Samba Containers to cephadm [1] and a prototype for an smb MGR
module [2]. We plan on using container images based on the samba-container
project [3] - a team I am already part of. What we're aiming for is a feature
set similar to the current NFS integration in Ceph, but with a focus on
bridging non-Linux/Unix clients to CephFS using a protocol built into those
systems.
A few major features we have planned include:
* Standalone servers (internally defined users/groups)
* Active Directory Domain Member Servers
* Clustered Samba support
* Exporting Samba stats via Prometheus metrics
* A `ceph` cli workflow loosely based on the nfs mgr module
I wanted to share this information in case there's wider community interest in
this effort. I'm happy to take your questions / thoughts / suggestions in this
email thread, via Ceph slack (or IRC), or feel free to attend a Ceph
Orchestration weekly meeting! I try to attend regularly, and we sometimes discuss
design aspects of the smb effort there. It's on the Ceph Community Calendar.
Thanks!
[1] - https://github.com/ceph/ceph/pull/55068
[2] - https://github.com/ceph/ceph/pull/56350
[3] - https://github.com/samba-in-kubernetes/samba-container/
Thanks for reading,
--John Mulligan
I have a virtual Ceph cluster running 17.2.6 with 4 Ubuntu 22.04 hosts in it, each with 4 OSDs attached. The first 2 servers, which host the mgrs, have 32 GB of RAM each; the remaining hosts have 24 GB.
For some reason I am unable to identify, the first host in the cluster appears to be constantly trying to set the osd_memory_target variable to roughly half of the calculated minimum for the cluster. I see the following spamming the logs constantly:
Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing value: Value '480485376' is below minimum 939524096
Default is set to 4294967296.
I did double-check, and osd_memory_base (805306368) + osd_memory_cache_min (134217728) adds up to the minimum exactly.
osd_memory_target_autotune is currently enabled, but I cannot for the life of me figure out how it is arriving at 480485376 as a value for that particular host, which actually has the most RAM. Neither the cluster nor the host is anywhere near maximum memory utilization, so it's not as if processes are competing for resources.
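For what it's worth, here is how I have been checking the autotune inputs;
my understanding (an assumption on my part) is that cephadm takes the host's
total RAM times the autotune ratio, subtracts the needs of the other daemons
on the host, and splits the rest across the OSDs:
```
# ratio cephadm uses when autotuning (default 0.7)
ceph config get mgr mgr/cephadm/autotune_memory_target_ratio

# what is actually set for a given OSD
ceph config get osd.0 osd_memory_target

# possible workaround I'm considering: exclude this host from
# autotuning and pin the target manually (per-host config mask)
ceph config set osd/host:my-ceph01 osd_memory_target_autotune false
ceph config set osd/host:my-ceph01 osd_memory_target 4G
```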
Hi all,
We rebooted all the nodes in our 17.2.5 cluster after performing kernel updates, but 2 of the OSDs on different nodes are not coming back up. This is a production cluster using cephadm.
The error message from the OSD log is ceph-osd[87340]: ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-665: (2) No such file or directory
The error message from ceph-volume is 2023-08-23T16:12:43.452-0500 7f0cad968600 2 bluestore(/dev/mapper/ceph--febad5a5--ba44--41aa--a39e--b9897f757752-osd--block--87e548f4--b9b5--4ed8--aca8--de703a341a50) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
We tried restarting the daemons and rebooting the node again, but still see the same error.
Has anyone experienced this issue before? How do we fix this?
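In case it's useful, I assume the next diagnostic step would be something
like the following (read-only; the device path is taken from the ceph-volume
message above), but we have held off on anything more invasive:
```
# inspect the bluestore label directly from within the cephadm container
cephadm shell -- ceph-bluestore-tool show-label \
  --dev /dev/mapper/ceph--febad5a5--ba44--41aa--a39e--b9897f757752-osd--block--87e548f4--b9b5--4ed8--aca8--de703a341a50
```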
Thanks,
Alison
Hi,
A disk failed in our cephadm-managed 16.2.15 cluster. The affected OSD is
down, out, and stopped with cephadm, and I also removed the failed drive from
the host's service definition. The cluster has finished recovering, but the
following warning persists:
following warning persists:
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon osd.11 on ceph02 is in error state
Is it possible to remove or suppress this warning without having to
completely remove the OSD?
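The closest thing I've found so far is muting the health code, e.g. as below,
but I'm unsure whether that's advisable, since it would presumably also hide
future failed daemons:
```
# mute the warning for one week
ceph health mute CEPHADM_FAILED_DAEMON 1w
```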
I would appreciate any advice or pointers.
Best regards,
Zakhar
Dear ceph community,
We have trouble with new disks not being properly prepared, i.e. OSDs not being fully deployed by cephadm.
We just added a new node with ~40 HDDs to each of two of our Ceph clusters.
In one cluster, all but 5 disks got installed automatically.
In the other, none got installed.
We are on ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) on both clusters.
(I haven't added new disks since the last upgrade if I recall correctly).
This is our OSD service definition:
```
0|0[root@ceph-3-10 ~]# ceph orch ls osd --export
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
host_pattern: '*'
spec:
data_devices:
all: true
filter_logic: AND
objectstore: bluestore
---
service_type: osd
service_id: unmanaged
service_name: osd.unmanaged
unmanaged: true
spec:
filter_logic: AND
objectstore: bluestore
```
Usually, new disks are installed properly (as expected, due to all-available-devices).
This time, I can see that LVs were created (via `lsblk`, `lvs`, `cephadm ceph-volume lvm list`),
and OSDs are entered into the crushmap.
However, they are not yet assigned to a host, nor do they have a type or weight, e.g.:
```
0|0[root@ceph-2-10 ~]# ceph osd tree | grep "0 osd"
518 0 osd.518 down 0 1.00000
519 0 osd.519 down 0 1.00000
520 0 osd.520 down 0 1.00000
521 0 osd.521 down 0 1.00000
522 0 osd.522 down 0 1.00000
```
And there is also no OSD daemon created (no docker container).
So, OSD creation is somehow stuck halfway.
I thought of fully cleaning up the OSDs/disks,
hoping cephadm might pick them up properly next time.
Just zapping was not possible, e.g. `cephadm ceph-volume lvm zap --destroy /dev/sdab` results in these errors:
```
/usr/bin/docker: stderr stderr: wipefs: error: /dev/sdab: probing initialization failed: Device or resource busy
/usr/bin/docker: stderr --> failed to wipefs device, will try again to workaround probable race condition
```
So, I cleaned up more manually, purging the OSDs from crush and "resetting" disk and LV with dd and dmsetup, respectively:
```
ceph osd purge 480 --force
dd if=/dev/zero of=/dev/sdab bs=1M count=1
dmsetup remove ceph--e10e0f08--8705--441a--8caa--4590de22a611-osd--block--d464211c--f513--4513--86c1--c7ad63e6c142
```
ceph-volume still reported the old volumes, but then zapping actually got rid of them (it only cleaned out the left-over entries, I guess).
Now, cephadm was able to get one OSD up when I did this cleanup for just one disk.
When I did it in bulk for the rest, they all got stuck again in the same way.
Looking into the ceph-volume logs (here for osd.522 as a representative):
```
0|0[root@ceph-2-11 /var/log/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ll *20240316
-rw-r--r-- 1 ceph ceph 613789 Mar 14 17:10 ceph-osd.522.log-20240316
-rw-r--r-- 1 root root 42473553 Mar 16 03:13 ceph-volume.log-20240316
```
ceph-volume only reports keyring creation:
```
[2024-03-14 16:10:19,509][ceph_volume.util.prepare][INFO ] Creating keyring file for osd.522
[2024-03-14 16:10:19,510][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-522/keyring --create-keyring --name osd.522 --add-key AQBfIfNlinc7EBAAHeFicrjmLEjRPGSjuFuLiQ==
```
In the OSD logs I found a couple of these, but I don't know if they are related:
```
2024-03-14T16:10:54.706+0000 7fab26988540 2 rocksdb: [db/column_family.cc:546] Failed to register data paths of column family (id: 11, name: P)
```
Has anyone seen this behaviour before?
Or could you tell me where I should look next to troubleshoot this (which logs)?
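So far I have only looked at the ceph-volume and OSD logs; I assume the
cephadm/orchestrator side might show where it gets stuck, e.g.:
```
# recent cephadm/orchestrator activity
ceph log last 100 info cephadm

# what cephadm currently thinks about the devices on the new node
ceph orch device ls ceph-2-11 --refresh
```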
Any help is appreciated.
Best Wishes,
Mathias
Hi,
I am upgrading my test cluster from 17.2.6 (quincy) to 18.2.2 (reef).
As it was an rpm install, I am following the directions here:
Reef — Ceph Documentation
The upgrade worked, but I have some observations and questions before I move to my production cluster:
1. I see no systemd units with the fsid in them, as described in the document above. Both before and after the upgrade, my mon and other units are:
ceph-mon@<server>.service
ceph-osd@<N>.service
etc
Should I be concerned?
2. Does order matter? Based on past upgrades, I do not think so, but I wanted to be sure. For example, can I update
mons/mds/radosgw/mgrs first, then afterwards update the osds? This is what I have done in previous updates, and all was well.
3. Again on order: if a server hosts, say, both a mon and an mds, I can't easily update one without the other, given shared libraries and such.
It appears that this is OK, based on my test cluster, but I wanted to be sure. Also, if an mds is on one of the servers to update, I know I have to update the remaining one after max_mds is set to 1 and the others are stopped, first.
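For what it's worth, here is roughly how I plan to handle the MDS part,
following the documented reduce-to-one-rank procedure; <fs_name> and
<standby> are placeholders:
```
# reduce the file system to a single active MDS
ceph fs set <fs_name> max_mds 1

# wait for the extra ranks to stop, then confirm
ceph status

# stop the standby MDS daemons before upgrading their packages
systemctl stop ceph-mds@<standby>.service
```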
4. After the upgrade of my mgr node I get:
"Module [several module names] has missing NOTIFY_TYPES member"
in ceph-mgr.<server>.log
But the mgr starts up eventually.
The system is Rocky Linux 8.9
Thanks for any thoughts
-Chris
Running on Octopus:
While attempting to install a bunch of new OSDs on multiple hosts, I ran some ceph orchestrator commands to install them, such as:
ceph orch apply osd --all-available-devices
ceph orch apply osd -i HDD_drive_group.yaml
I assumed these were just short-lived helper processes. In fact, they didn't actually work, and I ended up installing each drive by hand like this:
ceph orch daemon add osd ceph4.iri.columbia.edu:/dev/sdag
However, now I have these services running:
# ceph orch ls --service-type=osd
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
osd.HDD_drive_group 2/2 7m ago 6w ceph[456].iri.columbia.edu docker.io/ceph/ceph:v15 2cf504fded39
osd.None 54/0 7m ago - <unmanaged> docker.io/ceph/ceph:v15 2cf504fded39
osd.all-available-devices 1/0 7m ago - <unmanaged> docker.io/ceph/ceph:v15 2cf504fded39
I’m certain none of these actually created any of my running OSD daemons, but I’m not sure if it’s ok to remove them.
For example:
ceph orch rm osd.all-available-devices
ceph orch rm osd.HDD_drive_group
ceph orch rm osd.None
Does anyone have any insight into this? I could just leave them there; they don't seem to be doing anything. But on the other hand, I don't want any new devices to be automatically added, or any other unintended consequences from these.
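One option I've read about, and would like confirmation on, is flipping the
spec to unmanaged instead of removing it, so nothing new gets created
automatically:
```
# stop automatic OSD creation from the all-available-devices spec
ceph orch apply osd --all-available-devices --unmanaged=true

# confirm the services now show as unmanaged
ceph orch ls --service-type=osd
```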
Thanks for any guidance,
Jeff Turmelle
International Research Institute for Climate & Society <https://iri.columbia.edu/>
The Climate School <https://climate.columbia.edu/> at Columbia University <https://columbia.edu/>
Hello everyone,
we are facing a problem regarding the S3 operation
PutBucketNotificationConfiguration.
We are using Ceph version 17.2.6. We are trying to configure buckets in
our cluster so that a notification message is sent via the amqps protocol
when the content of a bucket changes. To do so, we created a local rgw
user with "special" capabilities, and we wrote ad hoc policies for this
user (listing of all buckets, read access to all buckets, and the ability
to add, list, and delete bucket notification configurations).
The problem concerns the configuration of all buckets except the one this
user owns: when performing this cross-account
PutBucketNotificationConfiguration operation, we get an access denied error.
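For reference, the bucket policy we attach looks roughly like this (user and
bucket names are placeholders), applied with the AWS CLI against the RGW
endpoint:
```
# policy.json -- grant our notification user the relevant actions
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/notif-user"]},
    "Action": ["s3:GetBucketNotification", "s3:PutBucketNotification"],
    "Resource": ["arn:aws:s3:::example-bucket"]
  }]
}
EOF

aws --endpoint-url https://rgw.example.com s3api put-bucket-policy \
  --bucket example-bucket --policy file://policy.json
```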
I suspect that this problem is related to the version we are using,
because when we ran tests on another cluster with version 18.2.1, we did
not face this problem. Can you confirm my hypothesis?
Thanks,
GM.
Good afternoon,
I am trying to set bucket policies to allow different users to access the
same bucket with different permissions, but it seems that this is not yet
supported. Am I wrong?
https://docs.ceph.com/en/reef/radosgw/bucketpolicy/#limitations
"We do not yet support setting policies on users, groups, or roles."
thank you.