Hi All,
I've been battling this for a while and I'm not sure where to go from
here. I have a Ceph health warning as such:
# ceph -s
  cluster:
    id:     58bde08a-d7ed-11ee-9098-506b4b4da440
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 5 daemons, quorum pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
    mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
    mds: 1/1 daemons up, 2 standby
    osd: 46 osds: 46 up (since 9h), 46 in (since 2w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1313 pgs
    objects: 260.72M objects, 466 TiB
    usage:   704 TiB used, 424 TiB / 1.1 PiB avail
    pgs:     1306 active+clean
             4    active+clean+scrubbing+deep
             3    active+clean+scrubbing

  io:
    client: 123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
And the specifics are:
# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250)
    max_segments: 250, num_segments: 13884
That "num_segments" number slowly keeps increasing. I suspect I just
need to tell the MDS servers to trim faster but after hours of googling
around I just can't figure out the best way to do it. The best I could
come up with was to decrease "mds_cache_trim_decay_rate" from 1.0 to .8
(to start), based on this page:
https://www.suse.com/support/kb/doc/?id=000019740
But it doesn't seem to help, maybe I should decrease it further? I am
guessing this must be a common issue...? I am running Reef on the MDS
servers, but most clients are on Quincy.
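In case it helps, here is exactly how I applied the change, plus another
knob I'm only considering (raising mds_log_max_segments is my own guess,
not something the KB article recommends):
```
# lower the trim decay rate so trimming reacts faster (what I tried)
ceph config set mds mds_cache_trim_decay_rate 0.8

# verify the running value on the active MDS
ceph config show mds.slugfs.pr-md-01.xdtppo mds_cache_trim_decay_rate

# my own guess, not from the KB article: allow more journal segments
# before the trim warning triggers
ceph config set mds mds_log_max_segments 500
```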
Thanks for any advice!
cheers,
erich
Hello Ceph List,
I'd like to formally let the wider community know of some work I've been
involved with for a while now: adding Managed SMB Protocol Support to Ceph.
SMB is the well-known network file protocol native to Windows systems and
supported by macOS (and Linux). The other key word, "managed", means
integrating with Ceph management tooling - in this particular case cephadm for
orchestration and, eventually, a new MGR module for managing SMB shares.
The effort is still in its very early stages. We have a PR adding initial
support for Samba Containers to cephadm [1] and a prototype for an smb MGR
module [2]. We plan on using container images based on the samba-container
project [3] - a team I am already part of. What we're aiming for is a feature
set similar to the current NFS integration in Ceph, but with a focus on
bridging non-Linux/Unix clients to CephFS using a protocol built into those
systems.
A few major features we have planned include:
* Standalone servers (internally defined users/groups)
* Active Directory Domain Member Servers
* Clustered Samba support
* Exporting Samba stats via Prometheus metrics
* A `ceph` cli workflow loosely based on the nfs mgr module
I wanted to share this information in case there's wider community interest in
this effort. I'm happy to take your questions / thoughts / suggestions in this
email thread, via Ceph slack (or IRC), or feel free to attend a Ceph
Orchestration weekly meeting! I try to attend regularly, and we sometimes discuss
design aspects of the smb effort there. It's on the Ceph Community Calendar.
Thanks!
[1] - https://github.com/ceph/ceph/pull/55068
[2] - https://github.com/ceph/ceph/pull/56350
[3] - https://github.com/samba-in-kubernetes/samba-container/
Thanks for reading,
--John Mulligan
I have a virtual Ceph cluster running 17.2.6 with 4 Ubuntu 22.04 hosts in it, each with 4 OSDs attached. The first 2 servers, which host the mgrs, have 32 GB of RAM each; the remaining hosts have 24 GB.
For some reason I am unable to identify, the first host in the cluster appears to be constantly trying to set the osd_memory_target variable to roughly half of the calculated minimum for the cluster. I see the following spamming the logs constantly:
Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing value: Value '480485376' is below minimum 939524096
Default is set to 4294967296.
I did double-check, and osd_memory_base (805306368) + osd_memory_cache_min (134217728) adds up to the minimum exactly.
osd_memory_target_autotune is currently enabled, but I cannot for the life of me figure out how it is arriving at 480485376 as a value for that particular host, which actually has the most RAM. Neither the cluster nor the host is anywhere near maximum memory utilization, so it's not as if processes are competing for resources.
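For what it's worth, here is how I have been checking the autotune inputs;
my understanding (an assumption on my part) is that cephadm takes the host's
total RAM times the autotune ratio, subtracts the needs of the other daemons
on the host, and splits the rest across the OSDs:
```
# ratio cephadm uses when autotuning (default 0.7)
ceph config get mgr mgr/cephadm/autotune_memory_target_ratio

# what is actually set for a given OSD
ceph config get osd.0 osd_memory_target

# possible workaround I'm considering: exclude this host from
# autotuning and pin the target manually (per-host config mask)
ceph config set osd/host:my-ceph01 osd_memory_target_autotune false
ceph config set osd/host:my-ceph01 osd_memory_target 4G
```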
Hi all,
We rebooted all the nodes in our 17.2.5 cluster after performing kernel updates, but 2 of the OSDs on different nodes are not coming back up. This is a production cluster using cephadm.
The error message from the OSD log is ceph-osd[87340]: ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-665: (2) No such file or directory
The error message from ceph-volume is 2023-08-23T16:12:43.452-0500 7f0cad968600 2 bluestore(/dev/mapper/ceph--febad5a5--ba44--41aa--a39e--b9897f757752-osd--block--87e548f4--b9b5--4ed8--aca8--de703a341a50) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
We tried restarting the daemons and rebooting the node again, but still see the same error.
Has anyone experienced this issue before? How do we fix this?
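In case it's useful, I assume the next diagnostic step would be something
like the following (read-only; the device path is taken from the ceph-volume
message above), but we have held off on anything more invasive:
```
# inspect the bluestore label directly from within the cephadm container
cephadm shell -- ceph-bluestore-tool show-label \
  --dev /dev/mapper/ceph--febad5a5--ba44--41aa--a39e--b9897f757752-osd--block--87e548f4--b9b5--4ed8--aca8--de703a341a50
```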
Thanks,
Alison
Hi,
A disk failed in our cephadm-managed 16.2.15 cluster. The affected OSD is
down, out, and stopped with cephadm, and I also removed the failed drive from
the host's service definition. The cluster has finished recovering, but the
following warning persists:
following warning persists:
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon osd.11 on ceph02 is in error state
Is it possible to remove or suppress this warning without having to
completely remove the OSD?
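The closest thing I've found so far is muting the health code, e.g. as below,
but I'm unsure whether that's advisable, since it would presumably also hide
future failed daemons:
```
# mute the warning for one week
ceph health mute CEPHADM_FAILED_DAEMON 1w
```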
I would appreciate any advice or pointers.
Best regards,
Zakhar
Dear ceph community,
We have trouble with new disks not being properly prepared, i.e. OSDs not being fully deployed by cephadm.
We just added a new node with ~40 HDDs to each of two of our Ceph clusters.
In one cluster, all but 5 disks got installed automatically.
In the other, none got installed.
We are on ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) on both clusters.
(I haven't added new disks since the last upgrade if I recall correctly).
This is our OSD service definition:
```
0|0[root@ceph-3-10 ~]# ceph orch ls osd --export
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
host_pattern: '*'
spec:
data_devices:
all: true
filter_logic: AND
objectstore: bluestore
---
service_type: osd
service_id: unmanaged
service_name: osd.unmanaged
unmanaged: true
spec:
filter_logic: AND
objectstore: bluestore
```
Usually, new disks are installed properly (as expected, due to all-available-devices).
This time, I can see that LVs were created (via `lsblk`, `lvs`, `cephadm ceph-volume lvm list`),
and OSDs are entered into the crushmap.
However, they are not yet assigned to a host, nor do they have a type or weight, e.g.:
```
0|0[root@ceph-2-10 ~]# ceph osd tree | grep "0 osd"
518 0 osd.518 down 0 1.00000
519 0 osd.519 down 0 1.00000
520 0 osd.520 down 0 1.00000
521 0 osd.521 down 0 1.00000
522 0 osd.522 down 0 1.00000
```
And there is also no OSD daemon created (no docker container).
So, OSD creation is somehow stuck halfway.
I thought of fully cleaning up the OSDs/disks,
hoping cephadm might pick them up properly next time.
Just zapping was not possible, e.g. `cephadm ceph-volume lvm zap --destroy /dev/sdab` results in these errors:
```
/usr/bin/docker: stderr stderr: wipefs: error: /dev/sdab: probing initialization failed: Device or resource busy
/usr/bin/docker: stderr --> failed to wipefs device, will try again to workaround probable race condition
```
So, I cleaned up more manually, purging the OSDs from crush and "resetting" disk and LV with dd and dmsetup, respectively:
```
ceph osd purge 480 --force
dd if=/dev/zero of=/dev/sdab bs=1M count=1
dmsetup remove ceph--e10e0f08--8705--441a--8caa--4590de22a611-osd--block--d464211c--f513--4513--86c1--c7ad63e6c142
```
ceph-volume still reported the old volumes, but then zapping actually got rid of them (it only cleaned out the left-over entries, I guess).
Now, cephadm was able to get one OSD up when I did this cleanup for just one disk.
When I did it in bulk for the rest, they all got stuck again in the same way.
Looking into the ceph-volume logs (here for osd.522 as a representative):
```
0|0[root@ceph-2-11 /var/log/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ll *20240316
-rw-r--r-- 1 ceph ceph 613789 Mar 14 17:10 ceph-osd.522.log-20240316
-rw-r--r-- 1 root root 42473553 Mar 16 03:13 ceph-volume.log-20240316
```
ceph-volume only reports keyring creation:
```
[2024-03-14 16:10:19,509][ceph_volume.util.prepare][INFO ] Creating keyring file for osd.522
[2024-03-14 16:10:19,510][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-522/keyring --create-keyring --name osd.522 --add-key AQBfIfNlinc7EBAAHeFicrjmLEjRPGSjuFuLiQ==
```
In the OSD logs I found a couple of these, but I don't know if they are related:
```
2024-03-14T16:10:54.706+0000 7fab26988540 2 rocksdb: [db/column_family.cc:546] Failed to register data paths of column family (id: 11, name: P)
```
Has anyone seen this behaviour before?
Or could you tell me where I should look next to troubleshoot this (which logs)?
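So far I have only looked at the ceph-volume and OSD logs; I assume the
cephadm/orchestrator side might show where it gets stuck, e.g.:
```
# recent cephadm/orchestrator activity
ceph log last 100 info cephadm

# what cephadm currently thinks about the devices on the new node
ceph orch device ls ceph-2-11 --refresh
```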
Any help is appreciated.
Best Wishes,
Mathias
Hi,
I am upgrading my test cluster from 17.2.6 (quincy) to 18.2.2 (reef).
As it was an rpm install, I am following the directions here:
Reef — Ceph Documentation
The upgrade worked, but I have some observations and questions before I move to my production cluster:
1. I see no systemd units with the fsid in them, as described in the document above. Both before and after the upgrade, my mon and other units are:
ceph-mon@<server>.service
ceph-osd@<N>.service
etc
Should I be concerned?
2. Does order matter? Based on past upgrades, I do not think so, but I wanted to be sure. For example, can I update
mons/mds/radosgw/mgrs first, then afterwards update the osds? This is what I have done in previous updates, and all was well.
3. Again on order: if a server hosts, say, both a mon and an mds, I can't easily update one without the other, given shared libraries and such.
It appears that this is OK, based on my test cluster, but I wanted to be sure. Also, if an mds is on one of the servers to update, I know I have to update the remaining one after max_mds is set to 1 and the others are stopped, first.
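For what it's worth, here is roughly how I plan to handle the MDS part,
following the documented reduce-to-one-rank procedure; <fs_name> and
<standby> are placeholders:
```
# reduce the file system to a single active MDS
ceph fs set <fs_name> max_mds 1

# wait for the extra ranks to stop, then confirm
ceph status

# stop the standby MDS daemons before upgrading their packages
systemctl stop ceph-mds@<standby>.service
```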
4. After the upgrade of my mgr node I get:
"Module [several module names] has missing NOTIFY_TYPES member"
in ceph-mgr.<server>.log
But the mgr starts up eventually.
The system is Rocky Linux 8.9
Thanks for any thoughts
-Chris
Running on Octopus:
While attempting to install a bunch of new OSDs on multiple hosts, I ran some ceph orchestrator commands to install them, such as:
ceph orch apply osd --all-available-devices
ceph orch apply osd -i HDD_drive_group.yaml
I assumed these were just short-lived helper processes. In fact, they didn't actually work, and I ended up installing each drive by hand like this:
ceph orch daemon add osd ceph4.iri.columbia.edu:/dev/sdag
However, now I have these services running:
# ceph orch ls --service-type=osd
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
osd.HDD_drive_group 2/2 7m ago 6w ceph[456].iri.columbia.edu docker.io/ceph/ceph:v15 2cf504fded39
osd.None 54/0 7m ago - <unmanaged> docker.io/ceph/ceph:v15 2cf504fded39
osd.all-available-devices 1/0 7m ago - <unmanaged> docker.io/ceph/ceph:v15 2cf504fded39
I’m certain none of these actually created any of my running OSD daemons, but I’m not sure if it’s ok to remove them.
For example:
ceph orch rm osd.all-available-devices
ceph orch rm osd.HDD_drive_group
ceph orch rm osd.None
Does anyone have any insight into this? I could just leave them there; they don't seem to be doing anything. But on the other hand, I don't want any new devices to be automatically added, or any other unintended consequences from these.
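One option I've read about, and would like confirmation on, is flipping the
spec to unmanaged instead of removing it, so nothing new gets created
automatically:
```
# stop automatic OSD creation from the all-available-devices spec
ceph orch apply osd --all-available-devices --unmanaged=true

# confirm the services now show as unmanaged
ceph orch ls --service-type=osd
```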
Thanks for any guidance,
Jeff Turmelle
International Research Institute for Climate & Society <https://iri.columbia.edu/>
The Climate School <https://climate.columbia.edu/> at Columbia University <https://columbia.edu/>
Hello everyone,
we are facing a problem regarding the S3 operation
PutBucketNotificationConfiguration.
We are using Ceph version 17.2.6. We are trying to configure buckets in
our cluster so that a notification message is sent via the amqps protocol
when the content of a bucket changes. To do so, we created a local rgw
user with "special" capabilities, and we wrote ad hoc policies for this
user (listing of all buckets, read access to all buckets, and the ability
to add, list, and delete bucket notification configurations).
The problem concerns the configuration of all buckets except the one this
user owns: when performing this cross-account
PutBucketNotificationConfiguration operation, we get an access denied error.
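For reference, the bucket policy we attach looks roughly like this (user and
bucket names are placeholders), applied with the AWS CLI against the RGW
endpoint:
```
# policy.json -- grant our notification user the relevant actions
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/notif-user"]},
    "Action": ["s3:GetBucketNotification", "s3:PutBucketNotification"],
    "Resource": ["arn:aws:s3:::example-bucket"]
  }]
}
EOF

aws --endpoint-url https://rgw.example.com s3api put-bucket-policy \
  --bucket example-bucket --policy file://policy.json
```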
I suspect that this problem is related to the version we are using,
because when we ran tests on another cluster with version 18.2.1, we did
not face this problem. Can you confirm my hypothesis?
Thanks,
GM.
Good afternoon,
I am trying to set bucket policies to allow different users to access the
same bucket with different permissions, but it seems that this is not yet
supported. Am I wrong?
https://docs.ceph.com/en/reef/radosgw/bucketpolicy/#limitations
"We do not yet support setting policies on users, groups, or roles."
thank you.