Dear people on this mailing list,
I've got the "problem" that our MAX AVAIL value increases by about
5-10 TB when I reboot a whole OSD node. After the reboot the value
goes back to normal.
I would love to know WHY.
Under normal circumstances I would ignore this behavior, but since I
am very new to Ceph I would like to understand why things like this
happen.
From what I've read, this value is calculated from the most-filled OSD.
I've set noout and norebalance while the node is offline and I unset
both values after the reboot.
We are currently on nautilus.
Cheers and thanks in advance
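For what it's worth, the "calculated from the most-filled OSD" behavior mentioned above can be sketched roughly like this. This is a simplified illustration, not Ceph's actual code (the real calculation also involves CRUSH weights and full ratios), and the OSD numbers are made up:

```python
# Rough sketch of how a pool's MAX AVAIL tracks the most-filled OSD.
# Simplified: Ceph's real calculation also accounts for CRUSH weights
# and full ratios; this only illustrates why one OSD's fill level
# moving (e.g. around a node reboot) can shift the reported value.

def max_avail(osds, replicas):
    """osds: list of (capacity_bytes, used_bytes) tuples."""
    # The OSD with the least free space (proportionally) limits the pool.
    limiting_free_fraction = min((cap - used) / cap for cap, used in osds)
    total_capacity = sum(cap for cap, _ in osds)
    return limiting_free_fraction * total_capacity / replicas

# Example: three 10 TB OSDs; one is fuller than the others.
TB = 10**12
osds = [(10 * TB, 6 * TB), (10 * TB, 5 * TB), (10 * TB, 4 * TB)]
print(max_avail(osds, replicas=3))  # limited by the 60%-full OSD
```

Comparing `ceph osd df` output before and during the reboot should show which OSD's utilization drives the change.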
If this command returns "nautilus", it has been applied:
# ceph osd dump -f json | jq .require_osd_release
On Mon, Feb 17, 2020 at 11:10 AM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
> How do you check if you issued this command in the past?
> -----Original Message-----
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: Excessive write load on mons after upgrade
> from 12.2.13 -> 14.2.7
> Hi Peter,
> could be a totally different problem but did you run the command "ceph
> osd require-osd-release nautilus" after the upgrade?
> We had poor performance after upgrading to nautilus and running this
> command fixed it. The same was reported by others for previous updates.
> Here is my original message regarding this issue:
> We did not observe the master election problem though.
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
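The check above can also be scripted. A minimal sketch, using Python's json module instead of `jq` (the sample JSON below is made up and heavily truncated, apart from the require_osd_release field the thread is about):

```python
import json

# Made-up, truncated sample of `ceph osd dump -f json` output;
# only require_osd_release matters for this check.
osd_dump = '{"epoch": 1234, "require_osd_release": "nautilus"}'

release = json.loads(osd_dump).get("require_osd_release")
if release != "nautilus":
    print(f"require_osd_release is {release!r}; run "
          "'ceph osd require-osd-release nautilus' after the upgrade")
else:
    print("require-osd-release has been applied")
```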
On Wed, May 27, 2020 at 10:09 PM Dylan McCulloch <dmc(a)unimelb.edu.au> wrote:
> Hi all,
> The single active MDS on one of our Ceph clusters is close to running out of RAM.
> MDS total system RAM = 528GB
> MDS current free system RAM = 4GB
> mds_cache_memory_limit = 451GB
> current mds cache usage = 426GB
This mds_cache_memory_limit is way too high for the available RAM. We
normally recommend that your RAM be 150% of your cache limit but we
lack data for such large cache sizes.
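That 150% rule of thumb works out as follows (a trivial illustration using the numbers from this thread; the 150% figure is only the recommendation stated above):

```python
# Rule of thumb from above: system RAM should be ~150% of
# mds_cache_memory_limit, i.e. the cache limit should be at most
# ~2/3 of available RAM.

def safe_cache_limit(ram_gb, headroom=1.5):
    return ram_gb / headroom

ram_gb = 528   # total RAM of the MDS host in this thread
limit_gb = 451 # configured mds_cache_memory_limit

print(safe_cache_limit(ram_gb))  # ~352 GB would satisfy the 150% rule
print(limit_gb * 1.5)            # a 451 GB limit would want ~676 GB of RAM
```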
> Presumably we need to reduce our mds_cache_memory_limit and/or mds_max_caps_per_client, but would like some guidance on whether it’s possible to do that safely on a live production cluster when the MDS is already pretty close to running out of RAM.
> Cluster is Luminous - 12.2.12
> Running single active MDS with two standby.
> 890 clients
> Mix of kernel client (4.19.86) and ceph-fuse.
> Clients are 12.2.12 (398) and 12.2.13 (3)
v12.2.12 has the changes necessary to throttle MDS cache size
reduction. You should be able to reduce mds_cache_memory_limit to any
lower value without destabilizing the cluster.
> The kernel clients have stayed under “mds_max_caps_per_client”: “1048576". But the ceph-fuse clients appear to hold very large numbers according to the ceph-fuse asok.
> “num_caps”: 1007144398,
> “num_caps”: 1150184586,
> “num_caps”: 1502231153,
> “num_caps”: 1714655840,
> “num_caps”: 2022826512,
This data from the ceph-fuse asok is actually the number of caps ever
received, not the current number. I've created a ticket for this:
Look at the data from `ceph tell mds.foo session ls` instead.
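Summarizing current caps per client from `session ls` could be sketched like this. The JSON sample is made up, and the field names are an assumption based on how I recall the session-ls output, so verify against your cluster:

```python
import json

# Made-up, truncated sample of `ceph tell mds.foo session ls` output;
# the per-session "num_caps" field is the current cap count
# (field names are an assumption -- check real output).
session_ls = '''
[
  {"id": 4101, "num_caps": 12000, "client_metadata": {"hostname": "node-a"}},
  {"id": 4102, "num_caps": 950000, "client_metadata": {"hostname": "node-b"}}
]
'''

sessions = json.loads(session_ls)
total = sum(s["num_caps"] for s in sessions)
worst = max(sessions, key=lambda s: s["num_caps"])
print(f"total caps: {total}, "
      f"busiest client: {worst['client_metadata']['hostname']}")
```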
> Dropping caches on the clients appears to reduce their cap usage but does not free up RAM on the MDS.
The MDS won't free up RAM until the cache memory limit is reached.
> What is the safest method to free cache and reduce RAM usage on the MDS in this situation (without having to evict or remount clients)?
> I’m concerned that reducing mds_cache_memory_limit even in very small increments may trigger a large recall of caps and overwhelm the MDS.
That used to be the case in older versions of Luminous but not any longer.
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
trying to get my head around rocksdb spillovers and how to deal with
them … in particular, i have one osd which does not have any pools
associated (as per ceph pg ls-by-osd $osd ), yet it does show up in ceph
health detail as:
osd.$osd spilled over 2.9 MiB metadata from 'db' device (49 MiB
used of 37 GiB) to slow device
compaction doesn't help. i am well aware of
https://tracker.ceph.com/issues/38745 , yet find it really
counter-intuitive that an empty osd with a more-or-less optimally sized
db volume can't fit its rocksdb on the former.
is there any way to repair this, apart from re-creating the osd? fwiw,
dumping the database with
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$osd dump >
yields a file of less than 100 MB in size.
and, while we're at it, a few more related questions:
- am i right to assume that the leveldb and rocksdb arguments to
ceph-kvstore-tool are only relevant for osds with filestore-backend?
- does ceph-kvstore-tool bluestore-kv … also deal with rocksdb-items for
osds with bluestore-backend?
thank you very much & with kind regards,
Hi, trying to migrate a second Ceph cluster to cephadm. All the hosts migrated successfully from "legacy" except one of the OSD hosts (cephadm kept duplicating OSD ids, e.g. two "osd.5"; still not sure why). To make things easier, we re-provisioned the node (reinstalled from netinstall, applied the same SaltStack traits as the other nodes, wiped the disks) and tried to use cephadm to set up the OSDs.
So, orch correctly starts the provisioning processes (a docker container running ceph-volume is created). But the provisioning never completes (docker exec):
# ps axu
root 1 0.1 0.2 99272 22488 ? Ss 15:26 0:01 /usr/libexec/platform-python -s /usr/sbin/ceph-volume lvm batch --no-auto /dev/sdb /dev/sdc --dmcrypt --yes --no-systemd
root 807 0.9 0.5 154560 44120 ? S<L 15:26 0:06 /usr/sbin/cryptsetup --key-file - --allow-discards luksOpen /dev/ceph-851cae40-3270-45ea-b788-be6e05465e92/osd-data-e3157b54-f6b9-4ec9-ab12-e289f52c00a4 Afr6Ct-ok4h-pBEy-GfFF-xxYl-EKwi-cHhjZc
# cat /var/log/ceph/ceph-volume.log
Running command: /usr/sbin/cryptsetup --batch-mode --key-file - luksFormat /dev/ceph-851cae40-3270-45ea-b788-be6e05465e92/osd-data-e3157b54-f6b9-4ec9-ab12-e289f52c00a4
Running command: /usr/sbin/cryptsetup --key-file - --allow-discards luksOpen /dev/ceph-851cae40-3270-45ea-b788-be6e05465e92/osd-data-e3157b54-f6b9-4ec9-ab12-e289f52c00a4 Afr6Ct-ok4h-pBEy-GfFF-xxYl-EKwi-cHhjZc
# docker ps
2956dec0450d ceph/ceph:v15 "/usr/sbin/ceph-volu…" 14 minutes ago Up 14 minutes condescending_nightingale
# cat osd_spec_default.yaml
It looks like cephadm hangs on luksOpen.
Is this expected? Encryption is mentioned as supported, although there appears to be no documentation for it.
This is again about our bad cluster, with too many objects, where the hdd
OSDs have a DB device that is (much) too small (e.g. 20 GB, i.e. 3 GB
usable). Now several OSDs do not come up any more.
Typical error message:
/build/ceph-14.2.8/src/os/bluestore/BlueFS.cc: 2261: FAILED
ceph_assert(h->file->fnode.ino != 1)
Also just tried to add a few GB to the DB device (lvextend,
ceph-bluestore-tool bluefs-bdev-expand), but this also crashes, also
with this message.
Options that helped us before (thanks Wido :-) do not help here, e.g.
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$OSD compact
Any ideas that I could try to save these OSDs?
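On the sizing point above ("20 GB, i.e. 3 GB usable"): the counter-intuitive usable size comes from RocksDB's fixed level sizes, since (pre-Pacific) BlueFS only keeps a level on the DB device if the entire level fits. A rough sketch, assuming the commonly cited defaults of a ~256 MB base level and a growth factor of 10 (both are assumptions, tunable via rocksdb options):

```python
# Rough sketch of why a small DB volume spills: RocksDB levels grow by a
# fixed factor, and BlueFS (pre-Pacific) only places a level on the DB
# device if the whole level fits. Assumed defaults: 256 MB L1, factor 10.

def usable_db_bytes(db_size_bytes, base=256 * 2**20, factor=10, levels=7):
    usable, level_size = 0, base
    for _ in range(levels):
        if usable + level_size > db_size_bytes:
            break  # next level doesn't fit -> it lands on the slow device
        usable += level_size
        level_size *= factor
    return usable

GiB = 2**30
print(usable_db_bytes(20 * GiB) / GiB)  # 2.75 -> matches "20 GB, i.e. 3 GB usable"
print(usable_db_bytes(37 * GiB) / GiB)  # 27.75 -> even 37 GiB is partly wasted
```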
I have 15.2.1 installed on all machines. On the primary machine I executed the ceph upgrade command:
$ ceph orch upgrade start --ceph-version 15.2.2
When I check ceph -s I see this:
Upgrade to docker.io/ceph/ceph:v15.2.2 (30m)
[=...........................] (remaining: 8h)
It says 8 hours, but it has already been running for 3 hours with no upgrade progress; it is stuck at this point.
Is there any way to find out why it is stuck?
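`ceph orch upgrade status` and `ceph -W cephadm` (to watch the cephadm log) are the usual places to look. Parsing the status output could be sketched like this; the JSON field names are my assumption based on Octopus, so verify them against your actual output:

```python
import json

# Made-up sample of `ceph orch upgrade status` output; field names are
# an assumption (Octopus-era) and should be checked against real output.
status_json = '''
{"target_image": "docker.io/ceph/ceph:v15.2.2",
 "in_progress": true,
 "services_complete": [],
 "message": ""}
'''

status = json.loads(status_json)
if status["in_progress"] and not status["services_complete"]:
    print(f"upgrade to {status['target_image']} has not completed any "
          "service yet; check 'ceph -W cephadm' for the blocking daemon")
```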
Hi guys, I deployed an EFK cluster and use Ceph as block storage in Kubernetes, but RBD write IOPS sometimes drops to zero and stays there for a few minutes. I want to check the RBD logs, so I added some config to ceph.conf and restarted Ceph.
Here is my ceph.conf:
fsid = 53f4e1d5-32ce-4e9c-bf36-f6b54b009962
mon_initial_members = db-16-4-hzxs, db-16-5-hzxs, db-16-6-hzxs
mon_host = 10.25.16.4,10.25.16.5,10.25.16.6
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3
debug rbd = 20
debug rbd mirror = 20
debug rbd replay = 20
log file = /var/log/ceph/client_rbd.log
I cannot get any logs in /var/log/ceph/client_rbd.log. I also tried 'ceph daemon osd.* config set debug_rbd 20' and there are likewise no related logs in ceph-osd.log.
How can I get useful logs, or how else can I analyze this problem? Looking forward to your reply.
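One thing worth checking (an assumption about this setup, since the thread does not show it): `debug rbd` is a librbd client-side option, so it needs to be in a section read by the RBD client process (e.g. `[client]`) and takes effect when that client restarts; setting debug_rbd on the OSDs will not produce rbd logs. A hypothetical fragment:

```ini
# Hypothetical ceph.conf fragment: rbd debug logging belongs to the
# client process (librbd / rbd-nbd / the kubelet mount helper), not
# the OSDs, so these options go under [client].
[client]
debug rbd = 20
debug rbd mirror = 20
debug rbd replay = 20
log file = /var/log/ceph/client_rbd.log
```

The log file path must also be writable by the client process for anything to appear there.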