I'd like to discuss which questions I should ask to understand the values under 'attrs' of an object in the following JSON data structure, and how to evaluate the health of such objects.
I have a sample JSON output; can you comment on the object's state here?
{ "name": "$image.name", "size": 0, "tag": "", "attrs": { "user.rgw.manifest": "", "user.rgw.olh.idtag": "$tag.uuid", "user.rgw.olh.info": "\u0001\u0001�", "user.rgw.olh.ver": "4" } }
What is the purpose of these fields?

"user.rgw.manifest":
  - What does the empty value "" signify in the context of the object?
  - How does the absence of a value in this field affect the object's health?

"user.rgw.olh.idtag":
  - How is the content of this field generated? (For example, what does the "$tag" value represent?)

"user.rgw.olh.info":
  - What is the function of this field?
  - What information does its content carry about the object's status?

"user.rgw.olh.ver":
  - What does the content of this field signify? (For instance, what does "4" represent?) Does this field represent the object's version?

And finally: what are the distinguishing features that set this object apart from previous versions?
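For reference, a sketch of how such attributes can be inspected (bucket, object, and pool names below are placeholders, and the output format may differ slightly between releases):

  # via RGW's own metadata view of the object
  radosgw-admin object stat --bucket=<bucket> --object=<key>

  # or directly against the backing RADOS object
  rados -p <rgw data pool> listxattr <rados object name>
  rados -p <rgw data pool> getxattr <rados object name> user.rgw.olh.ver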
Hi,
Since the 6.5 kernel addressed the regression in the readahead handling
code, we went ahead and installed this kernel
for a couple of mail / web clusters (Ubuntu 6.5.1-060501-generic
#202309020842 SMP PREEMPT_DYNAMIC Sat Sep 2 08:48:34 UTC 2023 x86_64
x86_64 x86_64 GNU/Linux). Since then we occasionally see the following
being logged by the kernel:
[Sun Sep 10 07:19:00 2023] workqueue: delayed_work [ceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 08:41:24 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 11:05:55 2023] workqueue: delayed_work [ceph] hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 12:54:38 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 19:06:37 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
[Mon Sep 11 10:53:33 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 32 times, consider switching to WQ_UNBOUND
[Tue Sep 12 10:14:03 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 64 times, consider switching to WQ_UNBOUND
[Tue Sep 12 11:14:33 2023] workqueue: ceph_cap_reclaim_work [ceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
We wonder whether this is a new phenomenon, or whether it is simply logged
by the new kernel and was not logged before.
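Our working assumption is that these messages come from the workqueue CPU-hog auto-detection added around the 6.5 kernel; if that is the case, the reporting threshold should be tunable (the parameter name below is our reading of the kernel-parameters documentation, please correct us if that is wrong):

  # raise the detection threshold from the default 10000 microseconds, e.g. on the kernel command line:
  #   workqueue.cpu_intensive_thresh_us=30000
  # it may also be exposed as a runtime module parameter:
  cat /sys/module/workqueue/parameters/cpu_intensive_thresh_us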
However, we have hit a few OOM situations because of ceph_cap_reclaim_work
events since we switched to the new kernel (the OOM occurs because Apache
threads keep piling up as they cannot access CephFS). We then also see MDS
slow ops reported. This might be related to a backup job that is running
on a backup server. We did not observe this behavior on the 5.12.19 kernel.
The Ceph cluster is currently on 16.2.11.
Does anyone have some insight into this?
Thanks,
Stefan
Hi, Experts,
We have a Ceph cluster reporting HEALTH_ERR due to multiple old versions.
health: HEALTH_ERR
There are daemons running multiple old versions of ceph
After running `ceph version`, we see three Ceph versions in the 16.2.* series; these daemons are Ceph OSDs.
Our question is: how can we stop this version check? We cannot upgrade all of the old daemons.
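What we are considering, but have not tried yet, is muting the health code rather than disabling the check; this assumes the code shown by `ceph health detail` is DAEMON_OLD_VERSION:

  ceph health detail
  ceph health mute DAEMON_OLD_VERSION 4w

There also seems to be a mon_warn_older_version_delay option, but we have not verified it on 16.2.*.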
Thanks,
Xiong
Hello!
I'm very new to Ceph, sorry I'm asking extremely basic questions.
I just upgraded from 17.2.6 to 17.2.7 and got this warning:
2 pool(s) do not have an application enabled
These pools are
5 cephfs.cephfs.meta
6 cephfs.cephfs.data
I don't remember why and how I created them; I just followed some
instructions...
And I don't remember their state before the upgrade :-(
In the dashboard I see that 0 bytes are used in both pools.
But I have two other pools
3 cephfs_data
4 cephfs_metadata
which are in use by cephfs:
ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
and really have data in them.
Could you tell me: can I just remove these two pools that have no
application enabled, given that everything works, i.e. CephFS is mounted and accessible?
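For reference, the checks I was planning to run first (a sketch; pool names as listed above):

  ceph osd pool ls detail                # confirm which pools have an application set
  rados -p cephfs.cephfs.meta ls | head  # see whether these pools are really empty
  rados -p cephfs.cephfs.data ls | head
  ceph fs ls                             # confirm cephfs only uses cephfs_data / cephfs_metadata

If they really are unused, I assume removal would be something like
`ceph osd pool rm <pool> <pool> --yes-i-really-really-mean-it` (with mon_allow_pool_delete enabled), but I have not tried it.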
Thank you!
Firstly I'm rolling out a rook update from v1.12.2 to v1.12.7 (latest
stable) and ceph from 17.2.6 to 17.2.7 at the same time. I mention this in
case the problem is actually caused by rook rather than ceph. It looks like
ceph to my uninitiated eyes, though.
The update just started bumping my OSDs and the first one fails in the
'activate' init container. The complete logs for this container are:
+ OSD_ID=5
+ CEPH_FSID=<redacted>
+ OSD_UUID=<redacted>
+ OSD_STORE_FLAG=--bluestore
+ OSD_DATA_DIR=/var/lib/ceph/osd/ceph-5
+ CV_MODE=raw
+ DEVICE=/dev/sdc
+ cp --no-preserve=mode /etc/temp-ceph/ceph.conf /etc/ceph/ceph.conf
+ python3 -c '
import configparser
config = configparser.ConfigParser()
config.read('\''/etc/ceph/ceph.conf'\'')
if not config.has_section('\''global'\''):
config['\''global'\''] = {}
if not config.has_option('\''global'\'','\''fsid'\''):
config['\''global'\'']['\''fsid'\''] = '\''<redacted>'\''
with open('\''/etc/ceph/ceph.conf'\'', '\''w'\'') as configfile:
config.write(configfile)
'
+ ceph -n client.admin auth get-or-create osd.5 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *' -k /etc/ceph/admin-keyring-store/keyring
[osd.5]
key = <redacted>
+ [[ raw == \l\v\m ]]
++ mktemp
+ OSD_LIST=/tmp/tmp.CekJVsr9gr
+ ceph-volume raw list /dev/sdc
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 11, in <module>
    load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
    self.main(self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/list.py", line 166, in main
    self.list(args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/list.py", line 122, in list
    report = self.generate(args.device)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/list.py", line 91, in generate
    info_device = [info for info in info_devices if info['NAME'] == dev][0]
IndexError: list index out of range
So it has failed executing `ceph-volume raw list /dev/sdc`.
It looks like this code is new in 17.2.7. Is this a regression? What would
be the simplest way to back out of it?
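For what it's worth, the failing line filters the parsed lsblk output by device name, so my guess is that the name lsblk reports inside the container does not match /dev/sdc. Something like the following, run inside the activate container, might show what it actually sees (the columns ceph-volume requests may differ from these):

  lsblk --paths --output NAME,KNAME,PKNAME,TYPE /dev/sdc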
Thanks,
Matt
--
Matthew Booth
Hi,
I've been using Ceph on a 4-host cluster for a year now. I recently discovered the Ceph Dashboard :-)
Now I see that the Dashboard reports CephNodeNetworkPacketErrors >0.01% or >10 packets/s...
Although all systems work great, I'm worried.
'ip -s link show eno5' results:
2: eno5: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether 7a:3b:79:9c:f6:d1 brd ff:ff:ff:ff:ff:ff permaddr 5c:ba:2c:08:b3:90
RX: bytes packets errors dropped missed mcast
734153938129 645770129 20160 0 0 342301
TX: bytes packets errors dropped carrier collsns
1085134190597 923843839 0 0 0 0
altname enp178s0f0
So on average about 0.003% of RX packets have errors (20160 / 645770129)!
All four hosts use the same 10Gb HP switch. The hosts themselves are HP ProLiant G10 servers. I would expect 0% packet loss...
Anyway. Should I be worried about data consistency? Or can Ceph handle this amount of packet errors?
Greetings,
Dominique.
I have a 3-node Ceph cluster in my home lab. One of the pools spans 3
HDDs, one on each node, and has size 2, min_size 1. One of my nodes is
currently down, and I have 160 PGs in the 'unknown' state. The other 2
hosts are up and the cluster has quorum.
Example `ceph health detail` output:
pg 9.0 is stuck inactive for 25h, current state unknown, last acting []
I have 3 questions:
1. Why would the PGs be in an unknown state?
2. I would like to recover the cluster without recovering the failed
   node, primarily so that I know I can. Is that possible?
3. The boot NVMe of the host has failed, so I will most likely rebuild
   it. I'm running Rook, and I will most likely delete the old node and
   create a new one with the same name. AFAIK, the OSDs are fine. When
   Rook rediscovers the OSDs, will it add them back with data intact? If
   not, is there any way I can make it so it will?
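For reference, the diagnostics I have been looking at so far (a sketch; pg 9.0 is just the example from above):

  ceph osd tree                 # confirm which OSDs the cluster considers down
  ceph pg map 9.0               # show the up/acting set for one of the unknown PGs
  ceph pg 9.0 query             # may hang or error while the PG has no acting OSDs
  ceph pg dump_stuck inactive   # list all stuck/inactive PGs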
Thanks!
--
Matthew Booth
Hello,
I've recently made the decision to gradually decommission my Nautilus
cluster and migrate the hardware to a new Pacific or Quincy cluster. By
gradually, I mean that as I expand the new cluster I will move (copy/erase)
content from the old cluster to the new, making room to decommission more
nodes and move them over.
In order to do this I will, of course, need to remove OSD nodes by first
emptying the OSDs on each node.
I noticed that pgremapper (a version prior to October 2021) has a 'drain'
subcommand that allows one to control which target OSDs would receive the
PGs from the source OSD being drained. This seemed like a good idea: If
one simply marks an OSD 'out', its contents would be rebalanced to other
OSDs on the same node that are still active, which seems like it would make
a lot of unnecessary data movement and also make removing the next OSD take
longer.
So I went through the trouble of creating a 'really long' pgremapper drain
command excluding the OSDs of two nodes as targets:
# bin/pgremapper drain 16 \
    --target-osds 00,01,02,03,04,05,06,07,24,25,16,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71 \
    --allow-movement-across host --max-source-backfills 75 --concurrency 20 \
    --verbose --yes
However, when this completed, OSD 16 actually contained more PGs than
before I started. It appears that the mapping generated by pgremapper also
backfilled the OSD as it was draining it.
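(For reference, the way I am counting this, as a sketch:

  ceph pg ls-by-osd 16 | wc -l          # PG count on the OSD, before vs. after the drain
  ceph osd dump | grep pg_upmap_items   # the upmap entries the drain created
)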
So did I miss something here? What is the best way to proceed? I
understand that it would be mayhem to mark 8 of 72 OSDs out and then turn
backfill/rebalance/recover back on. But it seems like there should be a
better way.
Suggestions?
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Hi,
One of the servers in the Ceph cluster shut down abruptly due to a
power failure. After restarting, the OSDs are not coming up, and the Ceph
health check shows them as down.
When checking the OSD status I see "osd.26 18865 unable to obtain rotating
service keys; retrying".
Every 30 seconds it just logs this message, and it is the same for
all OSDs on the system.
Nov 04 20:03:05 strg-node-03 bash[34287]: debug 2023-11-04T14:33:05.089+0000 7f1f5693c080 -1 osd.26 18865 unable to obtain rotating service keys; retrying
Nov 04 20:03:35 strg-node-03 bash[34287]: debug 2023-11-04T14:33:35.090+0000 7f1f5693c080 -1 osd.26 18865 unable to obtain rotating service keys; retrying
This is ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable) on Debian 11 bullseye, a cephadm-based installation.
I tried searching for the error message but couldn't find anything useful.
How do I fix this issue?
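Things I have not ruled out yet (a sketch of the usual suspects for this message, mainly clock skew between the restarted node and the mons):

  timedatectl status        # or: chronyc tracking / ntpq -p, on the restarted node
  ceph time-sync-status     # from a node where the CLI works; reports mon clock skew
  ceph -s                   # confirm the mons themselves are healthy and in quorum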
regards,
Amudhan
Hello Users,
It is great to see the note that "S3 multipart uploads using
Server-Side Encryption now replicate correctly in multi-site" in the Quincy
v17.2.7 release. But I see that users who are using [1] still have a
dependency on the item tracked at [2].
I tested with Reef 18.2.0 as well; the PR [3] seems to be merged into
reef, but the configuration is not taking effect.
We're currently stuck at version 17.2.3 and in a situation where we cannot
upgrade to a later release because "rgw_crypt_default_encryption_key" is still WIP
and also MPU with SSE (default key) is only fixed in 17.2.7 :(
[1]
https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-fo…
[2] https://tracker.ceph.com/issues/61473
[3] https://github.com/ceph/ceph/pull/52796
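For context, the configuration from [1] that we depend on is essentially the following (key value redacted; shown as a ceph.conf snippet, and the exact section name depends on the deployment, so treat this as a sketch):

  [client.rgw]
  rgw_crypt_default_encryption_key = <base64-encoded 256-bit AES key>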
Appreciate any help.
Regards,
Jayanth