Hi Mailing-Listers,
I am reaching out for assistance with a deployment issue I am facing
with Ceph on a 4-node RKE2 cluster. We are attempting to deploy Ceph via
the Rook Helm chart, but we are hitting an issue that seems related to a
known bug (https://tracker.ceph.com/issues/61597).
During the OSD preparation phase, the deployment consistently fails with an
IndexError: list index out of range. The logs indicate the problem occurs
when configuring new disks, specifically when using /dev/dm-3 as a metadata
device. It's important to note that /dev/dm-3 is an LVM logical volume on
top of an mdadm RAID, which may or may not be contributing to the issue.
(I swear, this setup worked before.)
Here is a snippet of the error from the deployment logs:
> 2023-11-23 23:11:30.196913 D | exec: IndexError: list index out of range
> 2023-11-23 23:11:30.236962 C | rookcmd: failed to configure devices:
failed to initialize osd: failed ceph-volume report: exit status 1
https://paste.openstack.org/show/bileqRFKbolrBlTqszmC/
We have attempted different configurations, including specifying devices
explicitly and using the useAllDevices: true option with a specified
metadata device (/dev/dm-3 or the /dev/pv_md0/lv_md0 path). However, the
issue persists across multiple configurations.
The tested configurations are as follows:
Explicit device specification:
```yaml
nodes:
  - name: "ceph01.maas"
    devices:
      - name: /dev/dm-1
      - name: /dev/dm-2
      - name: "sdb"
        config:
          metadataDevice: "/dev/dm-3"
      - name: "sdc"
        config:
          metadataDevice: "/dev/dm-3"
```
General device specification with metadata device:
```yaml
storage:
  useAllNodes: true
  useAllDevices: true
  config:
    metadataDevice: /dev/dm-3
```
I would greatly appreciate any insights or recommendations on how to
proceed or work around this issue.
Is there a halfway decent way to apply the fix, or maybe a workaround
that would let us deploy Ceph successfully in our environment?
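In case it helps, this is roughly how I believe the failing ceph-volume report
could be reproduced by hand from the OSD prepare pod on ceph01.maas (the exact
flags Rook passes are my assumption, not taken from its source):
```
# Ask ceph-volume for the same batch report that Rook requests during OSD prepare;
# device paths are from our node, flags are my best guess.
ceph-volume lvm batch --report --format json /dev/sdb /dev/sdc --db-devices /dev/dm-3
```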
Kind regards,
Hi community,
My Ceph cluster is serving S3 with three pools at approximately 4.5k obj/s,
but the RGW lifecycle delete rate per pool is only 60-70 objects/s.
How can I speed up the RGW LC process? 60-70 objects/s is too slow.
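For reference, the knobs I assume are relevant and was planning to try are the
LC worker settings (I am not sure these are the right ones, and I expect the
RGWs need a restart afterwards):

ceph config set global rgw_lc_max_worker 5
ceph config set global rgw_lc_max_wp_worker 5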
Thanks a lot
Question about the osdmaptool deviation calculations:
For instance,
-----
osdmaptool omap --upmap output.txt --upmap-pool cephfs_data-rep3 --upmap-max 1000 --upmap-deviation 5
osdmaptool: osdmap file 'omap'
writing upmap command output to: output.txt
checking for upmap cleanups
upmap, max-count 1000, max deviation 5
limiting to pools cephfs_data-rep3 ([30])
pools cephfs_data-rep3
prepared 0/1000 changes
Unable to find further optimization, or distribution is already perfect
-----
The evaluated pool is all-on-hdd, and the pool was created with PGs > number of hdd OSDs in the cluster, so each hdd OSD is used at least once by this pool.
Is it correct to assume that the osdmaptool is relying on the equations set at
ceph-17.2.5/src/osd/OSDMap.cc:5143
5143 // This function calculates the 2 maps osd_deviation and deviation_osd which
5144 // hold the deviation between the current number of PGs which map to an OSD
5145 // and the optimal number. ...
# pgs_per_weight
# ceph-17.2.5/src/osd/OSDMap.cc:4806
4806 float pgs_per_weight = total_pgs / osd_weight_total;
# target
# ceph-17.2.5/src/osd/OSDMap.cc:5156
5156 float target = osd_weight.at(oid) * pgs_per_weight;
# deviation
# ceph-17.2.5/src/osd/OSDMap.cc:5157
5157 float deviation = (float)opgs.size() - target;
And so for pgs_per_weight I calculate
ceph -f json osd df | jq '[ .nodes[] | select (.device_class == "hdd") .pgs ] | add'
divided by
ceph -f json osd df | jq '[ .nodes[] | select (.device_class == "hdd") .crush_weight ] | add'
(each hdd OSD in this cluster has identical weight)
target = osd_weight.at(oid) * pgs_per_weight
I calculate deviation for each osd
deviation = opgs.size - target
where, opgs.size = the number of PGs at an OSD. i.e. The value of $19 for each $1, in `ceph osd df hdd | awk '{ print $1 " " $19 }'`
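To rule out an arithmetic mistake on my side, here is the whole calculation
consolidated into one jq invocation (this only mirrors the equations above as
I read them, not the actual osdmaptool code path):
-----
ceph -f json osd df | jq -r '
  [ .nodes[] | select(.device_class == "hdd") ] as $hdd
  | ([ $hdd[].pgs ] | add) as $total_pgs
  | ([ $hdd[].crush_weight ] | add) as $weight_total
  | ($total_pgs / $weight_total) as $pgs_per_weight
  | $hdd[]
  | "osd.\(.id)  pgs=\(.pgs)  target=\(.crush_weight * $pgs_per_weight)  deviation=\(.pgs - (.crush_weight * $pgs_per_weight))"
'
-----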
The result is many OSDs with a deviation well above upmap_max_deviation, which is at the default of 5.
So I am wondering whether I am miscalculating something, or whether the osdmaptool takes further things into account when formulating upmap suggestions that I am not aware of.
-Robert
Hello,
We are running a Pacific 16.2.10 cluster and have enabled the balancer module; here is the configuration:
[root@ceph-1 ~]# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.052548",
    "last_optimize_started": "Fri Nov 17 17:09:57 2023",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}
[root@ceph-1 ~]# ceph balancer eval
current cluster score 0.017742 (lower is better)
Here is the balancer configuration of upmap_max_deviation:
# ceph config get mgr mgr/balancer/upmap_max_deviation
5
We have two different sizes of OSDs, 7681G and 3840G. When I checked the PG distribution on each type of OSD, I found it is not even: for the 7681G OSDs the PG count varies from 136 to 158, while for the 3840G OSDs it varies from 60 to 83, so the deviation appears to be almost +/- 10. I am wondering whether this is expected, or whether I need to change upmap_max_deviation to a smaller value.
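If it does turn out that we should tighten it, I assume this would be the way to change it and re-check (the value 1 below is just an example, not a recommendation):

ceph config set mgr mgr/balancer/upmap_max_deviation 1
ceph balancer eval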
Thanks for answering my question.
Hi,
The context is RBD on BlueStore. I checked "extent" on the wiki.
I see "extent" mentioned when talking about snapshots and export/import.
For example, when we create a snapshot, we mark extents; when there is
a write to a marked extent, we make a copy.
I also know that user data on block device maps to objects.
How are "extent" and "object" related?
Can I say an extent is a set of contiguous objects (with default stripe settings)?
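To make my question concrete, this is the mapping I have in mind, assuming default striping (4 MiB objects, stripe_count=1); the offsets below are made up:

offset=$((10*1024*1024))   # extent start: 10 MiB into the image
length=$((6*1024*1024))    # extent length: 6 MiB
objsize=$((4*1024*1024))   # default rbd object size (order 22)
echo "first object: $(( offset / objsize ))"                 # -> 2
echo "last object:  $(( (offset + length - 1) / objsize ))"  # -> 3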
Thanks!
Tony
Hi Groups,
Recently I set up a Ceph cluster with 10 nodes and 144 OSDs, and I use S3 on it with an erasure-coded pool (EC 3+2).
I have a question: how many OSD nodes can fail with erasure code 3+2 while the cluster keeps working normally (read, write)? And would a different erasure code such as EC 7+3 or 8+2 be a better choice?
My understanding is that the erasure code only ensures no data loss, but does not guarantee that the cluster operates normally and does not block IO when OSD nodes are down. Is that right?
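For what it's worth, my assumption is that IO blocking is governed by the pool's min_size, which for an EC k+m pool defaults to k+1 (so 4 for 3+2). This is how I would check it (the pool name is just a placeholder for our data pool):

ceph osd pool get <ec-data-pool> min_size
ceph osd pool get <ec-data-pool> size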
Thanks to the community.
Hi,
We have upgraded one Ceph cluster from 17.2.7 to 18.2.0. Since then we have been having CephFS issues.
For example this morning:
“””
[root@naret-monitor01 ~]# ceph -s
cluster:
id: 63334166-d991-11eb-99de-40a6b72108d0
health: HEALTH_WARN
1 filesystem is degraded
3 clients failing to advance oldest client/flush tid
3 MDSs report slow requests
6 pgs not scrubbed in time
29 daemons have recently crashed
…
“””
The ceph orch, ceph crash and ceph fs status commands were hanging.
After a "ceph mgr fail" those commands started to respond.
Then I noticed that one MDS had most of the slow operations:
“””
[WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
mds.cephfs.naret-monitor01.nuakzo(mds.0): 18 slow requests are blocked > 30 secs
mds.cephfs.naret-monitor01.uvevbf(mds.1): 1683 slow requests are blocked > 30 secs
mds.cephfs.naret-monitor02.exceuo(mds.2): 1 slow requests are blocked > 30 secs
“””
Then I tried to restart it with
“””
[root@naret-monitor01 ~]# ceph orch daemon restart mds.cephfs.naret-monitor01.uvevbf
Scheduled to restart mds.cephfs.naret-monitor01.uvevbf on host 'naret-monitor01'
“””
After that, CephFS entered this situation:
“””
[root@naret-monitor01 ~]# ceph fs status
cephfs - 198 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.naret-monitor01.nuakzo Reqs: 0 /s 17.2k 16.2k 1892 14.3k
1 active cephfs.naret-monitor02.ztdghf Reqs: 0 /s 28.1k 10.3k 752 6881
2 clientreplay cephfs.naret-monitor02.exceuo 63.0k 6491 541 66
3 active cephfs.naret-monitor03.lqppte Reqs: 0 /s 16.7k 13.4k 8233 990
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 5888M 18.5T
cephfs.cephfs.data data 119G 215T
cephfs.cephfs.data.e_4_2 data 2289G 3241T
cephfs.cephfs.data.e_8_3 data 9997G 470T
STANDBY MDS
cephfs.naret-monitor03.eflouf
cephfs.naret-monitor01.uvevbf
MDS version: ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
“””
The file system is totally unresponsive (we can mount it on client nodes, but any operation, even a simple ls, hangs).
During the night we had a lot of MDS crashes; I can share the details.
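In case it is useful, these are the commands I was planning to use next to inspect the blocked requests on the clientreplay rank and the recent crashes (daemon name is from our cluster):
“””
ceph tell mds.cephfs.naret-monitor02.exceuo ops
ceph tell mds.cephfs.naret-monitor02.exceuo session ls
ceph crash ls-new
“””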
Does anybody have an idea on how to tackle this problem?
Best,
Giuseppe
Hi,
src-image is 1GB (provisioned size). I did the following 3 tests.
1. rbd export src-image - | rbd import - dst-image
2. rbd export --export-format 2 src-image - | rbd import --export-format 2 - dst-image
3. rbd export --export-format 2 src-image - | rbd import - dst-image
With #1 and #2, the dst-image size (rbd info) is the same as src-image, which is expected.
With #3, the dst-image size (rbd info) is close to the used size (rbd du), not the provisioned
size of src-image. I'm not sure whether this image is actually usable when writing to it.
The question is: is #3 not supposed to be used at all?
I checked the docs and didn't see anything like "--export-format 2 has to be used when
importing an image that was exported with --export-format 2".
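In case it matters, this is the quick smoke test I had in mind to see whether the #3 dst-image is actually usable (just rbd bench writes and reads; sizes are arbitrary):

rbd bench --io-type write --io-size 4K --io-total 16M dst-image
rbd bench --io-type read --io-size 4K --io-total 16M dst-image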
Any comments?
Thanks!
Tony