Hi,
on Debian 12, ceph-dashboard is throwing a warning:
"Module 'dashboard' has failed dependency: PyO3 modules may only be
initialized once per interpreter process"
This seems to be related to a PyO3 0.17 change:
https://github.com/PyO3/pyo3/blob/7bdc504252a2f972ba3490c44249b202a4ce6180/…
"
Each #[pymodule] can now only be initialized once per process
To make PyO3 modules sound in the presence of Python sub-interpreters,
for now it has been necessary to explicitly disable the ability to
initialize a #[pymodule] more than once in the same process. Attempting
to do this will now raise an ImportError.
"
Hi all,
we seem to have hit a bug in the CephFS kernel client and I just want to confirm what action to take. We get the error "wrong peer at address" in dmesg, and some jobs on that server seem to get stuck in fs access; log extract below. I found these 2 tracker items, https://tracker.ceph.com/issues/23883 and https://tracker.ceph.com/issues/41519, which don't seem to have fixes.
My questions:
- Is this harmless, or does it indicate invalid/corrupted client cache entries?
- How should we resolve it: ignore, umount+mount, or reboot?
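Unless advised otherwise, this is what I would try first (untested; the
mount point and MDS rank below are placeholders for our setup):
ceph tell mds.1 client ls    # check which sessions the MDS holds for this host
umount /mnt/cephfs           # may need -f/-l if the stuck jobs hold open files
mount -t ceph <mon-addrs>:/ /mnt/cephfs -o name=<client>,secretfile=<path>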
Here is an extract from the dmesg log; the error has survived a couple of MDS restarts already:
[Mon Mar 6 12:56:46 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:05:18 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386
[Mon Mar 6 13:05:18 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:13:50 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386
[Mon Mar 6 13:13:50 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN)
[Mon Mar 6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN)
[Mon Mar 6 13:16:45 2023] ceph: mds1 reconnect start
[Mon Mar 6 13:16:45 2023] ceph: mds1 reconnect start
[Mon Mar 6 13:16:48 2023] ceph: mds1 reconnect success
[Mon Mar 6 13:16:48 2023] ceph: mds1 reconnect success
[Mon Mar 6 13:18:13 2023] ceph: update_snap_trace error -22
[Mon Mar 6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN)
[Mon Mar 6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN)
[Mon Mar 6 13:18:23 2023] ceph: mds1 recovery completed
[Mon Mar 6 13:18:23 2023] ceph: mds1 recovery completed
[Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect start
[Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect start
[Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect success
[Mon Mar 6 13:18:29 2023] ceph: mds7 reconnect success
[Mon Mar 6 13:18:35 2023] ceph: update_snap_trace error -22
[Mon Mar 6 13:18:35 2023] ceph: mds7 recovery completed
[Mon Mar 6 13:18:35 2023] ceph: mds7 recovery completed
[Mon Mar 6 13:22:22 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Mon Mar 6 13:22:22 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar 6 13:30:54 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[...]
[Thu Mar 9 09:37:24 2023] slurm.epilog.cl (31457): drop_caches: 3
[Thu Mar 9 09:38:26 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 09:38:26 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar 9 09:46:58 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 09:46:58 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar 9 09:55:30 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 09:55:30 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar 9 10:04:02 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar 9 10:04:02 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hey ceph-users,
I am running two (now) Quincy clusters doing RGW multi-site replication
with only one actually being written to by clients.
The other site is intended simply as a remote copy.
On the primary cluster I am observing an ever-growing (in objects and
bytes) "sitea.rgw.log" pool; not so the remote "siteb.rgw.log", which
is only 300MiB and around 15k objects, with no growth.
Metrics show that the growth of the pool on the primary has been linear for
at least 6 months, so no sudden spikes or anything. Also, sync status
appears to be totally happy.
There are also no warnings with regard to large OMAPs or anything similar.
I was under the impression that RGW trims its three logs (md, bi,
data) automatically and only keeps data that has not yet been replicated
by the other zonegroup members?
The config option rgw_sync_log_trim_interval ("ceph config get mgr
rgw_sync_log_trim_interval") is set to 1200, so 20 minutes.
So I am wondering whether there might be some inconsistency, and how I can
best analyze what the cause of the accumulating log data is.
There are older questions on the ML, such as [1], but there was not
really a solution or root cause identified.
I know there is manual trimming, but I would rather analyze the
current situation and figure out why auto-trimming is not happening.
* Do I need to go through all buckets, count logs, and look at
their timestamps? Which queries make sense here?
* Is there usually any logging of the log-trimming activity that I
should expect? Or that might indicate why trimming does not happen?
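For reference, these are the checks I have run so far (pool names as above;
the status subcommands are the ones I know of, there may be better ones):
radosgw-admin sync status          # per-zone sync state, shows lag if any
radosgw-admin mdlog status         # metadata log markers
radosgw-admin datalog status       # data log markers
rados df | grep rgw.log            # object/byte counts of the log pools
rados -p sitea.rgw.log ls | head   # sample object names to see which log grows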
Regards
Christian
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/WZCFOAMLWV…
Hello,
This message does not concern Ceph itself, but a hardware defect which can lead to permanent loss of data on a Ceph cluster equipped with the same hardware in separate fault domains.
The DELL / Toshiba PX02SMF020, PX02SMF040, PX02SMF080 and PX02SMB160 SSD drives of the 13G generation of DELL servers are subject to a defect which renders them unusable after 70,000 hours of operation, i.e. approximately 7 years and 11 months of activity.
This topic has been discussed here: https://www.dell.com/community/PowerVault/TOSHIBA-PX02SMF080-has-lost-commu…
The risk is all the greater since these disks may die at the same time in the same server, leading to the loss of all data in the server.
To date, DELL has not provided any firmware fixing this defect, the latest firmware version being "A3B3", released on Sept. 12, 2016: https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=hhd9k
If you have servers running these drives, check their uptime. If they are close to the 70,000-hour limit, replace them immediately.
The smartctl tool does not report the power-on hours for these SSDs, but if you have HDDs in the server, you can query their SMART status and get their power-on hours, which should be about the same as the SSDs'.
The smartctl command is: smartctl -a -d megaraid,XX /dev/sdc (where XX is the drive's SCSI device ID on the MegaRAID controller).
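For example, to pull just the relevant attribute (device ID 0 is only an
illustration; adjust to your controller layout):
smartctl -a -d megaraid,0 /dev/sdc | grep -i Power_On_Hours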
We have informed DELL about this but have no information yet on the arrival of a fix.
We have lost 6 disks, in 3 different servers, in the last few weeks. Our observation is that the drives do not survive a full shutdown and restart of the machine (power off then power on in iDRAC), but they may also die during a simple reboot (init 6) or even while the machine is running.
Fujitsu released a corrective firmware in June 2021 but this firmware is most certainly not applicable to DELL drives: https://www.fujitsu.com/us/imagesgig5/PY-CIB070-00.pdf
Regards,
Frederic
Sous-direction Infrastructure and Services
Direction du Numérique
Université de Lorraine
Hello everyone!
Recently we had a very nasty incident with one of our Ceph clusters.
During a basic backfill/recovery operation due to a faulty disk, the CephFS metadata started growing exponentially until it used all available space and the whole cluster DIED. Usage graph screenshot in attachment.
Everything happened very fast: even when the OSDs were marked full, they tripped the failsafe and ate all the free blocks while still trying to allocate space, then died completely, without the possibility of even starting them again.
The only solution was to copy the whole BlueStore to a bigger SSD and resize the underlying BlueStore device. Only about 1/3 of the OSDs were able to start after the move, but that was enough, since we have very redundant settings for the CephFS metadata. Basically, the metadata was moved from 12x 240GB SSDs to 12x 500GB SSDs to have enough space to start again.
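For anyone in the same spot, this is roughly the procedure we used (sketch
from memory; device paths and the OSD id are placeholders):
dd if=/dev/old-ssd of=/dev/new-ssd bs=4M status=progress   # clone the OSD
# grow the partition/LV on the new device, then let BlueFS use the new space:
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>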
Brief info about the cluster:
- CephFS data is stored on ~500x 8TB SAS HDDs using 10+2 erasure coding on 18 hosts.
- CephFS metadata is stored on ~12x 500GB SAS/SATA SSDs using 5x replication on 6 hosts.
- The version was one of the latest 16.x.x Pacific releases at the time of the incident.
- 3x MON+MGR, plus 2 active and 2 hot-standby MDS, run on separate virtual servers.
- The typical file size stored is from hundreds of MBs to tens of GBs.
- This cluster is not the biggest, does not have the most HDDs, and has no special config; I simply see nothing special about it.
During investigation I found out the following:
- Metadata grows any time recovery is running on any of our maintained clusters (~15 clusters of different usages and sizes), but never this much; this was an extreme situation.
- After recovery finished, the size went back to normal.
- I think there is a slight correlation between recovery width (the number of objects recovery has to touch in order to recover everything) and recovery duration, but I have no proof.
- Nothing much else.
I would like to find out why this happened, because I think it can happen again sometime, and someone might lose data if they have less luck.
Any ideas are appreciated, or even info on whether anyone has seen similar behavior, or whether I am the only one struggling with an issue like this :)
Kind regards,
Jakub Petrzilka
Hi,
We're having sporadic problems with a CephFS filesystem where MDSs end up
on the OSD blocklist. We're still digging around looking for a cause
(Ceph-related or elsewhere in the infrastructure).
The cluster isn't massive (68 OSDs spread over 34 hosts), each host is a
VM, with MGR/MON/MDS on non-OSD hosts.
Running Ceph 16.2.10
Any suggestions for debugging this further?
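Not a root cause, but the checks I would start with on 16.2.x (the debug
values below are only suggestions):
ceph osd blocklist ls              # current blocklist entries and their expiry
ceph config set mds debug_mds 10   # more verbose MDS logging while reproducing
ceph config set mds debug_ms 1     # message-level logging, for network issues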
Hello,
I have two main questions here.
1. What can I do when `ceph-bluestore-tool` outputs a stack trace for
`fsck`?
2. How does one recover from lost PGs / data corruption in an RGW
Multi-site setup?
---
I have a Luminous 12.2.12 cluster built on
ceph/daemon:v3.2.10-stable-3.2-luminous-centos-7-x86_64 for all daemons; no
ceph packages are installed on the systems. The OSD nodes have 128GB RAM, 6
SATA SSDs (Micron 5200, 2TB), and 1 NVMe SSD split into 4 OSDs.
osd_memory_target is set to 10GB, so with 10 OSDs per node that should put
me at 100/128GB used.
There are 3 PGs down; the 3 OSDs that had those PGs won't stay online,
and they crash fairly quickly after starting. These are running on SATA
SSDs, which are being replaced with NVMe SSDs. CRUSH-reweighting the SATA
drives down causes some SATA OSDs to crash, and some NVMe drives have slow
or blocked ops (related to the down PGs).
I installed the ceph-osd package on one OSD host. When I ran
`ceph-bluestore-tool`, I got a bunch of tcmalloc and unexpected aio errors;
exact output below. I also tried `ceph-objectstore-tool` but received
similar results. I cloned the other OSD that has the affected PGs so I would
have a copy to work on, but I got exactly the same results as before.
---
From what I can see, this is likely due to bad drives and automation
restarting down OSDs several times. With 3 PGs down, I am assuming my next
step would be to mark those PGs lost. From there, I am unsure what the
recovery procedure is to sync "clean" data from the other zones into the
impacted cluster. Is RGW able to handle this? Do I need to use `rclone`?
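In case it helps frame an answer, this is my current understanding of the
two steps (unverified, destructive, and the IDs are placeholders; corrections
very welcome):
ceph osd lost <osd-id> --yes-i-really-mean-it   # give up on the dead OSDs
ceph osd force-create-pg <pgid>                 # recreate a PG empty if it stays down
radosgw-admin metadata sync init                # on the damaged zone: schedule
radosgw-admin data sync init                    # a full resync from the peer
(then restart the RGW daemons so the full sync actually starts)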
---
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-11 fsck
tcmalloc: large alloc 1283989504 bytes == 0x557fdbe46000 @ 0x7fc87e4126d0
0x7fc873354ae9 0x7fc873356073 0x557f89d3d680 0x557f89d2ebcd 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)
tcmalloc: large alloc 2567970816 bytes == 0x5580286c8000 @ 0x7fc87e4126d0
0x7fc873354ae9 0x7fc873356073 0x557f89d3d680 0x557f89d2ebcd 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)
tcmalloc: large alloc 5135933440 bytes == 0x5580c17ca000 @ 0x7fc87e4126d0
0x7fc873354ae9 0x7fc873356073 0x557f89d3d680 0x557f89d2ebcd 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)
tcmalloc: large alloc 3025510400 bytes == 0x557f8f6e6000 @ 0x7fc87e4126d0
0x7fc873354ae9 0x7fc87335582b 0x557f89d75d19 0x557f89d2edda 0x557f89d30524
0x557f89d318ef 0x557f89d33147 0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a
0x557f89a2c5e1 0x7fc87299d2e1 0x557f89ab03fa (nil)
tcmalloc: large alloc 2269913088 bytes == 0x55832469e000 @ 0x7fc87e3f2e50
0x7fc87e4121b9 0x7fc8756ca4f7 0x7fc8756cd304 0x557f89cc4661 0x557f89ad0858
0x557f89ad2224 0x557f89cb7b1d 0x557f89de584c 0x557f89de6a7e 0x557f89e05e7b
0x557f89d2cf48 0x557f89d2efd2 0x557f89d30524 0x557f89d318ef 0x557f89d33147
0x557f89bb0d6f 0x557f89b3c91b 0x557f89b6df8a 0x557f89a2c5e1 0x7fc87299d2e1
0x557f89ab03fa (nil)
2023-07-30 08:27:27.531919 7fc86f689700 -1 bdev(0x557f8add4240
/var/lib/ceph/osd/ceph-11/block) aio to 929504952320~2269908992 but
returned: 2147479552
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: In function 'void
KernelDevice::_aio_thread()' thread 7fc86f689700 time 2023-07-30 08:27:27.532004
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: 397: FAILED assert(0
== "unexpected aio error")
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fc8757242c2]
2: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]
4: (()+0x74a4) [0x7fc8740104a4]
5: (clone()+0x3f) [0x7fc872a65d0f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
2023-07-30 08:27:27.544215 7fc86f689700 -1
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: In function 'void
KernelDevice::_aio_thread()' thread 7fc86f689700 time 2023-07-30
08:27:27.532004
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: 397: FAILED assert(0
== "unexpected aio error")
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fc8757242c2]
2: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]
4: (()+0x74a4) [0x7fc8740104a4]
5: (clone()+0x3f) [0x7fc872a65d0f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
-1> 2023-07-30 08:27:27.531919 7fc86f689700 -1 bdev(0x557f8add4240
/var/lib/ceph/osd/ceph-11/block) aio to 929504952320~2269908992 but
returned: 2147479552
0> 2023-07-30 08:27:27.544215 7fc86f689700 -1
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: In function 'void
KernelDevice::_aio_thread()' thread 7fc86f689700 time 2023-07-30
08:27:27.532004
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: 397: FAILED assert(0
== "unexpected aio error")
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fc8757242c2]
2: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]
4: (()+0x74a4) [0x7fc8740104a4]
5: (clone()+0x3f) [0x7fc872a65d0f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
*** Caught signal (Aborted) **
in thread 7fc86f689700 thread_name:bstore_aio
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (()+0x424fc4) [0x557f89d25fc4]
2: (()+0x110e0) [0x7fc87401a0e0]
3: (gsignal()+0xcf) [0x7fc8729affff]
4: (abort()+0x16a) [0x7fc8729b142a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7fc87572444e]
6: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]
7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]
8: (()+0x74a4) [0x7fc8740104a4]
9: (clone()+0x3f) [0x7fc872a65d0f]
2023-07-30 08:27:27.549175 7fc86f689700 -1 *** Caught signal (Aborted) **
in thread 7fc86f689700 thread_name:bstore_aio
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (()+0x424fc4) [0x557f89d25fc4]
2: (()+0x110e0) [0x7fc87401a0e0]
3: (gsignal()+0xcf) [0x7fc8729affff]
4: (abort()+0x16a) [0x7fc8729b142a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7fc87572444e]
6: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]
7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]
8: (()+0x74a4) [0x7fc8740104a4]
9: (clone()+0x3f) [0x7fc872a65d0f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
0> 2023-07-30 08:27:27.549175 7fc86f689700 -1 *** Caught signal
(Aborted) **
in thread 7fc86f689700 thread_name:bstore_aio
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (()+0x424fc4) [0x557f89d25fc4]
2: (()+0x110e0) [0x7fc87401a0e0]
3: (gsignal()+0xcf) [0x7fc8729affff]
4: (abort()+0x16a) [0x7fc8729b142a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7fc87572444e]
6: (KernelDevice::_aio_thread()+0x1377) [0x557f89cc14c7]
7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x557f89cc725d]
8: (()+0x74a4) [0x7fc8740104a4]
9: (clone()+0x3f) [0x7fc872a65d0f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
Aborted
$ ceph-objectstore-tool --data-path=/var/lib/ceph/osd/ceph-11 --op list-pgs
tcmalloc: large alloc 1283989504 bytes == 0x5649b1bdc000 @ 0x7f3af5e756d0
0x7f3aeafbbae9 0x7f3aeafbd073 0x56495defb9e0 0x56495deed01d 0x56495deee974
0x56495deefd3f 0x56495def1597 0x56495de0e47f 0x56495dd95dab 0x56495ddcf9e4
0x56495d7de4db 0x7f3aea6042e1 0x56495d86853a (nil)
tcmalloc: large alloc 2567970816 bytes == 0x5649fe45e000 @ 0x7f3af5e756d0
0x7f3aeafbbae9 0x7f3aeafbd073 0x56495defb9e0 0x56495deed01d 0x56495deee974
0x56495deefd3f 0x56495def1597 0x56495de0e47f 0x56495dd95dab 0x56495ddcf9e4
0x56495d7de4db 0x7f3aea6042e1 0x56495d86853a (nil)
tcmalloc: large alloc 5135933440 bytes == 0x564a97560000 @ 0x7f3af5e756d0
0x7f3aeafbbae9 0x7f3aeafbd073 0x56495defb9e0 0x56495deed01d 0x56495deee974
0x56495deefd3f 0x56495def1597 0x56495de0e47f 0x56495dd95dab 0x56495ddcf9e4
0x56495d7de4db 0x7f3aea6042e1 0x56495d86853a (nil)
tcmalloc: large alloc 3025510400 bytes == 0x56496547c000 @ 0x7f3af5e756d0
0x7f3aeafbbae9 0x7f3aeafbc82b 0x56495df34079 0x56495deed22a 0x56495deee974
0x56495deefd3f 0x56495def1597 0x56495de0e47f 0x56495dd95dab 0x56495ddcf9e4
0x56495d7de4db 0x7f3aea6042e1 0x56495d86853a (nil)
tcmalloc: large alloc 2269913088 bytes == 0x564cfa402000 @ 0x7f3af5e55e50
0x7f3af5e751b9 0x7f3aed12d4f7 0x7f3aed130304 0x56495de9fbc1 0x56495de7a5f8
0x56495de7bfc4 0x56495de9307d 0x56495dfa32dc 0x56495dfa450e 0x56495dfc34db
0x56495deeb398 0x56495deed422 0x56495deee974 0x56495deefd3f 0x56495def1597
0x56495de0e47f 0x56495dd95dab 0x56495ddcf9e4 0x56495d7de4db 0x7f3aea6042e1
0x56495d86853a (nil)
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: In function 'void
KernelDevice::_aio_thread()' thread 7f3ae72f0700 time 2023-07-30
08:37:16.531432
/build/ceph-12.2.12/src/os/bluestore/KernelDevice.cc: 397: FAILED assert(0
== "unexpected aio error")
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7f3aed1872c2]
2: (KernelDevice::_aio_thread()+0x1377) [0x56495de9ca27]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x56495dea27bd]
4: (()+0x74a4) [0x7f3aeba734a4]
5: (clone()+0x3f) [0x7f3aea6ccd0f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
*** Caught signal (Aborted) **
in thread 7f3ae72f0700 thread_name:bstore_aio
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
(stable)
1: (()+0x94a0f4) [0x56495debe0f4]
2: (()+0x110e0) [0x7f3aeba7d0e0]
3: (gsignal()+0xcf) [0x7f3aea616fff]
4: (abort()+0x16a) [0x7f3aea61842a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7f3aed18744e]
6: (KernelDevice::_aio_thread()+0x1377) [0x56495de9ca27]
7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x56495dea27bd]
8: (()+0x74a4) [0x7f3aeba734a4]
9: (clone()+0x3f) [0x7f3aea6ccd0f]
Aborted
--
Gregory O’Neill
Details of this release are summarized here:
https://tracker.ceph.com/issues/62231#note-1
Seeking approvals/reviews for:
smoke - Laura, Radek
rados - Neha, Radek, Travis, Ernesto, Adam King
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade-clients:client-upgrade* - in progress
powercycle - Brad
Please reply to this email with approval and/or trackers of known
issues/PRs to address them.
bookworm distro support is an outstanding issue.
TIA
YuriW
Hi,
I have trouble with large OMAP objects in the RGW index pool of a cluster. Some
background information about the cluster: there is CephFS and RBD usage on the
main cluster, but for this issue I think only S3 is interesting.
There is one realm and one zonegroup with two zones, which have a bidirectional
sync set up. Since this does not allow for auto-resharding, we have to do it by
hand in this cluster – looking forward to Reef!
From the logs:
cluster 2023-07-17T22:59:03.018722+0000 osd.75 (osd.75) 623978 :
cluster [WRN] Large omap object found. Object:
34:bcec3016:::.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.5:head
PG: 34.680c373d (34.5) Key count: 962091 Size (bytes): 277963182
The offending bucket looks like this:
# radosgw-admin bucket stats \
| jq '.[] | select(.marker
=="3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9")
|"\(.num_shards) \(.usage["rgw.main"].num_objects)"' -r
131 9463833
Last week the number of objects was about 12 million, which is why I resharded
the offending bucket twice, I think: once to 129, and the second time to 131
because I wanted some leeway (or lieway? scnr, Sage).
Unfortunately, even after a week the objects were still too big (the log line
above is quite recent), so I looked into it again.
# rados -p raum.rgw.buckets.index ls \
|grep .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9 \
|sort -V
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.0
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.1
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.2
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.3
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.4
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.5
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.6
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.7
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.8
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.9
.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.10
# rados -p raum.rgw.buckets.index ls \
|grep .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9 \
|sort -V \
|xargs -IOMAP sh -c \
'rados -p raum.rgw.buckets.index listomapkeys OMAP | wc -l'
1013854
1011007
1012287
1011232
1013565
998262
1012777
1012713
1012230
1010690
997111
Apparently, only 11 shards are in use. This would explain why the "Key count"
(from the log line) is about ten times higher than I would expect.
How can I deal with this issue?
One thing I could try to fix this would be to reshard to a lower number, but I
am not sure whether there are any risks associated with "downsharding". After
that I could reshard back up to something like 97. Or I could directly
"downshard" to 97.
Also, the second zone has a similar problem, but as the error message lets me
know, resharding there would be a bad idea. Will it just take more time until
the sharding is transferred to the second zone?
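For what it's worth, these are the reshard bookkeeping commands I know of, in
case a stale reshard entry is the culprit (not sure how they behave with
multisite):
radosgw-admin reshard status --bucket=<bucket>   # per-shard reshard state
radosgw-admin reshard list                       # pending reshard operations
radosgw-admin bucket limit check                 # objects-per-shard fill level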
Best,
Christian Kugler