April 2023 - ceph-users - lists.ceph.io

by Gaël THEROND

Hi everyone, quick question regarding radosgw zone data-pool. I’m currently planning to migrate an old data-pool that was created with inappropriate failure-domain to a newly created pool with appropriate failure-domain. If I’m doing something like: radosgw-admin zone modify —rgw-zone default —data-pool <new_pool> Will data from the old pool be migrated to the new one or do I need to do something else to migrate those data out of the old pool? I’ve read a lot of mail archive with peoples willing to do that but I can’t get a clear answer from those archives. I’m running on nautilus release of it ever help. Thanks a lot! PS: This mail is a redo of the old one as I’m not sure the former one worked (missing tags).

1 year

3
4
0 0

Ceph 16.2.12, bluestore cache doesn't seem to be used much

by Zakhar Kirpichenko

Hi, I have a Ceph 16.2.12 cluster with hybrid OSDs (HDD block storage, DB/WAL on NVME). All OSD settings are default except, cache-related settings are as follows: osd.14 dev bluestore_cache_autotune true osd.14 dev bluestore_cache_size_hdd 4294967296 osd.14 dev bluestore_cache_size_ssd 4294967296 osd.14 advanced bluestore_default_buffered_write false osd.14 dev osd_memory_cache_min 2147483648 osd.14 basic osd_memory_target 17179869184 Other settings such as bluestore_cache_kv_ratio, bluestore_cache_meta_ratio, etc. are default. I.e. OSD memory target is set to 16 GB, bluestore cache is set to 4 GB for HDDs and SSDs, minimum cache size is 2 GB. When I dump memory pools of OSDs, bluestore cache doesn't seem to be actively used (https://pastebin.com/EpfFp85C), despite there's plenty of memory and the memory target is 16 GB, memory pools are around 2 GB and the total RSS of the OSD process is ~4.8 GB. There are 66 OSDs in the cluster and the situation is very similar with all of them. The OSDs are being used quite actively for both reads and writes, and I guess they could benefit from using more memory for caching, especially considering that we have lots of RAM available on each host. Is there a way to increase and/or tune OSD cache memory usage? I would appreciate any advice or pointers. Best regards, Zakhar

1 year

1
0
0 0

Object data missing, but metadata is OK (Quincy 17.2.3)

by Jeff Briden

We have a large cluster on Quincy 17.2.3 with a bucket holding 8.9 million small (15~20 MiB) objects. All the objects were multipart uploads from scripts using `aws s3 cp` The data is static (write-once, read-many) with no manual deletions and no new writes for months. We recently found 3 objects in this bucket that cannot be retrieved. The symptom is exactly the same as https://tracker.ceph.com/issues/47866 and https://bugzilla.redhat.com/show_bug.cgi?id=1892644 which were fixed a long time ago. Any form of listing (`aws s3 ls`, radosgw-admin object stat, radoslist, http head request, etc) returns good data, but the objects cannot be retrieved and rados -p ls shows the object data is missing. Any suggestions on how to troubleshoot this further?

1 year

1
0
0 0

How to control omap capacity？

by WeiGuo Ren

I have two osds. these osd are used to rgw index pool. After a lot of stress tests, these two osds were written to 99.90%. The full ratio (95%) did not take effect? I don't know much. Could it be that if the osd of omap is fully stored, it cannot be limited by the full ratio? ALSO I use ceph-bluestore-tool to expand it . Before I add a partition . But i failed, I dont know why. In my cluster every osd have 55GB (db val data in same device), ceph -v is 14.2.5. can anyone give me some idear to fix it?

1 year

2
1
0 0

Bug, pg_upmap_primaries.empty()

by Nguetchouang Ngongang Kevin

Good morning, i found a bug on ceph reef After installing ceph and deploying 9 osds with a cephfs layer. I got this error after many writing and reading operations on the ceph fs i deployed. ```{ "assert_condition": "pg_upmap_primaries.empty()", "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.0.0-3593-g1e73409b/rpm/el8/BUILD/ceph-18.0.0-3593-g1e73409b/src/osd/OSDMap.cc", "assert_func": "void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const", "assert_line": 3239, "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.0.0-3593-g1e73409b/rpm/el8/BUILD/ceph-18.0.0-3593-g1e73409b/src/osd/OSDMap.cc: In function 'void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7f86cb8e5700 time 2023-04-26T12:25:12.278025+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.0.0-3593-g1e73409b/rpm/el8/BUILD/ceph-18.0.0-3593-g1e73409b/src/osd/OSDMap.cc: 3239: FAILED ceph_assert(pg_upmap_primaries.empty())\n", "assert_thread_name": "msgr-worker-0", "backtrace": [ "/lib64/libpthread.so.0(+0x12cf0) [0x7f86d0d21cf0]", "gsignal()", "abort()", "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x55ce1794774b]", "/usr/bin/ceph-osd(+0x6368b7) [0x55ce179478b7]", "(OSDMap::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x1229) [0x55ce183e0449]", "(MOSDMap::encode_payload(unsigned long)+0x396) [0x55ce17ae2576]", "(Message::encode(unsigned long, int, bool)+0x2e) [0x55ce1825dbee]", "(ProtocolV1::prepare_send_message(unsigned long, Message*, ceph::buffer::v15_2_0::list&)+0x54) [0x55ce184e5914]", "(ProtocolV1::write_event()+0x511) [0x55ce184f4ce1]", "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa64) [0x55ce182eb484]", "/usr/bin/ceph-osd(+0xfdf276) [0x55ce182f0276]", "/lib64/libstdc++.so.6(+0xc2b13) [0x7f86d0369b13]", "/lib64/libpthread.so.0(+0x81ca) [0x7f86d0d171ca]", "clone()" ], "ceph_version": "18.0.0-3593-g1e73409b", "crash_id": "2023-04-26T12:25:12.286947Z_55675d7c-7833-4e91-b0eb-6df705104c2e", "entity_name": "osd.0", "os_id": "centos", "os_name": "CentOS Stream", "os_version": "8", "os_version_id": "8", "process_name": "ceph-osd", "stack_sig": "0ffad2c4bc07caf68ff1e124d3911823bc6fa6f5772444754b7f0a998774c8fe", "timestamp": "2023-04-26T12:25:12.286947Z", "utsname_hostname": "node1-link-1", "utsname_machine": "x86_64", "utsname_release": "5.4.0-100-generic", "utsname_sysname": "Linux", "utsname_version": "#113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022" } ``` I really don't know what is this error for, Will appreciate any help. Cordially, -- Nguetchouang Ngongang Kevin ENS de Lyon https://perso.ens-lyon.fr/kevin.nguetchouang/

1 year

2
2
0 0

Ceph Leadership Team meeting minutes - 2023 April 26

by Casey Bodley

# ceph windows tests PR check will be made required once regressions are fixed windows build currently depends on gcc11 which limits use of c++20 features. investigating newer gcc or clang toolchain # 16.2.13 release final testing in progress # prometheus metric regressions https://tracker.ceph.com/issues/59505 related to previous discussion on 4/12 about quincy backports integration test coverage needed for ceph-exporter and the mgr module # lab update centos/rhel tests were failing due to problematic mirrorlists fixed in https://github.com/ceph/ceph-cm-ansible/pull/731 more sanity checks in progress at https://github.com/ceph/ceph-cm-ansible/pull/733 # cephalocon feedback dev summit etherpads: https://pad.ceph.com/p/cephalocon-dev-summit-2023 collect more notes here: https://pad.ceph.com/p/cephalocon-2023-brainstorm request for dev-focused longer term discussion could have specific user-focused and dev-focused sessions dense conference, hard to fit everything in 3 days could have longer component updates during conf, with time for questions perhaps 3 days of conf, dev-specific discussions a day before (no cfp, one big room, then option for breakout), user-feedback sessions during the normal con

1 year

1
0
0 0

Re: Move ceph to new addresses and hostnames

by Jan Marek

Hello all, today I moved ceph to HEALTH_OK state :-) 1) I had to restart MGR node, then my old c-osdx hostnames goes definitely away and all of OSDs from old machines are now orchestrated by 'ceph orch' command. 2) I've updated ceph* packages on the osd2 node to version 17.2.6, then I tried 'cephadm adopt' command once more and voila! It works like a charm. I will try to configure OSDs on the node 1 to adopt WAL and DB from prepared LVM... Maybe after upgrade to newer version of CEPH it will be OK? Sincerely Jan Marek -- Ing. Jan Marek University of South Bohemia Academic Computer Centre Phone: +420389032080 http://www.gnu.org/philosophy/no-word-attachments.cs.html

1 year

2
1
0 0

OSD_TOO_MANY_REPAIRS on random OSDs causing clients to hang

by Thomas Hukkelberg

Hi all, Over the last 2 weeks we have experienced several OSD_TOO_MANY_REPAIRS errors that we struggle to handle in a non-intrusive manner. Restarting MDS + hypervisor that accessed the object in question seems to be the only way we can clear the error so we can repair the PG and recover access. Any pointers on how to handle this issue in a more gentle way than rebooting the hypervisor and failing the MDS would be welcome! The problem seems to only affect one specific pool (id 42) that is used for cephfs_data. This pool is our second cephfs data pool in this cluster. The data in the pool is accessible via LXC container via Samba and have the cephfs filesystem bind-mounted from hypervisor. Ceph is recently updated to version 16.2.11 (pacific) -- kernel version is 5.13.19-6-pve on OSD-hosts/samba-containers and 5.19.17-2-pve on MDS-hosts. The following warnings are issued: $ ceph health detail HEALTH_WARN 1 clients failing to respond to capability release; Too many repaired reads on 1 OSDs; Degraded data redundancy: 1/2648430 090 objects degraded (0.000%), 1 pg degraded; 1 slow ops, oldest one blocked for 608 sec, osd.34 has slow ops [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release mds.hk-cephnode-65(mds.0): Client hk-cephnode-56 failing to respond to capability release client_id: 9534859837 [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 1 OSDs osd.34 had 9936 reads repaired [WRN] PG_DEGRADED: Degraded data redundancy: 1/2648430090 objects degraded (0.000%), 1 pg degraded pg 42.e2 is active+recovering+degraded+repair, acting [34,275,284] [WRN] SLOW_OPS: 1 slow ops, oldest one blocked for 608 sec, osd.34 has slow ops The logs for OSD.34 are flooded with these messages: root@hk-cephnode-53:~# tail /var/log/ceph/ceph-osd.34.log 2023-04-26T11:41:00.760+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 missing primary copy of 42:4703efac:::10003d86a99.00000001:head, will try copies on 275,284 2023-04-26T11:41:00.784+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head 2023-04-26T11:41:00.812+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head 2023-04-26T11:41:00.812+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 missing primary copy of 42:4703efac:::10003d86a99.00000001:head, will try copies on 275,284 2023-04-26T11:41:00.824+0200 7f03a821f700 -1 osd.34 1352563 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.9534859837.0:20412906 42.e2 42:4703efac:::10003d86a99.00000001:head [read 0~1048576 [307@0] out=1048576b] snapc 0=[] RETRY=5 ondisk+retry+read+known_if_redirected e1352553) 2023-04-26T11:41:00.824+0200 7f03a821f700 0 log_channel(cluster) log [WRN] : 1 slow requests (by type [ 'delayed' : 1 ] most affected pool [ 'qa-cephfs_data' : 1 ]) 2023-04-26T11:41:00.840+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head 2023-04-26T11:41:00.864+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head 2023-04-26T11:41:00.864+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 missing primary copy of 42:4703efac:::10003d86a99.00000001:head, will try copies on 275,284 2023-04-26T11:41:00.888+0200 7f03921f3700 -1 log_channel(cluster) log [ERR] : 42.e2 full-object read crc 0xebd673ed != expected 0xffffffff on 42:4703efac:::10003d86a99.00000001:head We have tried the following: - Restarting the OSD in question clears the error for a few seconds but then we also we get OSD_TOO_MANY_REPAIRS on OSDs with PGs that holds the object that have blocked I/O. - Trying to repair the PG seems to restart every 10 second and not actually do anything/progressing. (Is there a way to check repair progress?) - Restarting the MDS and hypervisor clears the error (the hypervisor hangs for several minutes before timing out). However if the object is requested again the error reoccurs. If we don't access the object we are able to eventually repair the PG. - Occasionally setting the primary-affinity to 0 for the primary OSD in the PG clears the error after restarting all affected OSD and we are able to repair the PG (unless the object is accessed during recovery) and access to the object is OK afterwards. - Finding and deleting the file pointing to the object (10003d86a99) and restarting OSDs will clear the error. - Killing the samba process that accessed the object does not clear the SLOW_OPS, and hence the error prevail - Normal scrubs have revealed a handfull of other PGs in the same pool (id 42) that are damaged and we are doing repairs without any problems. - We believe MDS_CLIENT_LATE_RELEASE and SLOW_OPS errors are symptoms of the fact that the I/O are blocked. - We have verified that there are no SMART errors of any kind on any of our disks in the cluster. - If we don't handle this issue rather promptly, we experience full lockup of the samba container and rebooting hypervisor seems to be the only cure. Trying to force unmount and remount cephfs does not help. This have now happened 6-7 times over the last 2 weeks and we suspect that a hardware or memory error on one of our nodes may have caused the objects to be written to disk with bad checksums. We have replaced the mainboard in one of our nodes that we might think is the culprit and are currently testing the memory. Can these random checksum errors be caused by anything else that we should investigate? It's a bit suspicious that the error only occurs on one specific pool? If the mainboard are to blame we should see these errors in more pools by now? Regardless we are stumped by how Ceph handles this error. Checksum-errors should not leave clients hanging like this? Should this be considered a bug? Is there a way to cancel the blocking I/O request to clear the error? And why is the PG flapping between active+recovering+degraded+repair, active+recovering+repair, active+clean+repair every few seconds? Any ideas on how to gracefully battle this problem? Thanks! --thomas Thomas Hukkelberg thomas(a)hovedkvarteret.no

1 year

3
3
0 0

Increase timeout for marking osd down

by Nicola Mori

Dear Ceph users, my cluster is made of very old machines on a Gbit ethernet. I see that sometimes some OSDs are marked down due to slow networking, especially on heavy network load like during recovery. This causes problems, for example PGs keeps being deactivated and activated as the OSDs are marked down and up (at least to my best understanding). So I'd need to know if there is some way to increase the timeout after which an OSD is marked down, to cope with my slow network. Thanks, Nicola

1 year

2
1
0 0

ceph pg stuck - missing on 1 osd how to proceed

by xadhoom76＠gmail.com

Hi to all Using ceph 17.2.5 i have 3 pgs in stuck state ceph pg map 8.2a6 osdmap e32862 pg 8.2a6 (8.2a6) -> up [88,100,59] acting [59,100] looking at it ho 88 ,100 and 59 i got that ceph pg ls-by-osd osd.100 | grep 8.2a6 8.2a6 211004 209089 0 0 174797925205 0 0 7075 active+undersized+degraded+remapped+backfilling 21m 32862'1540291 32862:3387785 [88,100,59]p88 [59,100]p59 2023-03-12T08:08:00.903727+0000 2023-03-12T08:08:00.903727+0000 6839 queued for deep scrub ceph pg ls-by-osd osd.59 | grep 8.2a6 8.2a6 211005 209084 0 0 174798941087 0 0 7076 active+undersized+degraded+remapped+backfilling 22m 32862'1540292 32862:3387798 [88,100,59]p88 [59,100]p59 2023-03-12T08:08:00.903727+0000 2023-03-12T08:08:00.903727+0000 6839 queued for deep scrub BUT ceph pg ls-by-osd osd.88 | grep 8.2a6 ---> NONE it is missing .... how to proceed ? Best regards

1 year

4
7
0 0

2024

2023

2022

2021

2020

2019

ceph-users April 2023