November 2023 - ceph-users

easy way to find out the number of allocated objects for a RBD image

by Tony Liu

Hi, Other than get all objects of the pool and filter by image ID, is there any easier way to get the number of allocated objects for a RBD image? What I really want to know is the actual usage of an image. An allocated object could be used partially, but that's fine, no need to be 100% accurate. To get the object count and times object size, that should be sufficient. "rbd export" exports actual used data, but to get the actual usage by exporting the image seems too much. This brings up another question, is there any way to know the export size before running it? Thanks! Tony

3 months, 1 week

4
4
0 0

Re: Ceph OSD reported Slow operations

by V A Prabha

Hi Eugen Please find the details below root@meghdootctr1:/var/log/ceph# ceph -s cluster: id: c59da971-57d1-43bd-b2b7-865d392412a5 health: HEALTH_WARN nodeep-scrub flag(s) set 544 pgs not deep-scrubbed in time services: mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d) mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3 mds: 3 up:standby osd: 36 osds: 36 up (since 34h), 36 in (since 34h) flags nodeep-scrub data: pools: 2 pools, 544 pgs objects: 10.14M objects, 39 TiB usage: 116 TiB used, 63 TiB / 179 TiB avail pgs: 544 active+clean io: client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr Ceph Versions: root@meghdootctr1:/var/log/ceph# ceph --version ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable) Ceph df -h https://pastebin.com/1ffucyJg Ceph OSD performance dump https://pastebin.com/1R6YQksE Ceph tell osd.XX bench (Out of 36 osds only 8 OSDs give High IOPS value of 250 +. Out of that 4 OSDs are from HP 3PAR and 4 OSDS from DELL EMC. We are using only 4 OSDs from HP3 par and it is working fine without any latency and iops issues from the beginning but the remaining 32 OSDs are from DELL EMC in which 4 OSDs are much better than the remaining 28 OSDs) https://pastebin.com/CixaQmBi Please help me to identify if the issue is with the DELL EMC Storage, Ceph configuration parameter tuning or the Overload in the cloud setup On November 1, 2023 at 9:48 PM Eugen Block <eblock(a)nde.ag> wrote: > Hi, > > for starters please add more cluster details like 'ceph status', 'ceph > versions', 'ceph osd df tree'. Increasing the to 10G was the right > thing to do, you don't get far with 1G with real cluster load. How are > the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)? > How is the disk utilization? > > Regards, > Eugen > > Zitat von prabhav(a)cdac.in: > > > In a production setup of 36 OSDs( SAS disks) totalling 180 TB > > allocated to a single Ceph Cluster with 3 monitors and 3 managers. > > There were 830 volumes and VMs created in Openstack with Ceph as a > > backend. On Sep 21, users reported slowness in accessing the VMs. > > Analysing the logs lead us to problem with SAS , Network congestion > > and Ceph configuration( as all default values were used). We updated > > the Network from 1Gbps to 10Gbps for public and cluster networking. > > There was no change. > > The ceph benchmark performance showed that 28 OSDs out of 36 OSDs > > reported very low IOPS of 30 to 50 while the remaining showed 300+ > > IOPS. > > We gradually started reducing the load on the ceph cluster and now > > the volumes count is 650. Now the slow operations has gradually > > reduced but I am aware that this is not the solution. > > Ceph configuration is updated with increasing the > > osd_journal_size to 10 GB, > > osd_max_backfills = 1 > > osd_recovery_max_active = 1 > > osd_recovery_op_priority = 1 > > bluestore_cache_trim_max_skip_pinned=10000 > > > > After one month, now we faced another issue with Mgr daemon stopped > > in all 3 quorums and 16 OSDs went down. From the > > ceph-mon,ceph-mgr.log could not get the reason. Please guide me as > > its a production setup > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io Thanks & Regards, Ms V A Prabha / श्रीमती प्रभा वी ए Joint Director / संयुक्त निदेशक Centre for Development of Advanced Computing(C-DAC) / प्रगत संगणन विकास केन्द्र(सी-डैक) Tidel Park”, 8th Floor, “D” Block, (North &South) / “टाइडल पार्क”,8वीं मंजिल, “डी” ब्लॉक, (उत्तर और दक्षिण) No.4, Rajiv Gandhi Salai / नं.4, राजीव गांधी सलाई Taramani / तारामणि Chennai / चेन्नई – 600113 Ph.No.:044-22542226/27 Fax No.: 044-22542294 ------------------------------------------------------------------------------------------------------------ [ C-DAC is on Social-Media too. Kindly follow us at: Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ] This e-mail is for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies and the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email is strictly prohibited and appropriate legal action will be taken. ------------------------------------------------------------------------------------------------------------

3 months, 1 week

4
15
0 0

Ceph 16.2.14: ceph-mgr getting oom-killed

by Zakhar Kirpichenko

Hi, I'm facing a rather new issue with our Ceph cluster: from time to time ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over 100 GB RAM: [Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ +0.000010] oom_kill_process.cold+0xb/0x10 [ +0.000002] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ +0.000008] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167 [ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0 [ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB The cluster is stable and operating normally, there's nothing unusual going on before, during or after the kill, thus it's unclear what causes the mgr to balloon, use all RAM and get killed. Systemd logs aren't very helpful: they just show normal mgr operations until it fails to allocate memory and gets killed: https://pastebin.com/MLyw9iVi The mgr experienced this issue several times in the last 2 months, and the events don't appear to correlate with any other events in the cluster because basically nothing else happened at around those times. How can I investigate this and figure out what's causing the mgr to consume all memory and get killed? I would very much appreciate any advice! Best regards, Zakhar

3 months, 2 weeks

6
21
0 0

Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?

by Götz Reinicke

Hi, As I’v read and thought a lot about the migration as this is a bigger project, I was wondering if anyone has done that already and might share some notes or playbooks, because in all readings there where some parts missing or miss understandable to me. I do have some different approaches in mind, so may be you have some suggestions or hints. a) upgrade nautilus on centos 7 with the few missing features like dashboard and prometheus. After that migrate one node after an other to ubuntu 20.04 with octopus and than upgrade ceph to the recent stable version. b) migrate one node after an other to ubuntu 18.04 with nautilus and then upgrade to octupus and after that to ubuntu 20.04. or c) upgrade one node after an other to ubuntu 20.04 with octopus and join it to the cluster until all nodes are upgraded. For test I tried c) with a mon node, but adding that to the cluster fails with some failed state, still probing for the other mons. (I dont have the right log at hand right now.) So my questions are: a) What would be the best (most stable) migration path and b) is it in general possible to add a new octopus mon (not upgraded one) to a nautilus cluster, where the other mons are still on nautilus? I hope my thoughts and questions are understandable :) Thanks for any hint and suggestion. Best . Götz

3 months, 3 weeks

7
11
0 0

How to configure something like osd_deep_scrub_min_interval?

by Frank Schilder

Hi folks, I am fighting a bit with odd deep-scrub behavior on HDDs and discovered a likely cause of why the distribution of last_deep_scrub_stamps is so weird. I wrote a small script to extract a histogram of scrubs by "days not scrubbed" (more precisely, intervals not scrubbed; see code) to find out how (deep-) scrub times are distributed. Output below. What I expected is along the lines that HDD-OSDs try to scrub every 1-3 days, while they try to deep-scrub every 7-14 days. In other words, OSDs that have been deep-scrubbed within the last 7 days would *never* be in scrubbing+deep state. However, what I see is completely different. There seems to be no distinction between scrub- and deep-scrub start times. This is really unexpected as nobody would try to deep-scrub HDDs every day. Weekly to bi-weekly is normal, specifically for large drives. Is there a way to configure something like osd_deep_scrub_min_interval (no, I don't want to run cron jobs for scrubbing yet)? In the output below, I would like to be able to configure a minimum period of 1-2 weeks before the next deep-scrub happens. How can I do that? The observed behavior is very unusual for RAID systems (if its not a bug in the report script). With this behavior its not surprising that people complain about "not deep-scrubbed in time" messages and too high deep-scrub IO load when such a large percentage of OSDs is needlessly deep-scrubbed after 1-6 days again already. Sample output: # scrub-report dumped pgs Scrub report: 4121 PGs not scrubbed since 1 intervals (6h) 3831 PGs not scrubbed since 2 intervals (6h) 4012 PGs not scrubbed since 3 intervals (6h) 3986 PGs not scrubbed since 4 intervals (6h) 2998 PGs not scrubbed since 5 intervals (6h) 1488 PGs not scrubbed since 6 intervals (6h) 909 PGs not scrubbed since 7 intervals (6h) 771 PGs not scrubbed since 8 intervals (6h) 582 PGs not scrubbed since 9 intervals (6h) 2 scrubbing 431 PGs not scrubbed since 10 intervals (6h) 333 PGs not scrubbed since 11 intervals (6h) 1 scrubbing 265 PGs not scrubbed since 12 intervals (6h) 195 PGs not scrubbed since 13 intervals (6h) 116 PGs not scrubbed since 14 intervals (6h) 78 PGs not scrubbed since 15 intervals (6h) 1 scrubbing 72 PGs not scrubbed since 16 intervals (6h) 37 PGs not scrubbed since 17 intervals (6h) 5 PGs not scrubbed since 18 intervals (6h) 14.237* 19.5cd* 19.12cc* 19.1233* 14.40e* 33 PGs not scrubbed since 20 intervals (6h) 23 PGs not scrubbed since 21 intervals (6h) 16 PGs not scrubbed since 22 intervals (6h) 12 PGs not scrubbed since 23 intervals (6h) 8 PGs not scrubbed since 24 intervals (6h) 2 PGs not scrubbed since 25 intervals (6h) 19.eef* 19.bb3* 4 PGs not scrubbed since 26 intervals (6h) 19.b4c* 19.10b8* 19.f13* 14.1ed* 5 PGs not scrubbed since 27 intervals (6h) 19.43f* 19.231* 19.1dbe* 19.1788* 19.16c0* 6 PGs not scrubbed since 28 intervals (6h) 2 PGs not scrubbed since 30 intervals (6h) 19.10f6* 14.9d* 3 PGs not scrubbed since 31 intervals (6h) 19.1322* 19.1318* 8.a* 1 PGs not scrubbed since 32 intervals (6h) 19.133f* 1 PGs not scrubbed since 33 intervals (6h) 19.1103* 3 PGs not scrubbed since 36 intervals (6h) 19.19cc* 19.12f4* 19.248* 1 PGs not scrubbed since 39 intervals (6h) 19.1984* 1 PGs not scrubbed since 41 intervals (6h) 14.449* 1 PGs not scrubbed since 44 intervals (6h) 19.179f* Deep-scrub report: 3723 PGs not deep-scrubbed since 1 intervals (24h) 4621 PGs not deep-scrubbed since 2 intervals (24h) 8 scrubbing+deep 3588 PGs not deep-scrubbed since 3 intervals (24h) 8 scrubbing+deep 2929 PGs not deep-scrubbed since 4 intervals (24h) 3 scrubbing+deep 1705 PGs not deep-scrubbed since 5 intervals (24h) 4 scrubbing+deep 1904 PGs not deep-scrubbed since 6 intervals (24h) 5 scrubbing+deep 1540 PGs not deep-scrubbed since 7 intervals (24h) 7 scrubbing+deep 1304 PGs not deep-scrubbed since 8 intervals (24h) 7 scrubbing+deep 923 PGs not deep-scrubbed since 9 intervals (24h) 5 scrubbing+deep 557 PGs not deep-scrubbed since 10 intervals (24h) 7 scrubbing+deep 501 PGs not deep-scrubbed since 11 intervals (24h) 2 scrubbing+deep 363 PGs not deep-scrubbed since 12 intervals (24h) 2 scrubbing+deep 377 PGs not deep-scrubbed since 13 intervals (24h) 1 scrubbing+deep 383 PGs not deep-scrubbed since 14 intervals (24h) 2 scrubbing+deep 252 PGs not deep-scrubbed since 15 intervals (24h) 2 scrubbing+deep 116 PGs not deep-scrubbed since 16 intervals (24h) 5 scrubbing+deep 47 PGs not deep-scrubbed since 17 intervals (24h) 2 scrubbing+deep 10 PGs not deep-scrubbed since 18 intervals (24h) 2 PGs not deep-scrubbed since 19 intervals (24h) 19.1c6c* 19.a01* 1 PGs not deep-scrubbed since 20 intervals (24h) 14.1ed* 2 PGs not deep-scrubbed since 21 intervals (24h) 19.1322* 19.10f6* 1 PGs not deep-scrubbed since 23 intervals (24h) 19.19cc* 1 PGs not deep-scrubbed since 24 intervals (24h) 19.179f* PGs marked with a * are on busy OSDs and not eligible for scrubbing. The script (pasted here because attaching doesn't work): # cat bin/scrub-report #!/bin/bash # Compute last scrub interval count. Scrub interval 6h, deep-scrub interval 24h. # Print how many PGs have not been (deep-)scrubbed since #intervals. ceph -f json pg dump pgs 2>&1 > /root/.cache/ceph/pgs_dump.json echo "" T0="$(date +%s)" scrub_info="$(jq --arg T0 "$T0" -rc '.pg_stats[] | [ .pgid, (.last_scrub_stamp[:19]+"Z" | (($T0|tonumber) - fromdateiso8601)/(60*60*6)|ceil), (.last_deep_scrub_stamp[:19]+"Z" | (($T0|tonumber) - fromdateiso8601)/(60*60*24)|ceil), .state, (.acting | join(" ")) ] | @tsv ' /root/.cache/ceph/pgs_dump.json)" # less <<<"$scrub_info" # 1 2 3 4 5..NF # pg_id scrub-ints deep-scrub-ints status acting[] awk <<<"$scrub_info" '{ for(i=5; i<=NF; ++i) pg_osds[$1]=pg_osds[$1] " " $i if($4 == "active+clean") { si_mx=si_mx<$2 ? $2 : si_mx dsi_mx=dsi_mx<$3 ? $3 : dsi_mx pg_sn[$2]++ pg_sn_ids[$2]=pg_sn_ids[$2] " " $1 pg_dsn[$3]++ pg_dsn_ids[$3]=pg_dsn_ids[$3] " " $1 } else if($4 ~ /scrubbing\+deep/) { deep_scrubbing[$3]++ for(i=5; i<=NF; ++i) osd[$i]="busy" } else if($4 ~ /scrubbing/) { scrubbing[$2]++ for(i=5; i<=NF; ++i) osd[$i]="busy" } else { unclean[$2]++ unclean_d[$3]++ si_mx=si_mx<$2 ? $2 : si_mx dsi_mx=dsi_mx<$3 ? $3 : dsi_mx pg_sn[$2]++ pg_sn_ids[$2]=pg_sn_ids[$2] " " $1 pg_dsn[$3]++ pg_dsn_ids[$3]=pg_dsn_ids[$3] " " $1 for(i=5; i<=NF; ++i) osd[$i]="busy" } } END { print "Scrub report:" for(si=1; si<=si_mx; ++si) { if(pg_sn[si]==0 && scrubbing[si]==0 && unclean[si]==0) continue; printf("%7d PGs not scrubbed since %2d intervals (6h)", pg_sn[si], si) if(scrubbing[si]) printf(" %d scrubbing", scrubbing[si]) if(unclean[si]) printf(" %d unclean", unclean[si]) if(pg_sn[si]<=5) { split(pg_sn_ids[si], pgs) osds_busy=0 for(pg in pgs) { split(pg_osds[pgs[pg]], osds) for(o in osds) if(osd[osds[o]]=="busy") osds_busy=1 if(osds_busy) printf(" %s*", pgs[pg]) if(!osds_busy) printf(" %s", pgs[pg]) } } printf("\n") } print "" print "Deep-scrub report:" for(dsi=1; dsi<=dsi_mx; ++dsi) { if(pg_dsn[dsi]==0 && deep_scrubbing[dsi]==0 && unclean_d[dsi]==0) continue; printf("%7d PGs not deep-scrubbed since %2d intervals (24h)", pg_dsn[dsi], dsi) if(deep_scrubbing[dsi]) printf(" %d scrubbing+deep", deep_scrubbing[dsi]) if(unclean_d[dsi]) printf(" %d unclean", unclean_d[dsi]) if(pg_dsn[dsi]<=5) { split(pg_dsn_ids[dsi], pgs) osds_busy=0 for(pg in pgs) { split(pg_osds[pgs[pg]], osds) for(o in osds) if(osd[osds[o]]=="busy") osds_busy=1 if(osds_busy) printf(" %s*", pgs[pg]) if(!osds_busy) printf(" %s", pgs[pg]) } } printf("\n") } print "" print "PGs marked with a * are on busy OSDs and not eligible for scrubbing." } ' Don't forget the last "'" when copy-pasting. Thanks for any pointers. ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

4 months

2
6
0 0

What is the maximum number of Rados gateway objects in one cluster using the bucket index and in one bucket?

by steve jung

Hello. We are using Ceph storage to test whether we can run the service by uploading and saving more than 40 billion files. So I'd like to check the contents below. 1) Maximum number of Rados gateway objects that can be stored in one cluster using the bucket index 2) Maximum number of Rados gateway objects that can be stored in one bucket Although we have referred to the limitations on the number of Rados gateway objects mentioned in existing documents, it seems theoretically unlimited If you have operated the number of files at the level we think in actual services or products, we would appreciate it if you could share them. Below are related documents and related settings values. > Related documents - https://documentation.suse.com/ses/5.5/html/ses-all/cha-ceph-gw.html - https://www.ibm.com/docs/en/storage-ceph/6?topic=resharding-limitations-buc… - https://docs.ceph.com/en/latest/dev/radosgw/bucket_index/ > Related config - rgw_dynamic_resharding: true - rgw_max_objs_per_shard: 100000 - rgw_max_dynamic_shards : 65521

4 months, 1 week

2
1
0 0

Re: increasing number of (deep) scrubs

by Frank Schilder

Hi Dan, thanks for your answer. I don't have a problem with increasing osd_max_scrubs (=1 at the moment) as such. I would simply prefer a somewhat finer grained way of controlling scrubbing than just doubling or tripling it right away. Some more info. These 2 pools are data pools for a large FS. Unfortunately, we have a large percentage of small files, which is a pain for recovery and seemingly also for deep scrubbing. Our OSDs are about 25% used and I had to increase the warning interval already to 2 weeks. With all the warning grace parameters this means that we manage to deep scrub everything about every month. I need to plan for 75% utilisation and a 3 months period is a bit far on the risky side. Our data is to a large percentage cold data. Client reads will not do the check for us, we need to combat bit-rot pro-actively. The reasons I'm interested in parameters initiating more scrubs while also converting more scrubs into deep scrubs are, that 1) scrubs seem to complete very fast. I almost never catch a PG in state "scrubbing", I usually only see "deep scrubbing". 2) I suspect the low deep-scrub count is due to a low number of deep-scrubs scheduled and not due to conflicting per-OSD deep scrub reservations. With the OSD count we have and the distribution over 12 servers I would expect at least a peak of 50% OSDs being active in scrubbing instead of the 25% peak I'm seeing now. It ought to be possible to schedule more PGs for deep scrub than actually are. 3) Every OSD having only 1 deep scrub active seems to have no measurable impact on user IO. If I could just get more PGs scheduled with 1 deep scrub per OSD it would already help a lot. Once this is working, I can eventually increase osd_max_scrubs when the OSDs fill up. For now I would just like that (deep) scrub scheduling looks a bit harder and schedules more eligible PGs per time unit. If we can get deep scrubbing up to an average of 42PGs completing per hour with keeping osd_max_scrubs=1 to maintain current IO impact, we should be able to complete a deep scrub with 75% full OSDs in about 30 days. This is the current tail-time with 25% utilisation. I believe currently a deep scrub of a PG in these pools takes 2-3 hours. Its just a gut feeling from some repair and deep-scrub commands, I would need to check logs for more precise info. Increasing osd_max_scrubs would then be a further and not the only option to push for more deep scrubbing. My expectation would be that values of 2-3 are fine due to the increasingly higher percentage of cold data for which no interference with client IO will happen. Hope that makes sense and there is a way beyond bumping osd_max_scrubs to increase the number of scheduled and executed deep scrubs. Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________________________________________ From: Dan van der Ster <dvanders(a)gmail.com> Sent: 05 January 2023 15:36 To: Frank Schilder Cc: ceph-users(a)ceph.io Subject: Re: [ceph-users] increasing number of (deep) scrubs Hi Frank, What is your current osd_max_scrubs, and why don't you want to increase it? With 8+2, 8+3 pools each scrub is occupying the scrub slot on 10 or 11 OSDs, so at a minimum it could take 3-4x the amount of time to scrub the data than if those were replicated pools. If you want the scrub to complete in time, you need to increase the amount of scrub slots accordingly. On the other hand, IMHO the 1-week deadline for deep scrubs is often much too ambitious for large clusters -- increasing the scrub intervals is one solution, or I find it simpler to increase mon_warn_pg_not_scrubbed_ratio and mon_warn_pg_not_deep_scrubbed_ratio until you find a ratio that works for your cluster. Of course, all of this can impact detection of bit-rot, which anyway can be covered by client reads if most data is accessed periodically. But if the cluster is mostly idle or objects are generally not read, then it would be preferable to increase slots osd_max_scrubs. Cheers, Dan On Tue, Jan 3, 2023 at 2:30 AM Frank Schilder <frans(a)dtu.dk> wrote: > > Hi all, > > we are using 16T and 18T spinning drives as OSDs and I'm observing that they are not scrubbed as often as I would like. It looks like too few scrubs are scheduled for these large OSDs. My estimate is as follows: we have 852 spinning OSDs backing a 8+2 pool with 2024 and an 8+3 pool with 8192 PGs. On average I see something like 10PGs of pool 1 and 12 PGs of pool 2 (deep) scrubbing. This amounts to only 232 out of 852 OSDs scrubbing and seems to be due to a conservative rate of (deep) scrubs being scheduled. The PGs (dep) scrub fairly quickly. > > I would like to increase gently the number of scrubs scheduled for these drives and *not* the number of scrubs per OSD. I'm looking at parameters like: > > osd_scrub_backoff_ratio > osd_deep_scrub_randomize_ratio > > I'm wondering if lowering osd_scrub_backoff_ratio to 0.5 and, maybe, increasing osd_deep_scrub_randomize_ratio to 0.2 would have the desired effect? Are there other parameters to look at that allow gradual changes in the number of scrubs going on? > > Thanks a lot for your help! > ================= > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

4 months, 4 weeks

2
3
0 0

reef 18.2.1 QE Validation status

by Yuri Weinstein

Details of this release are summarized here: https://tracker.ceph.com/issues/63443#note-1 Seeking approvals/reviews for: smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures) rados - Neha, Radek, Travis, Ernesto, Adam King rgw - Casey fs - Venky orch - Adam King rbd - Ilya krbd - Ilya upgrade/quincy-x (reef) - Laura PTL powercycle - Brad perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures) Please reply to this email with approval and/or trackers of known issues/PRs to address them. TIA YuriW

5 months

12
30
0 0

ceph fs (meta) data inconsistent

by Frank Schilder

Dear fellow cephers, today we observed a somewhat worrisome inconsistency on our ceph fs. A file created on one host showed up as 0 length on all other hosts: [user1@host1 h2lib]$ ls -lh total 37M -rw-rw---- 1 user1 user1 12K Nov 1 11:59 dll_wrapper.py [user2@host2 h2lib]# ls -l total 34 -rw-rw----. 1 user1 user1 0 Nov 1 11:59 dll_wrapper.py [user1@host1 h2lib]$ cp dll_wrapper.py dll_wrapper.py.test [user1@host1 h2lib]$ ls -l total 37199 -rw-rw---- 1 user1 user1 11641 Nov 1 11:59 dll_wrapper.py -rw-rw---- 1 user1 user1 11641 Nov 1 13:10 dll_wrapper.py.test [user2@host2 h2lib]# ls -l total 45 -rw-rw----. 1 user1 user1 0 Nov 1 11:59 dll_wrapper.py -rw-rw----. 1 user1 user1 11641 Nov 1 13:10 dll_wrapper.py.test Executing a sync on all these hosts did not help. However, deleting the problematic file and replacing it with a copy seemed to work around the issue. We saw this with ceph kclients of different versions, it seems to be on the MDS side. How can this happen and how dangerous is it? ceph fs status (showing ceph version): # ceph fs status con-fs2 - 1662 clients ======= RANK STATE MDS ACTIVITY DNS INOS 0 active ceph-15 Reqs: 14 /s 2307k 2278k 1 active ceph-11 Reqs: 159 /s 4208k 4203k 2 active ceph-17 Reqs: 3 /s 4533k 4501k 3 active ceph-24 Reqs: 3 /s 4593k 4300k 4 active ceph-14 Reqs: 1 /s 4228k 4226k 5 active ceph-13 Reqs: 5 /s 1994k 1782k 6 active ceph-16 Reqs: 8 /s 5022k 4841k 7 active ceph-23 Reqs: 9 /s 4140k 4116k POOL TYPE USED AVAIL con-fs2-meta1 metadata 2177G 7085G con-fs2-meta2 data 0 7085G con-fs2-data data 1242T 4233T con-fs2-data-ec-ssd data 706G 22.1T con-fs2-data2 data 3409T 3848T STANDBY MDS ceph-10 ceph-08 ceph-09 ceph-12 MDS version: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable) There is no health issue: # ceph status cluster: id: abc health: HEALTH_WARN 3 pgs not deep-scrubbed in time services: mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 9w) mgr: ceph-25(active, since 7w), standbys: ceph-26, ceph-01, ceph-03, ceph-02 mds: con-fs2:8 4 up:standby 8 up:active osd: 1284 osds: 1279 up (since 2d), 1279 in (since 5d) task status: data: pools: 14 pools, 25065 pgs objects: 2.20G objects, 3.9 PiB usage: 4.9 PiB used, 8.2 PiB / 13 PiB avail pgs: 25039 active+clean 26 active+clean+scrubbing+deep io: client: 799 MiB/s rd, 55 MiB/s wr, 3.12k op/s rd, 1.82k op/s wr The inconsistency seems undiagnosed, I couldn't find anything interesting in the cluster log. What should I look for and where? I moved the folder to another location for diagnosis. Unfortunately, I don't have 2 clients any more showing different numbers, I see a 0 length now everywhere for the moved folder. I'm pretty sure though that the file still is non-zero length. Thanks for any pointers. ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

5 months

3
18
0 0

Difficulty adding / using a non-default RGW placement target & storage class

by Anthony D'Atri

I'm having difficulty adding and using a non-default placement target & storage class and would appreciate insights. Am I going about this incorrectly? Rook does not yet have the ability to do this, so I'm adding it by hand. Following instructions on the net I added a second bucket pool, placement target, and storageclass, and created a user defaulting to the new pg/sc, but I get an error when trying to create a bucket: [rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$ s5cmd --endpoint-url http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc mb s3://foofoobars ERROR "mb s3://foofoobars": InvalidLocationConstraint: status code: 400, request id: tx0000057b71002881d48ca-0065495d54-1abe555-ceph-objectstore, host id: I found an article suggesting that the placement target and/or storage class should have the api_name prepended, so I tried setting either or both to "ceph-objectstore:HDD-EC" / "ceph-objectstore:GLACIER" with no success. I suspect that I'm missing something subtle -- or that Rook has provisioned these bits in an atypical fashion. Log entry: /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log-2023-11-06T21:40:36.543+0000 7f6573a9f700 1 ====== starting new request req=0x7f64818ba730 ===== /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log-2023-11-06T21:40:36.546+0000 7f6570a99700 0 req 6320538205097380042 0.003000009s s3:create_bucket could not find user default placement id HDD-EC/GLACIER within zonegroup /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log-2023-11-06T21:40:36.546+0000 7f6570a99700 1 ====== req done req=0x7f64818ba730 op status=-2208 http_status=400 latency=0.003000009s ====== /var/log/ceph/ceph-client.rgw.ceph.objectstore.a.log:2023-11-06T21:40:36.546+0000 7f6570a99700 1 beast: 0x7f64818ba730: 10.233.90.156 - aad [06/Nov/2023:21:40:36.543 +0000] "PUT /foofoobars HTTP/1.1" 400 266 - "aws-sdk-go/1.40.25 (go1.18.3; linux; amd64)" - latency=0.003000009s [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ ceph -v ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable) Here's the second buckets pool, constrained to HDDs. AFAICT it can share the index and data_extra_pool created for the default / STANDARD pt/sc by Rook. I initially omitted ec_overwrites but enabled it after creation. pool 19 'ceph-objectstore.rgw.buckets.data' erasure profile ceph-objectstore.rgw.buckets.data_ecprofile size 6 min_size 5 crush_rule 10 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode off last_change 165350 lfor 0/156300/165341 flags hashpspool,ec_overwrites stripe_width 16384 application rook-ceph-rgw pool 21 'ceph-objectstore.rgw.buckets.data.hdd' erasure profile ceph-objectstore.rgw.buckets.data_ecprofile_hdd size 6 min_size 5 crush_rule 11 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode off last_change 167193 lfor 0/0/164453 flags hashpspool,ec_overwrites stripe_width 16384 application rook-ceph-rgw [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ radosgw-admin zonegroup get { "id": "d994155c-2a9c-4e37-ae30-64fd2934ff99", "name": "ceph-objectstore", "api_name": "ceph-objectstore", "is_master": "true", "endpoints": [ "http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80" ], "hostnames": [], "hostnames_s3website": [], "master_zone": "72035401-a6d9-426b-8c89-9a17e268825f", "zones": [ { "id": "72035401-a6d9-426b-8c89-9a17e268825f", "name": "ceph-objectstore", "endpoints": [ "http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80" ], "log_meta": "false", "log_data": "false", "bucket_index_max_shards": 11, "read_only": "false", "tier_type": "", "sync_from_all": "true", "sync_from": [], "redirect_zone": "" } ], "placement_targets": [ { "name": "HDD-EC", "tags": [], "storage_classes": [ "GLACIER" ] }, { "name": "default-placement", "tags": [], "storage_classes": [ "STANDARD" ] } ], "default_placement": "default-placement", "realm_id": "51fb8875-31ac-40ef-ab21-0ffd4e229f15", "sync_policy": { "groups": [] } } [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ radosgw-admin zone get { "id": "72035401-a6d9-426b-8c89-9a17e268825f", "name": "ceph-objectstore", "domain_root": "ceph-objectstore.rgw.meta:root", "control_pool": "ceph-objectstore.rgw.control", "gc_pool": "ceph-objectstore.rgw.log:gc", "lc_pool": "ceph-objectstore.rgw.log:lc", "log_pool": "ceph-objectstore.rgw.log", "intent_log_pool": "ceph-objectstore.rgw.log:intent", "usage_log_pool": "ceph-objectstore.rgw.log:usage", "roles_pool": "ceph-objectstore.rgw.meta:roles", "reshard_pool": "ceph-objectstore.rgw.log:reshard", "user_keys_pool": "ceph-objectstore.rgw.meta:users.keys", "user_email_pool": "ceph-objectstore.rgw.meta:users.email", "user_swift_pool": "ceph-objectstore.rgw.meta:users.swift", "user_uid_pool": "ceph-objectstore.rgw.meta:users.uid", "otp_pool": "ceph-objectstore.rgw.otp", "system_key": { "access_key": "", "secret_key": "" }, "placement_pools": [ { "key": "HDD-EC", "val": { "index_pool": "ceph-objectstore.rgw.buckets.index", "storage_classes": { "GLACIER": { "data_pool": "ceph-objectstore.rgw.buckets.data.hdd" }, "STANDARD": {} # <------------- seems like this shouldn't be here? }, "data_extra_pool": "ceph-objectstore.rgw.buckets.non-ec", "index_type": 0 } }, { "key": "default-placement", "val": { "index_pool": "ceph-objectstore.rgw.buckets.index", "storage_classes": { "STANDARD": { "data_pool": "ceph-objectstore.rgw.buckets.data" } }, "data_extra_pool": "ceph-objectstore.rgw.buckets.non-ec", "index_type": 0 } } ], "realm_id": "", "notif_pool": "ceph-objectstore.rgw.log:notif" } [rook@rook-ceph-tools-5ff8d58445-gkl5w ~]$ radosgw-admin user info --uid=aad { "user_id": "aad", "display_name": "Anthony", "email": "", "suspended": 0, "max_buckets": 1000, "subusers": [], "keys": [ { "user": "aad", "access_key": "xxxx", "secret_key": "yyyyyy" } ], "swift_keys": [], "caps": [], "op_mask": "read, write, delete", "default_placement": "HDD-EC", "default_storage_class": "GLACIER", "placement_tags": [], "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 }, "user_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 }, "temp_url_keys": [], "type": "rgw", "mfa_ids": [] }

5 months

2
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users November 2023