March 2023 - ceph-users - lists.ceph.io

by Robert Sander

Hi, There is an operation "radosgw-admin bi purge" that removes all bucket index objects for one bucket in the rados gateway. What is the undo operation for this? After this operation the bucket cannot be listed or removed any more. Regards -- Robert Sander Heinlein Consulting GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Zwangsangaben lt. §35a GmbHG: HRB 220009 B / Amtsgericht Berlin-Charlottenburg, Geschäftsführer: Peer Heinlein -- Sitz: Berlin

1 year, 1 month

4
9
0 0

User + Dev Meeting happening this week Thursday!

by Laura Flores

Hi Ceph Users, The User + Dev Meeting is happening this *Thursday, March 16th at 10am EST *(see extra meeting details below). If you have any topics you'd like to discuss, please add them to the etherpad: https://pad.ceph.com/p/ceph-user-dev-monthly-minutes One of the topics we wish to discuss is whether any users would be willing to help with early Reef testing after the RC comes out. Thanks, Laura Flores Meeting link: https://meet.jit.si/ceph-user-dev-monthly Time conversions: UTC: Thursday, March 16, 14:00 UTC Mountain View, CA, US: Thursday, March 16, 7:00 PDT Phoenix, AZ, US: Thursday, March 16, 7:00 MST Denver, CO, US: Thursday, March 16, 8:00 MDT Huntsville, AL, US: Thursday, March 16, 9:00 CDT Raleigh, NC, US: Thursday, March 16, 10:00 EDT London, England: Thursday, March 16, 14:00 GMT Paris, France: Thursday, March 16, 15:00 CET Helsinki, Finland: Thursday, March 16, 16:00 EET Tel Aviv, Israel: Thursday, March 16, 16:00 IST Pune, India: Thursday, March 16, 19:30 IST Brisbane, Australia: Friday, March 17, 0:00 AEST Singapore, Asia: Thursday, March 16, 22:00 +08 Auckland, New Zealand: Friday, March 17, 3:00 NZDT -- Laura Flores She/Her/Hers Software Engineer, Ceph Storage <https://ceph.io> Chicago, IL lflores(a)ibm.com | lflores(a)redhat.com <lflores(a)redhat.com> M: +17087388804

1 year, 1 month

1
0
0 0

Can't install cephadm on HPC

by zyz

Hi: I encountered a problem when I install cephadm on Huawei Cloud EulerOS. When enter the following command, it raise an error. What should I do? >> ./cephadm add-repo --release quincy << ERROR: Distro hce version 2.0 not supported

1 year, 1 month

2
1
0 0

quincy: test cluster on nvme: fast write, slow read

by Arvid Picciani

Hi, Doing some lab tests to understand why ceph isnt working for us, and here's the first puzzle: setup: A completely fresh quincy cluster, 64 core EPYC 7713, 2 nvme drives > ceph osd crush rule create-replicated osd default osd ssd > ceph osd pool create rbd replicated osd --size 2 > dd if=/dev/rbd0 of=/tmp/testfile status=progress bs=4M count=1000 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 7.0152 s, 598 MB/s > dd of=/dev/rbd0 if=/tmp/testfile status=progress bs=4M count=1000 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 3.82156 s, 1.1 GB/s write performance is 1/3 of raw nvme, which i suppose is expected (not very good tho) but why is read performance so bad? top shows only one core is being utilized at 40% cpu. it can't be network either, since this is all localhost. thanks Arvid -- +4916093821054

1 year, 1 month

1
0
0 0

pg wait too long when osd restart

by yite gu

Hi all, osd_heartbeat_grace = 20 and osd_pool_default_read_lease_ratio = 0.8 by default, so, pg will wait 16s when osd restart in the worst case. This wait time is too long, client i/o can not be unacceptable. I think adjusting the osd_pool_default_read_lease_ratio to lower is a good way. Have any good suggestions about reduce pg wait time？ Best Regard Yite Gu

1 year, 1 month

1
0
0 0

restoring ceph cluster from osds

by Ben

Hi, I ended up with having whole set of osds to get back original ceph cluster. I figured out to make the cluster running. However, it's status is something as below: bash-4.4$ ceph -s cluster: id: 3f271841-6188-47c1-b3fd-90fd4f978c76 health: HEALTH_WARN 7 daemons have recently crashed 4 slow ops, oldest one blocked for 35077 sec, daemons [mon.a,mon.b] have slow ops. services: mon: 3 daemons, quorum a,b,d (age 9h) mgr: b(active, since 14h), standbys: a osd: 4 osds: 0 up, 4 in (since 9h) data: pools: 0 pools, 0 pgs objects: 0 objects, 0 B usage: 0 B used, 0 B / 0 B avail pgs: All osds are down. I checked the osds logs and attached with this. Please help and I wonder if it's possible to get the cluster back. I have some backup for monitor's data. Till now I haven't restore that in the course. Thanks, Ben

1 year, 1 month

2
5
0 0

Trying to throttle global backfill

by Rice, Christian

I have a large number of misplaced objects, and I have all osd settings to “1” already: sudo ceph tell osd.\* injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1 --osd_recovery_op_priority=1' How can I slow it down even more? The cluster is too large, it’s impacting other network traffic 😉

1 year, 1 month

2
2
0 0

radosgw - octopus - 500 Bad file descriptor on upload

by Boris Behrens

Hi, we've observed 500er errors on uploading files to a single bucket, but the problem went away after around 2 hours. We've checked and saw the following error message: 2023-03-08T17:55:58.778+0000 7f8062f15700 0 WARNING: set_req_state_err err_no=125 resorting to 500 2023-03-08T17:55:58.778+0000 7f8062f15700 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Bad file descriptor 2023-03-08T17:55:58.778+0000 7f8062f15700 1 ====== req done req=0x7f81d0189700 op status=-125 http_status=500 latency=65003730017ns ====== 2023-03-08T17:55:58.778+0000 7f8062f15700 1 beast: 0x7f81d0189700: IPADDRESS - - [2023-03-08T17:55:58.778961+0000] "PUT /BUCKET/OBJECT HTTP/1.1" 500 57 - "aws-sdk-php/3.257.11 OS/Linux/5.15.0-60-generic lang/php/8.2.3 GuzzleHttp/7" - It only happened to a single bucket over a period of 1-2 hours (around 300 requests). In the same time we've had >20k PUT requests the were working fine on other buckets. This error also seem to happen to other buckets, but only very sporadically. Did someone encounter this issue or knows what it could be? Cheers Boris

1 year, 1 month

1
0
0 0

LRC k6m3l3, rack outage and availability

by steve.bakerx1＠gmail.com

Hi, currently we are testing LRC codes and I got a cluster setup with 3 racks and 4 hosts in each of those. What I want to achieve is to have a storage efficient erasure code (<=200%) and also availability during a rack outage. In (my) theory, that should have worked with the LRC k6m3l3 having a crush-locality=rack and a crush-failure domain=host. But when I tested it, the PGs of the pool all go in the "down" state. So, when we've got k=6 data chunks and m=3 coding chunks, the data should be reconstructable with 6 of these 9 objects. With l=3, LRC splits these 9 objects in 3 groups of 3 objects and creates one additional locality-chunk per group. We now got 3 groups of 4 objects. These 3 groups get distributed over the 3 racks, the 4 objects of each group get distributed over the 4 hosts of a rack. I thought that on a full rack outage, the 6 remaining k/m chunks on the other 2 racks should still be enough to keep up the availability and the cluster could proceed in a degraded state. But it does not, so I guess my thinking is wrong :) I wonder what's the reason for this, is it maybe some min_size setting ? The default min_size of this pool becomes 7 - I also changed that to 6 (yes, one shouldn't do that in productrion I think) but got the same result. Below I've added some details about the cluster, pool creation and pg dumps. Any ideas ? Can s.o. explain why this does not work or give another solution how to achieve the described specifications? Thx! ############ Ceph version: ############ ceph --version ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable) ################# Creation of the pool: ################# ceph osd erasure-code-profile set lrc_individual_profile plugin=lrc k=6 m=3 l=3 crush-failure-domain=host crush-locality=rack crush-root=default ceph osd pool create lrc_individual_pool 1024 1024 erasure lrc_individual_profile ceph osd pool set lrc_individual_pool pg_num 1024 ceph osd pool set lrc_individual_pool pg_num_min 1024 ceph osd pool set lrc_individual_pool pgp_num 1024 ceph osd pool set lrc_individual_pool pg_autoscale_mode warn ceph osd pool set lrc_individual_pool bulk true ################## Resulting pool details: ################## ceph osd pool ls detail pool 72 'lrc_individual_pool' erasure profile lrc_individual_profile size 12 min_size 7 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 140484 flags hashpspool,bulk stripe_width 24576 pg_num_min 1024 ceph osd pool get lrc_individual_pool all size: 12 min_size: 7 pg_num: 1024 pgp_num: 1024 crush_rule: lrc_individual_pool hashpspool: true allow_ec_overwrites: false nodelete: false nopgchange: false nosizechange: false write_fadvise_dontneed: false noscrub: false nodeep-scrub: false use_gmt_hitset: 1 erasure_code_profile: lrc_individual_profile fast_read: 0 pg_autoscale_mode: warn pg_num_min: 1024 bulk: true ################# Resulting crush rule: ################# ceph osd crush rule dump lrc_individual_pool { "rule_id": 1, "rule_name": "lrc_individual_pool", "ruleset": 1, "type": 3, "min_size": 3, "max_size": 12, "steps": [ { "op": "set_chooseleaf_tries", "num": 5 }, { "op": "set_choose_tries", "num": 100 }, { "op": "take", "item": -1, "item_name": "default" }, { "op": "choose_indep", "num": 3, "type": "rack" }, { "op": "chooseleaf_indep", "num": 4, "type": "host" }, { "op": "emit" } ] } ############################ Ceph status after the rack outage: ############################ cluster: id: ... health: HEALTH_WARN 96 osds down 4 hosts (96 osds) down 1 rack (96 osds) down Reduced data availability: 1024 pgs inactive, 1024 pgs down services: mon: 3 daemons, quorum ...,...,... (age 4d) mgr: ...(active, since 4d), standbys: ...,... osd: 288 osds: 192 up (since 116s), 288 in (since 21h) data: pools: 2 pools, 1025 pgs objects: 291 objects, 0 B usage: 199 GiB used, 524 TiB / 524 TiB avail pgs: 99.902% pgs not active 1024 down 1 active+clean ################# Section of pg dump: ################# 72.32 0 0 0 0 0 0 0 0 0 0 down 2023-03-08T09:04:02.992141+0100 0'0 140549:52 [NONE,NONE,NONE,NONE,246,116,170,275,112,41,238,40] 246 [NONE,NONE,NONE,NONE,246,116,170,275,112,41,238,40] 246 0'0 2023-03-08T08:48:49.712787+0100 0'0 2023-03-08T08:48:49.712787+0100 0 72.33 0 0 0 0 0 0 0 0 0 0 down 2023-03-08T09:04:02.988083+0100 0'0 140549:134 [36,263,162,73,NONE,NONE,NONE,NONE,99,155,74,282] 36 [36,263,162,73,NONE,NONE,NONE,NONE,99,155,74,282] 36 0'0 2023-03-08T08:48:49.712787+0100 0'0 2023-03-08T08:48:49.712787+0100 0 72.31 0 0 0 0 0 0 0 0 0 0 down 2023-03-08T09:04:06.225241+0100 0'0 140549:136 [116,259,194,170,84,52,98,198,NONE,NONE,NONE,NONE] 116 [116,259,194,170,84,52,98,198,NONE,NONE,NONE,NONE] 116 0'0 2023-03-08T08:48:49.712787+0100 0'0 2023-03-08T08:48:49.712787+0100 0 ############## Single pg query: ############## ceph pg 72.33 query { "snap_trimq": "[]", "snap_trimq_len": 0, "state": "down", "epoch": 140550, "up": [ 36, 263, 162, 73, 2147483647, 2147483647, 2147483647, 2147483647, 99, 155, 74, 282 ], "acting": [ 36, 263, 162, 73, 2147483647, 2147483647, 2147483647, 2147483647, 99, 155, 74, 282 ], "info": { "pgid": "72.33s0", "last_update": "0'0", "last_complete": "0'0", "log_tail": "0'0", "last_user_version": 0, "last_backfill": "MAX", "purged_snaps": [], "history": { "epoch_created": 140477, "epoch_pool_created": 140477, "last_epoch_started": 140516, "last_interval_started": 140515, "last_epoch_clean": 140516, "last_interval_clean": 140515, "last_epoch_split": 0, "last_epoch_marked_full": 0, "same_up_since": 140538, "same_interval_since": 140538, "same_primary_since": 140477, "last_scrub": "0'0", "last_scrub_stamp": "2023-03-08T08:48:49.712787+0100", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2023-03-08T08:48:49.712787+0100", "last_clean_scrub_stamp": "2023-03-08T08:48:49.712787+0100", "prior_readable_until_ub": 0 }, "stats": { "version": "0'0", "reported_seq": 135, "reported_epoch": 140550, "state": "down", "last_fresh": "2023-03-08T09:07:15.104685+0100", "last_change": "2023-03-08T09:04:02.988083+0100", "last_active": "2023-03-08T09:04:02.337985+0100", "last_peered": "2023-03-08T09:04:01.288586+0100", "last_clean": "2023-03-08T09:04:01.288586+0100", "last_became_active": "2023-03-08T08:57:08.464085+0100", "last_became_peered": "2023-03-08T08:57:08.464085+0100", "last_unstale": "2023-03-08T09:07:15.104685+0100", "last_undegraded": "2023-03-08T09:07:15.104685+0100", "last_fullsized": "2023-03-08T09:07:15.104685+0100", "mapping_epoch": 140538, "log_start": "0'0", "ondisk_log_start": "0'0", "created": 140477, "last_epoch_clean": 140516, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "0'0", "last_scrub_stamp": "2023-03-08T08:48:49.712787+0100", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2023-03-08T08:48:49.712787+0100", "last_clean_scrub_stamp": "2023-03-08T08:48:49.712787+0100", "log_size": 0, "ondisk_log_size": 0, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 0, "num_objects": 0, "num_object_clones": 0, "num_object_copies": 0, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 0, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 0, "num_whiteouts": 0, "num_read": 0, "num_read_kb": 0, "num_write": 0, "num_write_kb": 0, "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0, "num_objects_recovered": 0, "num_bytes_recovered": 0, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 0 }, "up": [ 36, 263, 162, 73, 2147483647, 2147483647, 2147483647, 2147483647, 99, 155, 74, 282 ], "acting": [ 36, 263, 162, 73, 2147483647, 2147483647, 2147483647, 2147483647, 99, 155, 74, 282 ], "avail_no_missing": [], "object_location_counts": [], "blocked_by": [ 173, 219, 236, 253 ], "up_primary": 36, "acting_primary": 36, "purged_snaps": [] }, "empty": 1, "dne": 0, "incomplete": 0, "last_epoch_started": 140516, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, "peer_info": [], "recovery_state": [ { "name": "Started/Primary/Peering/Down", "enter_time": "2023-03-08T09:04:02.988076+0100", "comment": "not enough up instances of this PG to go active" }, { "name": "Started/Primary/Peering", "enter_time": "2023-03-08T09:04:02.987998+0100", "past_intervals": [ { "first": "140515", "last": "140537", "all_participants": [ { "osd": 36, "shard": 0 }, { "osd": 73, "shard": 3 }, { "osd": 74, "shard": 10 }, { "osd": 99, "shard": 8 }, { "osd": 155, "shard": 9 }, { "osd": 162, "shard": 2 }, { "osd": 173, "shard": 7 }, { "osd": 219, "shard": 6 }, { "osd": 236, "shard": 5 }, { "osd": 253, "shard": 4 }, { "osd": 263, "shard": 1 }, { "osd": 282, "shard": 11 } ], "intervals": [ { "first": "140515", "last": "140536", "acting": "36(0),73(3),74(10),99(8),155(9),162(2),173(7),219(6),236(5),253(4),263(1),282(11)" } ] } ], "probing_osds": [ "36(0)", "73(3)", "74(10)", "99(8)", "155(9)", "162(2)", "263(1)", "282(11)" ], "blocked": "peering is blocked due to down osds", "down_osds_we_would_probe": [ 173, 219, 236, 253 ], "peering_blocked_by": [ { "osd": 173, "current_lost_at": 0, "comment": "starting or marking this osd lost may let us proceed" }, { "osd": 219, "current_lost_at": 0, "comment": "starting or marking this osd lost may let us proceed" }, { "osd": 236, "current_lost_at": 0, "comment": "starting or marking this osd lost may let us proceed" }, { "osd": 253, "current_lost_at": 0, "comment": "starting or marking this osd lost may let us proceed" } ] }, { "name": "Started", "enter_time": "2023-03-08T09:04:02.987942+0100" } ], "agent_state": {} }

1 year, 1 month

2
1
0 0

Error deploying Ceph Qunicy using ceph-ansible 7 on Rocky 9

by wodel youchi

Hi, I am trying to deploy Ceph Quincy using ceph-ansible on Rocky9. I am having some problems and I don't know where to search for the reason. PS : I did the same deployment on Rocky8 using ceph-ansible for the Pacific version on the same hardware and it worked perfectly. I have 03 controllers nodes : mon, mgr, mdss and rgws and 27 osd nodes : with 04 nvme disks (osd) each I am using a 10Gb network with jumbo frames. The deployment starts with no issues, the 03 monitors are created correctly, then the 03 managers are created, after that the OSD are prepared and formatted, until here everything is working fine, but when the "*wait for all osd to be up*" task is launched, which means starting all OSDs containers in all OSD nodes, things go south, the monitors become out of quorum, ceph -s takes a lot of time to respond and not all OSDs are being activated, and the deployment fails at the end. cluster 2023-03-06T12:00:26.431947+0100 mon.controllera (mon.0) 3864 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum controllera,controllerc cluster 2023-03-06T12:00:26.431953+0100 mon.controllera (mon.0) 3865 : cluster [WRN] mon.controllerb (rank 1) addr [v2: 20.1.0.27:3300/0,v1:20.1.0.27:6789/0] is down (out of quorum) The monitor container in 2 of my controllers nodes stays at 100% of cpu utilization. CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS 068e4e55f299 ceph-mon-controllera 99.91% 58.12MiB / 376.1GiB 0.02% 0B / 0B 122MB / 85.3MB 28 <----------------- 87730f89420d ceph-mgr-controllera 0.32% 408.2MiB / 376.1GiB 0.11% 0B / 0B 181MB / 0B 35 Could that be a resource problem? the monitor containers do not have enough resources CPU, RAM, ...etc to handle all the OSDs that are being started? If yes, how may I find this? thanks in advance. Regards.

1 year, 1 month

2
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users March 2023