Hi all!
The cluster was installed before device classes were a thing, so in
preparation for installing some SSDs into a Ceph cluster with OSDs on 7 machines
I migrated all replicated pools to CRUSH rules with a device class set. That
caused lots of misplaced objects (probably because of changed IDs in the CRUSH
tree), but the cluster was HEALTH_OK the whole time.
I'm now testing the migration of a 4+2 erasure-coded pool to a rule that also
includes the device class. Here are the old and new CRUSH rules:
# Old
rule default.rgw.buckets.data {
    id 3
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type host
    step emit
}
# New
rule default.rgw.buckets.data_hdd {
    id 4
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class hdd
    step chooseleaf indep 0 type host
    step emit
}
After changing the rule with
ceph osd pool set default.rgw.buckets.data crush_rule default.rgw.buckets.data_hdd
I'm getting quite a few degraded placement groups, and I'm not sure whether I
still have a redundant setup during the migration:
[root@master1 tmp]# ceph version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
[root@master1 tmp]# ceph -s
  cluster:
    id:     3b46f93c-788a-11e9-bc8c-bcaec503b525
    health: HEALTH_WARN
            Degraded data redundancy: 4903/15474 objects degraded (31.685%), 12 pgs degraded

  services:
    mon: 5 daemons, quorum master1.dev,master2.dev,master3.dev,master4.dev,master5.dev (age 10m)
    mgr: master3.dev(active, since 8m), standbys: master5.dev
    mds: 1/1 daemons up, 1 standby
    osd: 14 osds: 14 up (since 8m), 14 in (since 9m)
    rgw: 4 daemons active (4 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   21 pools, 586 pgs
    objects: 2.70k objects, 365 MiB
    usage:   281 GiB used, 279 GiB / 560 GiB avail
    pgs:     4903/15474 objects degraded (31.685%)
             7054/15474 objects misplaced (45.586%)
             554 active+clean
             20  active+recovering
             12  active+recovery_wait+degraded

  io:
    recovery: 545 KiB/s, 18 objects/s
[root@master1 tmp]# ceph health detail
HEALTH_WARN Degraded data redundancy: 4903/15474 objects degraded (31.685%), 12 pgs degraded
[WRN] PG_DEGRADED: Degraded data redundancy: 4903/15474 objects degraded (31.685%), 12 pgs degraded
pg 19.0 is active+recovery_wait+degraded, acting [9,2,3,13,11,5]
pg 19.2 is active+recovery_wait+degraded, acting [2,6,0,11,7,12]
pg 19.5 is active+recovery_wait+degraded, acting [5,1,0,11,4,12]
pg 19.6 is active+recovery_wait+degraded, acting [1,6,3,8,7,11]
pg 19.7 is active+recovery_wait+degraded, acting [13,6,10,3,7,8]
pg 19.b is active+recovery_wait+degraded, acting [4,1,13,6,11,9]
pg 19.c is active+recovery_wait+degraded, acting [8,6,11,7,13,2]
pg 19.e is active+recovery_wait+degraded, acting [6,8,0,11,12,4]
pg 19.f is active+recovery_wait+degraded, acting [5,10,13,1,0,4]
pg 19.10 is active+recovery_wait+degraded, acting [3,11,5,9,7,1]
pg 19.1a is active+recovery_wait+degraded, acting [7,0,8,1,11,5]
pg 19.1f is active+recovery_wait+degraded, acting [7,5,8,3,1,11]
[root@master1 tmp]# ceph osd pool list detail | grep default.rgw.buckets.data
pool 19 'default.rgw.buckets.data' erasure profile default.rgw.buckets.data size 6 min_size 5 crush_rule 4 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 174 flags hashpspool stripe_width 16384 pg_num_min 32 target_size_ratio 69.8 application rgw
The output of ceph health detail looks to me as if all six chunks of the
erasure-coded data are still there, as expected.
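One way I can think of to double-check (a rough sketch; it assumes jq is
installed and that the pg query output still has top-level "state", "up" and
"acting" fields in this release):

  # PGs of the pool that are degraded vs. only remapped/misplaced
  ceph pg ls-by-pool default.rgw.buckets.data degraded
  ceph pg ls-by-pool default.rgw.buckets.data remapped

  # for a single PG, compare the up and acting sets; a shard shown as
  # 2147483647 (NONE) in the acting set would mean a chunk is really missing
  ceph pg 19.0 query | jq '{state, up, acting}'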
But why are these placement groups shown as degraded? I'd expect only
misplaced object chunks in this scenario, just as with the replicated pools.
Is there actually any reduction in redundancy or not?
The cluster recovers after some time. I've also tested with more (256)
placement groups but the outcome is the same.
Thanks,
LF.
--
Lars Fenneberg, lf(a)elemental.net
There has been a lot of movement in my cluster: a broken node, a replacement,
rebalancing. Now I'm stuck in the upgrade to 18.2.0 (mgr and mon already
upgraded) and the cluster is in a "Global Recovery Event".
The health is OK.
I don't know how to search for the problem.
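Is something like this the right way to dig further? I'm not sure these are
even the right commands:

  # the mgr progress module is what reports the "Global Recovery Event"
  ceph progress
  # detailed health and any PGs that are not active+clean
  ceph health detail
  ceph pg ls | grep -v 'active+clean'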
Hello Everyone,
Recently, an issue related to an inconsistency in the output of
the "ceph config dump" command was reported. The inconsistency
is between the normal (non-pretty-print) and pretty-print
outputs. The non-pretty-print output displays the localized
option name, whereas the pretty-print output displays the
normalized option name. For example:
Normalized: mgr/dashboard/ssl_server_port
Localized: mgr/dashboard/x/ssl_server_port
The fix ensures that the localized option name is shown in all
cases. The issue is tracked in https://tracker.ceph.com/issues/62379
and the fix is not yet merged.
This is to give you a heads-up in case you have any kind of automation
that relies on the pretty-printed output (json, xml). The fix will soon
be made available in the upstream and downstream branches.
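If you want to check whether your automation is affected, a quick comparison
along these lines should show the difference (assuming jq is available and
that the JSON output is a flat array of entries with a "name" field):

  # plain output: currently shows the localized name, e.g. mgr/dashboard/x/ssl_server_port
  ceph config dump | grep ssl_server_port
  # pretty-printed output: currently shows the normalized name, e.g. mgr/dashboard/ssl_server_port
  ceph config dump -f json | jq -r '.[].name' | grep ssl_server_port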
If you have any concerns around this change, please let us know.
Thanks,
-Sridhar
I'd like to try reef, but we are on debian 11 (bullseye).
In the ceph repos, there is debian-quincy/bullseye and
debian-quincy/focal, but under reef there is only focal & jammy.
Is there a reason why there is no reef/bullseye build? I had thought
that the blocker only affected debian-bookworm builds.
Thanks, Chris
Hi, guys:
I'm using Ceph 14 on HDDs and have observed noticeably high latency for
pg.lock(). Further inspection suggests the root cause is the
function pgbackend->objects_read_sync() called in
PrimaryLogPG::do_read(), which holds the PG lock until the disk
read finishes.
My question is: why not use aio for reads, like we do for
writes in BlueStore? Is there any known problem with aio reads in the OSD?
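For reference, the latency shows up in the per-OSD counters and historic ops,
roughly like this (jq assumed; counter and field names may differ slightly on
Nautilus):

  # read and overall op latency counters for one OSD
  ceph daemon osd.0 perf dump | jq '.osd | {op_r_latency, op_latency}'
  # the slowest recent ops and how long each took
  ceph daemon osd.0 dump_historic_ops | jq '.ops[] | {description, duration}'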
Thanks in advance,
Xinying Song
Dear Ceph users,
I see that the OSD page of the Ceph dashboard offers three possibilities
for "removing" an OSD: delete, destroy and purge. The delete operation
additionally offers the "Preserve OSD ID(s) for replacement." option.
I searched for explanations of the differences between the three
commands but didn't find anything definitive, so I'd need some help
with this.
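My current (possibly wrong) understanding is that they roughly map to the
following CLI commands (OSD id 5 is just an example), but I'd like to confirm:

  # "destroy": keep the OSD id and its CRUSH entry (marked destroyed) so a
  # replacement disk can reuse the same id; the cephx key is removed
  ceph osd destroy 5 --yes-i-really-mean-it

  # "purge": remove the OSD completely - CRUSH entry, cephx key and the id itself
  ceph osd purge 5 --yes-i-really-mean-it

  # "delete" seems to go through the orchestrator; with "Preserve OSD ID(s)
  # for replacement." checked it presumably corresponds to --replace
  ceph orch osd rm 5 --replace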
Thanks in advance,
Nicola
We're very happy to announce the first stable release of the Reef series.
We express our gratitude to all members of the Ceph community who
contributed by proposing pull requests, testing this release,
providing feedback, and offering valuable suggestions.
Major Changes from Quincy:
- RADOS: RocksDB has been upgraded to version 7.9.2.
- RADOS: There have been significant improvements to RocksDB iteration
overhead and performance.
- RADOS: The perf dump and perf schema commands have been deprecated
in favor of the new counter dump and counter schema commands (see the
short example after this list).
- RADOS: Cache tiering is now deprecated.
- RADOS: A new feature, the "read balancer", is now available, which
allows users to balance primary PGs per pool on their clusters.
- RGW: Bucket resharding is now supported for multi-site configurations.
- RGW: There have been significant improvements to the stability and
consistency of multi-site replication.
- RGW: Compression is now supported for objects uploaded with
Server-Side Encryption.
- Dashboard: There is a new Dashboard page with improved layout.
Active alerts and some important charts are now displayed inside
cards.
- RBD: Support for layered client-side encryption has been added.
- Telemetry: Users can now opt in to participate in a leaderboard in
the telemetry public dashboards.
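For anyone scripting against the deprecated perf commands, the replacements are
exposed through the daemon admin socket interface, along these lines (osd.0 is
just an example daemon, and the output format may differ):

  # deprecated
  ceph daemon osd.0 perf dump
  ceph daemon osd.0 perf schema
  # new labeled-counter equivalents
  ceph daemon osd.0 counter dump
  ceph daemon osd.0 counter schema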
We encourage you to read the full release notes at
https://ceph.io/en/news/blog/2023/v18-2-0-reef-released/
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-18.2.0.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 5dd24139a1eada541a3bc16b6941c5dde975e26d
Did you know? Every Ceph release is built and tested on resources
funded directly by the non-profit Ceph Foundation.
If you would like to support this and our other efforts, please
consider joining now https://ceph.io/en/foundation/.
Hello all,
"radosgw-admin sync error list" returns errors from 2022. I want to
clear those out.
I tried "radosgw-admin sync error trim" but it seems to do nothing.
The man page seems to offer no suggestions
https://docs.ceph.com/en/quincy/man/8/radosgw-admin/
Any ideas what I need to do to remove old errors? (or at least I want
to see more recent errors)
ceph version 17.2.6 (quincy)
Thanks.
Hi - I have a 4-node cluster and started to have some odd access issues with my file system "Home".
When I started investigating, I saw the message "1 MDSs behind on trimming", but I also noticed that I seem to have 2 MDSs running on each server - 3 daemons up, with 5 standby. Is this expected behavior after the upgrade to 18.2, or did something go wrong?
[root@cube ~]# ceph status
  cluster:
    id:     fe3a7cb0-69ca-11eb-8d45-c86000d08867
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum rhel1,cube,hiho (age 23m)
    mgr: hiho.bphqff(active, since 23m), standbys: rhel1.owrvaz, cube.sdhftu
    mds: 3/3 daemons up, 5 standby
    osd: 16 osds: 16 up (since 23m), 16 in (since 26h)
    rgw: 4 daemons active (4 hosts, 1 zones)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 769 pgs
    objects: 3.64M objects, 3.1 TiB
    usage:   17 TiB used, 49 TiB / 65 TiB avail
    pgs:     765 active+clean
             4   active+clean+scrubbing+deep

  io:
    client: 154 MiB/s rd, 38 op/s rd, 0 op/s wr
[root@cube ~]# ceph health detail
HEALTH_WARN 1 filesystem is degraded; 1 MDSs behind on trimming
[WRN] FS_DEGRADED: 1 filesystem is degraded
fs home is degraded
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.home.story.sodtjs(mds.0): Behind on trimming (5546/128) max_segments: 128, num_segments: 5546
[root@cube ~]# ceph fs status home
home - 10 clients
====
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 replay home.story.sodtjs 802k 766k 36.7k 0
1 resolve home.cube.xljmfz 735k 680k 39.0k 0
2 resolve home.rhel1.nwpmbg 322k 316k 17.5k 0
POOL TYPE USED AVAIL
home.meta metadata 361G 14.9T
home.data data 9206G 14.9T
STANDBY MDS
home.rhel1.ffrufi
home.hiho.mssdyh
home.cube.kmpbku
home.hiho.cfuswn
home.story.gmieio
MDS version: ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
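For reference, these are the things I was planning to look at next - are they
the right places to check? (assuming a cephadm deployment)

  # how many active MDS ranks the filesystem is supposed to have
  ceph fs get home | grep max_mds
  # which mds daemons the orchestrator has actually deployed, and where
  ceph orch ps --daemon_type mds
  # the journal trimming limit the warning is compared against
  ceph config get mds mds_log_max_segments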