Hi all
I want to study the effect of BlueStore RocksDB compression in Ceph and whether it is worth tuning. Currently, BlueStore RocksDB compression is disabled by default in Ceph.
I simply switched the RocksDB compression algorithm and then ran a 4KB rand read fio test, but the results show no difference.
So I would like to ask: is there any scenario where RocksDB compression matters in Ceph, and if so, what type of test would best demonstrate the impact of RocksDB compression on the upper layers of Ceph?
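For reference, switching the algorithm can be done along these lines (a sketch only; the compression name is just an example, the rest of the existing bluestore_rocksdb_options string has to be carried over unchanged, and the OSDs need a restart afterwards):
// replace compression=kNoCompression with e.g. kLZ4Compression, keeping all other options as they are
ceph config set osd bluestore_rocksdb_options "compression=kLZ4Compression,<rest of existing options string>"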
// create 10GB image
ceph osd pool create rbd 128
rbd pool init rbd
rbd create --size 10240 --image rbd/image --image-format 2 --thick-provision
//fio
[global]
ioengine=rbd
iodepth=128
rw=randwrite
bs=4KB
time_based=1
ramp_time=60s
runtime=300s
clientname=admin
pool=rbd
group_reporting
buffer_compress_percentage=80
refill_buffers
buffer_pattern=0xdeadbeef
[volumes]
rbdname=image
numjobs=1
Thanks
-Hualong
Hi,
today I saw a strange situation: files that were copied to a CephFS via Ganesha NFS (deployed via cephadm) disappeared from the NFS directory and did not show up again until I restarted the Ganesha instance. This could be observed from different NFS client hosts. While the files were not showing up over NFS, they could still be seen via cephfs-shell, so I assume the issue is somewhere on the NFS side. Interestingly, the folder modification time (shown by ls) matched the time the file was copied into that folder.
Any idea what the issue could be here? Unfortunately there were no interesting logs, and I haven't found a way to reproduce this yet.
The ceph version in use is 17.2.5.
Thanks,
Patrick
Hi everyone, I've got a quick question regarding one of our RadosGW buckets.
This bucket is used to store Docker registries; the total amount of
data we store is supposed to be about 4.5 TB, BUT Ceph tells us we are
actually using ~53 TB.
One interesting thing is that this bucket seems to have been resharded for an
unknown reason, as resharding is supposed to be disabled by default. Even
taking that into account, we shouldn't see such a massive amount of
additional data, should we?
Here is the bucket stats of it:
https://paste.opendev.org/show/bdWFRvNFtxyHnbPfXWu9/
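For what it's worth, here are additional commands I can run to gather more data if that helps (the bucket name is just a placeholder):
// data already deleted but still waiting for RGW garbage collection
radosgw-admin gc list --include-all
// consistency check of the bucket index
radosgw-admin bucket check --bucket=<bucket-name>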
Has anybody run into a 'stuck' OSD service specification? I've tried
to delete it, but it's stuck in 'deleting' state, and has been for
quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3:
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
osd.osd_spec 504/525 <deleting> 12m label:osd
root@ceph01:/# ceph orch rm osd.osd_spec
Removed service osd.osd_spec
From active monitor:
debug 2021-05-06T23:14:48.909+0000 7f17d310b700 0
log_channel(cephadm) log [INF] : Remove service osd.osd_spec
Yet in ls, it's still there, same as above. --export on it:
root@ceph01:/# ceph orch ls osd.osd_spec --export
service_type: osd
service_id: osd_spec
service_name: osd.osd_spec
placement: {}
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
We've tried --force, as well, with no luck.
To be clear, the --export even prior to delete looks nothing like the
actual service specification we're using, even after I re-apply it, so
something seems 'bugged'. Here's the OSD specification we're applying:
service_type: osd
service_id: osd_spec
placement:
  label: "osd"
data_devices:
  rotational: 1
db_devices:
  rotational: 0
db_slots: 12
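For completeness, we re-apply the spec with the usual cephadm workflow (the file name below is just whatever we saved it as):
ceph orch apply -i osd_spec.yaml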
I would appreciate any insight into how to clear this up (without
removing the actual OSDs; we just want to apply the updated
service specification - we used to use host placement rules and are
switching to label-based).
Thanks,
David
Hi everyone,
Today is the last day to sponsor Cephalocon Amsterdam 2023! I want to
thank our current sponsors:
Platinum: IBM
Silver: 42on, Canonical Ubuntu, Clyso
Startup: Koor
Also, thank you to Clyso for their lanyard add-on and to 42on for the
offsite attendee party.
We are still short of covering the costs for the event, so I'm asking
contributors and members of the Ceph Foundation to consider
applying today.
https://events.linuxfoundation.org/cephalocon/sponsor/
Sponsor Prospectus:
https://events.linuxfoundation.org/wp-content/uploads/2023/03/sponsor-ceph-…
Please get in touch with us at sponsorships(a)ceph.foundation to get
started. Thank you!
--
Mike Perez
Hi all,
osd_heartbeat_grace = 20 and osd_pool_default_read_lease_ratio = 0.8 by
default, so a PG will wait up to 16s (20 * 0.8) in the worst case when an OSD
restarts. This wait time is too long; client I/O stalling for that long is not
acceptable. I think lowering osd_pool_default_read_lease_ratio is a good way
to reduce it. Does anyone have good suggestions for reducing the PG wait time?
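For example, this is the kind of change I have in mind (0.4 is only an illustrative value; the read lease then becomes osd_heartbeat_grace * ratio = 20 * 0.4 = 8s):
ceph config set global osd_pool_default_read_lease_ratio 0.4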
Best Regard
Yite Gu
Hi,
I have a problem on one of ceph clusters I do not understand.
ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25Gb/s NICs
A 3TB rbd image is on an erasure-coded 8+3 pool with 128 PGs, xfs filesystem,
4MB objects in the rbd image, mostly empty.
I have created a bunch of 10G files; most of them were written at
1.5GB/s, but a few of them were really slow, ~10MB/s, a factor of 100.
When reading these files back, the fast-written ones are read fast,
~2-2.5GB/s, while the slowly written ones are also extremely slow to read;
iotop shows between 1 and 30 MB/s reading speed.
This does not happen at all on replicated images. There are some OSDs
with higher apply/commit latency, eg 200ms, but there are no slow ops.
The tests were actually done on a Proxmox VM with librbd, but the same
happens with krbd, and on bare metal with a mounted krbd as well.
I have tried to check all OSDs for laggy drives, but they all look about
the same.
I have also copied the entire image with "rados get ...", object by object;
the strange thing here is that most objects were copied within
0.1-0.2s, but quite a few took more than 1s.
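(In case someone wants to reproduce the per-object check, a rough sketch of how it can be done; pool and image names are placeholders, and with an EC data pool the rbd_data objects live in the data pool:)
# find the image's data-object prefix, then time a "rados get" of every object
prefix=$(rbd info rbd/testimage | awk '/block_name_prefix/ {print $2}')
rados -p ec_data_pool ls | grep "$prefix" | while read obj; do
  /usr/bin/time -f "$obj: %e s" rados -p ec_data_pool get "$obj" /dev/null
done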
The cluster is quite busy with base traffic of ~1-2GB/s, so the speeds
can vary due to that. But I would not expect a factor of 100 slowdown
for some writes/reads with rbds.
Any clues on what might be wrong or what else to check? I have another
similar ceph cluster where everything looks fine.
Best,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-477-3166
-------------------------------------------------------------
Hi all,
I'm using RGW multisite with Ceph 17.2.5, deployed with Rook.
I found a number of bucket.sync-status mdlog objects carrying the names of buckets that were deleted during maintenance.
Test environment:
bash-4.4$ rados -p master.rgw.log ls | grep bucket.sync-status | grep test1
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:5
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:4
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:1
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:7
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:2
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:10
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:6
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:3
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:0
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:9
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:8
bash-4.4$ radosgw-admin bucket list
[
"rook-ceph-bucket-checker-94e835a7-6356-46bf-a90c-591b23b15959",
"230314",
"cyyoon",
"test23031303"
]
How can I delete the bucket.sync-status mdlog objects of the deleted buckets in this situation? Should I proceed with an mdlog trim?
As these entries accumulated in the prod environment, a large-omap issue occurred in the log pool.
root@osd-001:~# radosgw-admin log list| grep [DELETED BUCKET NAME] | wc -l
25636
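If deleting these objects directly turns out to be the right approach, what I am considering is something along these lines (only a sketch; pool and object names are from the test environment above):
# list the stale sync-status objects of the deleted bucket, then remove them
rados -p master.rgw.log ls | grep '^bucket.sync-status.*:test1:' > stale_objs
while read obj; do rados -p master.rgw.log rm "$obj"; done < stale_objs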
Any other ideas as to what might be causing this, or anything else we could try to help diagnose or fix this? Thanks in advance!