Hi all,
I need some help troubleshooting a strange issue. I have two relatively newly set up Ceph clusters (17.2.6) configured for replication. Objects with plain names (example.txt, for example) sync fine, but anything with a slash / folder-style prefix in the name (folder1/folder2/example.txt, for example) won't sync over. I'm not sure why this would be the case, as I'm fairly sure slashes are allowed in object names: https://docs.ceph.com/en/latest/radosgw/layout/. Any ideas, or something obvious I'm missing? Sync status looks normal, and I have tested this with a variety of new and old buckets; the behavior always stays the same: nothing with a slash syncs, but everything without one does.
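For reference, this is roughly how I've been testing it; the bucket name, endpoints and the aws CLI itself are just placeholders for whichever S3 client you prefer:

aws --endpoint-url http://rgw-zone-a.example:8080 s3 cp example.txt s3://testbucket/example.txt
aws --endpoint-url http://rgw-zone-a.example:8080 s3 cp example.txt s3://testbucket/folder1/folder2/example.txt
# then, on the secondary zone:
aws --endpoint-url http://rgw-zone-b.example:8080 s3 ls s3://testbucket --recursive
# only example.txt ever shows up; the folder1/folder2/example.txt key never does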
Thanks in advance,
-Matt Dunavant
Hi all,
I hope this message finds you well. We recently encountered an issue on one
of our OSD servers, leading to network flapping and subsequently causing
significant performance degradation across our entire cluster. Although the
OSDs were correctly marked as down in the monitor, slow ops persisted until
we resolved the network issue. This incident resulted in a major
disruption, especially affecting VMs with mapped RBD images, causing
them to freeze.
In light of this, I have two key questions for the community:
1. Why did slow ops persist even after marking the affected server as down
in the monitor?
2. Are there any recommended configurations for OSD suicide timeouts or OSD down
reports that could help us better handle similar network-related issues in
the future? (A rough sketch of the kind of options I mean is below.)
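For context, these are the kinds of options I have been looking at; the values below are only illustrative placeholders (roughly the defaults), not settings we actually run:

ceph config set osd osd_heartbeat_grace 20               # seconds peers wait before reporting an OSD dead
ceph config set mon mon_osd_min_down_reporters 2         # distinct reporters needed before the mon marks an OSD down
ceph config set mon mon_osd_reporter_subtree_level host  # spread the required reporters across this failure domain
ceph config set osd osd_op_thread_suicide_timeout 150    # seconds before a stuck op thread makes the OSD abort itself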
Best Regards,
Mahnoosh
Hi,
I am bootstrapping a ceph cluster using cephadm, and our cluster uses 3 networks.
We have
- 1 network as public network (10.X.X.0/24) (pub)
- 1 network as cluster network (10.X.Y.0/24) (cluster)
- 1 network for management (172.Z.Z.0/24) (mgmt)
The nodes are reachable over SSH only on the mgmt network. However, they are reachable for our services on the pub network, and I want my MONs to bind to this pub network.
But when I bootstrap my cluster, I set my MON IP and cluster network, and the bootstrap process then tries to add the bootstrap node using the MON IP, and fails because it cannot reach the node. If I apply the proper spec afterwards it works fine, but the bootstrap process itself did not finish properly.
Is there an option to tell cephadm not to use the MON IP but another address to access the node during bootstrap? Even with --skip-prepare-host it still tries to connect to it, and then fails.
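For reference, the bootstrap invocation looks roughly like this (the address is an anonymized placeholder on the pub network):

cephadm bootstrap --mon-ip 10.X.X.10 --cluster-network 10.X.Y.0/24
# 10.X.X.10 is the node's address on the pub network; cephadm then tries to add
# the bootstrap host over SSH using this MON IP, but SSH only works on 172.Z.Z.0/24,
# which is where it fails.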
Thanks,
Luis Domingues
Proton AG
Hi all,
We are testing migrations from a cluster running Pacific to Reef. In Pacific we needed to tweak osd_mclock_max_capacity_iops_hdd to get decent performance out of our cluster.
But in Reef it looks like changing the value of osd_mclock_max_capacity_iops_hdd does not impact cluster performance. Has osd_mclock_max_capacity_iops_hdd become useless?
I did not find anything regarding it in the changelogs, but I could have missed something.
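For reference, this is roughly how we set and check it (the value is only an illustrative placeholder, not what we actually use):

ceph config set osd osd_mclock_max_capacity_iops_hdd 450
ceph config show osd.0 osd_mclock_max_capacity_iops_hdd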
Luis Domingues
Proton AG
Hello,
I have a problem with my Ceph cluster (3x MON nodes, 6x OSD nodes; every
OSD node has 12 rotational disks and one NVMe device for the
BlueStore DB). Ceph is installed by the ceph orchestrator and
uses BlueFS storage on the OSDs.
I've started the upgrade process from version 17.2.6 to 18.2.1 by
invoking:
ceph orch upgrade start --ceph-version 18.2.1
After the upgrade of the MON and MGR processes, the orchestrator tried to
upgrade the first OSD node, but its OSDs keep crashing.
I've stopped the upgrade process, but now I have one OSD node
completely down.
After the upgrade I got some error messages and found
/var/lib/ceph/crashxxxx directories; I'm attaching to this message
the files I found there.
Please, can you advise what I can do now? It seems that RocksDB
is either incompatible or corrupted :-(
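(For completeness, I assume the same crash reports can also be listed from the cluster itself with the generic crash commands, e.g.:

ceph crash ls
ceph crash info <crash-id>

where <crash-id> is one of the IDs from the listing.)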
Thanks in advance.
Sincerely
Jan Marek
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
Hello, Ceph users!
I have recently noticed that when I reboot a single ceph node,
ceph -s reports "5 hosts down" instead of one. The following
is captured during reboot of a node with two OSDs:
health: HEALTH_WARN
noout flag(s) set
2 osds down
5 hosts (2 osds) down
[...]
mon: 3 daemons, quorum mon1,mon3,mon2 (age 8h)
mgr: mon2(active, since 2d), standbys: mon3, mon1
osd: 34 osds: 32 up (since 2m), 34 in (since 4M)
flags noout
rgw: 1 daemon active (1 hosts, 1 zones)
After the node successfully reboots, ceph -s reports HEALTH_OK
and of course no OSDs and no hosts are reported as being down.
Does anybody else see this as well? This is Ceph 18.2.1, but I think
I have seen this on Ceph 17 as well.
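For reference, a quick generic way to see which hosts and OSDs Ceph is counting as down while the node reboots (nothing site-specific here):

ceph osd tree down      # shows only the down OSDs and the host buckets they sit under
ceph health detail      # expands the "N hosts down" warning into the individual hosts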
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
Hello,
I have an issue with a Ceph 17.2.6 cluster. The dashboard says "The Object
Gateway Service is not configured" when trying to access the Object Gateway
section. It used to work before.
One interesting symptom: the "admin" bucket exists in the output of
"radosgw-admin bucket list" but it does not exist in "radosgw-admin bucket
stats". Rather, I get a number of "ERROR: could not decode buffer info,
caught buffer::error" messages from the "radosgw-admin bucket stats"
command. Also, I cannot remove the "admin" bucket because I also get the
same error (I thought about starting fresh with the admin bucket).
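For reference, the commands involved look roughly like this (only the bucket name "admin" comes from my cluster; the rm flags are just the generic form, I am not sure of the exact invocation I used):

radosgw-admin bucket list
radosgw-admin bucket stats
radosgw-admin bucket stats --bucket=admin
radosgw-admin bucket rm --bucket=admin --purge-objects   # fails with the same buffer::error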
Could someone help me debug this further and eventually resolve the issue?
There is no critical data in the radosgw buckets (the cluster is primarily
accessed via CephFS), so clearing all radosgw buckets is an
option. Ideally, I could repair this, however.
Kind regards,
Manuel
Hi,
I just freshly deployed a new cluster (v18.2.1) using cephadm. Now,
before creating pools, CephFS and so on, I wanted to check that the
dashboard is working and that I get some metrics.
If I navigate to Cluster >> Hosts and open one of the OSD hosts, the
"Performance Details" tab is shown, but all graphs display "no data".
"OSDs" and "Raw Capacity" in that tab display "N/A".
Prometheus is running:
[root@cephmon-01 ~]# ceph orch ps --service_name prometheus
NAME                   HOST        PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
prometheus.cephmon-01  cephmon-01  *:9095  running (54m)  9m ago     3d   51.4M    -        2.43.0   a07b618ecd1d  11b4b19df0d6
However it has no data collected:
[root@cephmon-01 ~]# curl -s -XGET
http://127.0.0.1:9095/api/v1/targets/metadata
{"status":"success","data":[]}
ceph-exporter services also seem to be running:
[root@cephmon-01 ~]# ceph orch ps --service_name ceph-exporter
NAME                      HOST        PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
ceph-exporter.cephmon-01  cephmon-01         running (46h)  10m ago    3d   16.7M    -        18.2.1   d2cdd87030d1  191eedccbcd8
ceph-exporter.cephmon-02  cephmon-02         running (3d)   2m ago     3d   16.5M    -        18.2.1   d2cdd87030d1  7a08e9f1401c
ceph-exporter.cephmon-03  cephmon-03         running (3d)   6m ago     3d   16.5M    -        18.2.1   d2cdd87030d1  1eb4856a60d4
ceph-exporter.cephosd-01  cephosd-01         running (2d)   10m ago    2d   19.9M    -        18.2.1   d2cdd87030d1  05642098f7de
ceph-exporter.cephosd-02  cephosd-02         running (2d)   10m ago    2d   19.9M    -        18.2.1   d2cdd87030d1  648715ecaa9d
ceph-exporter.cephosd-03  cephosd-03         running (2d)   10m ago    2d   19.4M    -        18.2.1   d2cdd87030d1  b8bb6dcb5386
ceph-exporter.cephosd-04  cephosd-04         running (2d)   10m ago    2d   19.5M    -        18.2.1   d2cdd87030d1  4f1964f79ffe
ceph-exporter.cephosd-05  cephosd-05         running (2d)   10m ago    2d   19.8M    -        18.2.1   d2cdd87030d1  8ca8cbbf3984
ceph-exporter.cephosd-06  cephosd-06         running (2d)   10m ago    2d   19.4M    -        18.2.1   d2cdd87030d1  a5e2860cc98e
ceph-exporter.cephosd-07  cephosd-07         running (2d)   3m ago     2d   19.8M    -        18.2.1   d2cdd87030d1  4eb01b8ebd33
ceph-exporter.cephosd-08  cephosd-08         running (2d)   3m ago     2d   19.9M    -        18.2.1   d2cdd87030d1  b934866d2a1d
ceph-exporter.cephosd-10  cephosd-10         running (2d)   3m ago     2d   19.4M    -        18.2.1   d2cdd87030d1  457368d07579
ceph-exporter.cephosd-11  cephosd-11         running (2d)   3m ago     2d   19.5M    -        18.2.1   d2cdd87030d1  e561cfac4209
ceph-exporter.cephosd-12  cephosd-12         running (2d)   9m ago     2d   19.9M    -        18.2.1   d2cdd87030d1  0e5773c8e038
as well as node-exporter services:
[root@cephmon-01 ~]# ceph orch ps --service_name node-exporter
NAME                      HOST        PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
node-exporter.cephmon-01  cephmon-01  *:9100  running (46h)  19s ago    3d   11.3M    -        1.5.0    0da6a335fe13  72fefb7966ff
node-exporter.cephmon-02  cephmon-02  *:9100  running (3d)   3m ago     3d   13.5M    -        1.5.0    0da6a335fe13  2041d3d385b0
node-exporter.cephmon-03  cephmon-03  *:9100  running (3d)   7m ago     3d   12.9M    -        1.5.0    0da6a335fe13  6ef204d12a7d
node-exporter.cephosd-01  cephosd-01  *:9100  running (2d)   18s ago    2d   13.0M    -        1.5.0    0da6a335fe13  6b05483b05c6
node-exporter.cephosd-02  cephosd-02  *:9100  running (2d)   18s ago    2d   11.4M    -        1.5.0    0da6a335fe13  ede4995ffb1d
node-exporter.cephosd-03  cephosd-03  *:9100  running (2d)   18s ago    2d   11.5M    -        1.5.0    0da6a335fe13  cfbf15168667
node-exporter.cephosd-04  cephosd-04  *:9100  running (2d)   18s ago    2d   13.2M    -        1.5.0    0da6a335fe13  5dc4794a7f6e
node-exporter.cephosd-05  cephosd-05  *:9100  running (2d)   18s ago    2d   13.4M    -        1.5.0    0da6a335fe13  8dfa1e252f82
node-exporter.cephosd-06  cephosd-06  *:9100  running (2d)   18s ago    2d   13.5M    -        1.5.0    0da6a335fe13  93467e37df08
node-exporter.cephosd-07  cephosd-07  *:9100  running (2d)   4m ago     2d   13.2M    -        1.5.0    0da6a335fe13  11795b83732d
node-exporter.cephosd-08  cephosd-08  *:9100  running (2d)   4m ago     2d   13.5M    -        1.5.0    0da6a335fe13  04197f3a6eb1
node-exporter.cephosd-10  cephosd-10  *:9100  running (2d)   4m ago     2d   13.2M    -        1.5.0    0da6a335fe13  9e904581442c
node-exporter.cephosd-11  cephosd-11  *:9100  running (2d)   4m ago     2d   13.0M    -        1.5.0    0da6a335fe13  5164113044ed
node-exporter.cephosd-12  cephosd-12  *:9100  running (2d)   16s ago    2d   13.3M    -        1.5.0    0da6a335fe13  5c8af368eed4
I'm a bit lost, how can I get this running?
Thanks for any help
Dietmar
Thanks for the response! Yes, it is in use.
The line "watcher=10.1.254.51:0/1544956346 client.39553300 cookie=140244238214096" indicates that a client has the image open.
I am using fio to run a write workload on it.
I guess the feature is not enabled correctly, or some setting is wrong somewhere. Should I restart any process after modifying the Ceph config?
Any thoughts?
I followed the document below to set up the image-level RBD persistent write-back cache;
however, I get an error while using the commands provided by the document.
I have put my commands and descriptions below.
Can anyone give me some pointers? Thanks in advance.
https://docs.ceph.com/en/pacific/rbd/rbd-persistent-write-back-cache/
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/b… (Chapter 2. Ceph block devices, Red Hat Ceph Storage 5)
I tried using the host-level client config commands; I got no error, but I am not able to get any cache usage output:
ceph config set client rbd_persistent_cache_mode ssd
ceph config set client rbd_plugins pwl_cache
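I assume the stored values can be read back with the generic config get syntax, to confirm they were applied:

ceph config get client rbd_persistent_cache_mode
ceph config get client rbd_plugins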
[root@master-node1 ceph]# rbd info sas-pool/testdrive
rbd image 'testdrive':
size 40 GiB in 10240 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 3de76a7e7c519
block_name_prefix: rbd_data.3de76a7e7c519
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Thu Jun 29 02:03:41 2023
access_timestamp: Thu Jun 29 07:19:40 2023
modify_timestamp: Thu Jun 29 07:18:00 2023
I checked that the exclusive-lock feature is already enabled,
and when I run the following commands I get error output.
[root@master-node1 ceph]# rbd config image set sas-pool/testdrive rbd_persistent_cache_mode ssd
rbd: invalid config key: rbd_persistent_cache_mode
[root@master-node1 ceph]# rbd config image set sas-pool/testdrive rbd_plugins pwl_cache
rbd: invalid config key: rbd_plugins
root@node1:~# rbd status sas-pool/testdrive
Watchers:
watcher=10.1.254.51:0/1544956346 client.39553300 cookie=140244238214096
I was hoping to see the output include the persistent cache state, like below:
$ rbd status rbd/foo
Watchers:
watcher=10.10.0.102:0/1061883624 client.25496 cookie=140338056493088
Persistent cache state:
host: sceph9
path: /mnt/nvme0/rbd-pwl.rbd.101e5824ad9a.pool
size: 1 GiB
mode: ssd
stats_timestamp: Sun Apr 10 13:26:32 2022
present: true empty: false clean: false
allocated: 509 MiB
cached: 501 MiB
dirty: 338 MiB
free: 515 MiB
hits_full: 1450 / 61%
hits_partial: 0 / 0%
misses: 924
hit_bytes: 192 MiB / 66%
miss_bytes: 97 MiB