- ceph-users - lists.ceph.io

by Erich Weiler

So I'm trying to figure out ways to reduce the number of warnings I'm getting and I'm thinking about the one "client failing to respond to cache pressure". Is there maybe a way to tell a client (or all clients) to reduce the amount of cache it uses or to release caches quickly? Like, all the time? I know the linux kernel (and maybe ceph) likes to cache everything for a while, and rightfully so, but I suspect in my use case it may be more efficient to more quickly purge the cache or to in general just cache way less overall...? We have many thousands of threads all doing different things that are hitting our filesystem, so I suspect the caching isn't really doing me much good anyway due to the churn, and probably is causing more problems than it helping... -erich

4 hours, 30 minutes

3
4
0 0

Remove an OSD with hardware issue caused rgw 503

by Mary Zhang

Hi, We recently removed an osd from our Cepth cluster. Its underlying disk has a hardware issue. We use command: ceph orch osd rm osd_id --zap During the process, sometimes ceph cluster enters warning state with slow ops on this osd. Our rgw also failed to respond to requests and returned 503. We restarted rgw daemon to make it work again. But the same failure occured from time to time. Eventually we noticed that rgw 503 error is a result of osd slow ops. Our cluster has 18 hosts and 210 OSDs. We expect remove an osd with hardware issue won't impact cluster performance & rgw availbility. Is our expectation reasonable? What's the best way to handle osd with hardware failures? Thank you in advance for any comments or suggestions. Best Regards, Mary Zhang

9 hours, 27 minutes

3
4
0 0

Re: Add node-exporter using ceph orch

by Robert Sander

On 4/26/24 15:47, Vahideh Alinouri wrote: > The result of this command shows one of the servers in the cluster, > but I have node-exporter daemons on all servers. The default service specification looks like this: service_type: node-exporter service_name: node-exporter placement: host_pattern: '*' If you apply this YAML code the orchestrator should deploy one node-exporter daemon to each host of the cluster. Regards -- Robert Sander Heinlein Consulting GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Amtsgericht Berlin-Charlottenburg - HRB 220009 B Geschäftsführer: Peer Heinlein - Sitz: Berlin

12 hours, 18 minutes

1
0
0 0

Public Swift bucket with Openstack Keystone integration - not working in quincy/reef

by Bartosz Bezak

Hi, Similar case as with previously fixed https://tracker.ceph.com/issues/48382 - https://github.com/ceph/ceph/pull/47308. Confirmed on Cephadm deployed Ceph 18.2.2/17.2.7 with Openstack Antelope/Yoga. I’m getting "404 NoSuchBucket" error with public buckets. Enabled with Swift/Keystone integration - everything else works fine. With rgw_swift_account_in_url = true and proper endpoints: "https://rgw.test/swift/v1/AUTH_%(project_id)s" ticking public access in horizon properly sets ACL on the bucket according to swift client: swift -v stat test-bucket URL: https://rgw.test/swift/v1/AUTH_daksjhdkajdshda/testbucket Auth Token: Account: AUTH_daksjhdkajdshda Container: testbucket Objects: 1 Bytes: 1021036 Read ACL: .r:*,.rlistings Write ACL: Sync To: Sync Key: X-Timestamp: 1710947159.41219 X-Container-Bytes-Used-Actual: 1024000 X-Storage-Policy: default-placement X-Storage-Class: STANDARD Last-Modified: Thu, 21 Mar 2024 10:30:05 GMT X-Trans-Id: tx00000092ac12312312312-1231231231-1701e5-default X-Openstack-Request-Id: tx00000092ac12312312312-1231231231-1701e5-default Accept-Ranges: bytes Content-Type: text/plain; charset=utf-8 however still getting 404 NoSuchBucket error Could someone using the latest version of Ceph with Swift/Keystone integration please test public buckets? Thank you. Best regards, Bartosz Bezak

14 hours, 31 minutes

1
0
0 0

Add node-exporter using ceph orch

by Vahideh Alinouri

Hi guys, I have tried to add node-exporter to the new host in ceph cluster by the command mentioned in the document. ceph orch apply node-exporter hostname I think there is a functionality issue because cephadm log print node-exporter was applied successfully, but it didn't work! I tried the below command and it worked! ceph orch daemon add node-exporter hostname Which way is the correct way?

15 hours, 31 minutes

2
1
0 0

Setup Ceph over RDMA

by Vahideh Alinouri

Hi guys, I need setup Ceph over RDMA, but I faced many issues! The info regarding my cluster: Ceph version is Reef Network cards are Broadcom RDMA. RDMA connection between OSD nodes are OK. I just found ms_type = async+rdma config in document and apply it using ceph config set global ms_type async+rdma After this action the cluster crashes. I tried to cluster back, and I did: Put ms_type async+posix in ceph.conf Restart all MON services The cluster is back, but I don't have any active mgr. All OSDs are down too. Is there any order to do for setting up Ceph over RDMA? Thanks

16 hours, 3 minutes

1
1
0 0

MDS crash

by alexey.gerasimov＠opencascade.com

Dear colleagues, hope that anybody can help us. The initial point: Ceph cluster v15.2 (installed and controlled by the Proxmox) with 3 nodes based on physical servers rented from a cloud provider. CephFS is installed also. Yesterday we discovered that some of the applications stopped working. During the investigation we recognized that we have the problem with Ceph, more precisely with СephFS - MDS daemons suddenly crashed. We tried to restart them and found that they crashed again immediately after the start. The crash information: 2024-04-17T17:47:42.841+0000 7f959ced9700 1 mds.0.29134 recovery_done -- successful recovery! 2024-04-17T17:47:42.853+0000 7f959ced9700 1 mds.0.29134 active_start 2024-04-17T17:47:42.881+0000 7f959ced9700 1 mds.0.29134 cluster recovered. 2024-04-17T17:47:43.825+0000 7f959aed5700 -1 ./src/mds/OpenFileTable.cc: In function 'void OpenFileTable::commit(MDSContext*, uint64_t, int)' thread 7f959aed5700 time 2024-04-17T17:47:43.831243+0000 ./src/mds/OpenFileTable.cc: 549: FAILED ceph_assert(count > 0) Next hours we read the tons of articles, studied the documentation, and checked the common state of Ceph cluster by the various diagnostic commands – but didn’t find anything wrong. At evening we decided to upgrade it up to v16, and finally to v17.2.7. Unfortunately, it didn’t solve the problem, MDS continue to crash with the same error. The only difference that we found is “1 MDSs report damaged metadata” in the output of ceph -s – see it below. I supposed that it may be the well-known bug, but couldn’t find the same one on https://tracker.ceph.com - there are several bugs associated with file OpenFileTable.cc but not related to ceph_assert(count > 0) We tried to check the source code of OpenFileTable.cc also, here is a fragment of it, in function OpenFileTable::_journal_finish int omap_idx = anchor.omap_idx; unsigned& count = omap_num_items.at(omap_idx); ceph_assert(count > 0); So, we guess that the object map is empty for some object in Ceph, and it is unexpected behavior. But again, we found nothing wrong in our cluster… Next, we started with https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/ article – tried to reset the journal (despite that it was Ok all the time) and wipe the sessions using cephfs-table-tool all reset session command. No result… Now I decided to continue following this article and run cephfs-data-scan scan_extents command, it is working just now. But I have a doubt that it will solve the issue because of no problem with our objects in Ceph. Is it the new bug? or something else? Any idea is welcome! The important outputs: ----- ceph -s cluster: id: 4cd1c477-c8d0-4855-a1f1-cb71d89427ed health: HEALTH_ERR 1 MDSs report damaged metadata insufficient standby MDS daemons available 83 daemons have recently crashed 3 mgr modules have recently crashed services: mon: 3 daemons, quorum asrv-dev-stor-2,asrv-dev-stor-3,asrv-dev-stor-1 (age 22h) mgr: asrv-dev-stor-2(active, since 22h), standbys: asrv-dev-stor-1 mds: 1/1 daemons up osd: 18 osds: 18 up (since 22h), 18 in (since 29h) data: volumes: 1/1 healthy pools: 5 pools, 289 pgs objects: 29.72M objects, 5.6 TiB usage: 21 TiB used, 47 TiB / 68 TiB avail pgs: 287 active+clean 2 active+clean+scrubbing+deep io: client: 2.5 KiB/s rd, 172 KiB/s wr, 261 op/s rd, 195 op/s wr -----ceph fs dump e29480 enable_multiple, ever_enabled_multiple: 0,1 default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2} legacy client fscid: 1 Filesystem 'cephfs' (1) fs_name cephfs epoch 29480 flags 12 joinable allow_snaps allow_multimds_snaps created 2022-11-25T15:56:08.507407+0000 modified 2024-04-18T16:52:29.970504+0000 tableserver 0 root 0 session_timeout 60 session_autoclose 300 max_file_size 1099511627776 required_client_features {} last_failure 0 last_failure_osd_epoch 14728 compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2} max_mds 1 in 0 up {0=156636152} failed damaged stopped data_pools [5] metadata_pool 6 inline_data disabled balancer standby_count_wanted 1 [mds.asrv-dev-stor-1{0:156636152} state up:active seq 6 laggy since 2024-04-18T16:52:29.970479+0000 addr [v2:172.22.2.91:6800/2487054023,v1:172.22.2.91:6801/2487054023] compat {c=[1],r=[1],i=[7ff]}] -----cephfs-journal-tool --rank=cephfs:0 journal inspect Overall journal integrity: OK -----ceph pg dump summary version 41137 stamp 2024-04-18T21:17:59.133536+0000 last_osdmap_epoch 0 last_pg_scan 0 PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG sum 29717605 0 0 0 0 6112544251872 13374192956 28493480 1806575 1806575 OSD_STAT USED AVAIL USED_RAW TOTAL sum 21 TiB 47 TiB 21 TiB 68 TiB -----ceph pg dump pools POOLID OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG 8 31771 0 0 0 0 131337887503 2482 140 401246 401246 7 839707 0 0 0 0 3519034650971 736 61 399328 399328 6 1319576 0 0 0 0 421044421 13374189738 28493279 206749 206749 5 27526539 0 0 0 0 2461702171417 0 0 792165 792165 2 12 0 0 0 0 48497560 0 0 6991 6991

16 hours, 56 minutes

5
7
0 0

ceph recipe for nfs exports

by Roberto Maggi ＠ Debian

Hi you all, I'm almost new to ceph and I'm understanding, day by day, why the official support is so expansive :) I setting up a ceph nfs network cluster whose recipe can be found here below. ####################### --> cluster creation cephadm bootstrap --mon-ip 10.20.20.81 --cluster-network 10.20.20.0/24 --fsid $FSID --initial-dashboard-user adm \ --initial-dashboard-password 'Hi_guys' --dashboard-password-noupdate --allow-fqdn-hostname --ssl-dashboard-port 443 \ --dashboard-crt /etc/ssl/wildcard.it/wildcard.it.crt --dashboard-key /etc/ssl/wildcard.it/wildcard.it.key \ --allow-overwrite --cleanup-on-failure cephadm shell --fsid $FSID -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring cephadm add-repo --release reef && cephadm install ceph-common --> adding hosts and set labels for IP in $(grep ceph /etc/hosts | awk '{print $1}') ; do ssh-copy-id -f -i /etc/ceph/ceph.pub root@$IP ; done ceph orch host add cephstage01 10.20.20.81 --labels _admin,mon,mgr,prometheus,grafana ceph orch host add cephstage02 10.20.20.82 --labels _admin,mon,mgr,prometheus,grafana ceph orch host add cephstage03 10.20.20.83 --labels _admin,mon,mgr,prometheus,grafana ceph orch host add cephstagedatanode01 10.20.20.84 --labels osd,nfs,prometheus ceph orch host add cephstagedatanode02 10.20.20.85 --labels osd,nfs,prometheus ceph orch host add cephstagedatanode03 10.20.20.86 --labels osd,nfs,prometheus --> network setup and daemons deploy ceph config set mon public_network 10.20.20.0/24,192.168.7.0/24 ceph orch apply mon --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83" ceph orch apply mgr --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83" ceph orch apply prometheus --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83,cephstagedatanode01:10.20.20.84,cephstagedatanode02:10.20.20.85,cephstagedatanode03:10.20.20.86" ceph orch apply grafana --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83,cephstagedatanode01:10.20.20.84,cephstagedatanode02:10.20.20.85,cephstagedatanode03:10.20.20.86" ceph orch apply node-exporter ceph orch apply alertmanager ceph config set mgr mgr/cephadm/secure_monitoring_stack true --> disks and osd setup for IP in $(grep cephstagedatanode/etc/hosts | awk '{print $1}') ; do ssh root@$IP "hostname && wipefs -a -f /dev/sdb&& wipefs -a -f /dev/sdc"; done ceph config set mgr mgr/cephadm/device_enhanced_scan true for IP in $(grep cephstagedatanode/etc/hosts | awk '{print $1}') ; doceph orch device ls --hostname=$IP --wide --refresh ; done for IP in $(grep cephstagedatanode/etc/hosts | awk '{print $1}') ; doceph orch device zap $IP /dev/sdb; done for IP in $(grep cephstagedatanode/etc/hosts | awk '{print $1}') ; doceph orch device zap $IP /dev/sdc ; done for IP in $(grep cephstagedatanode/etc/hosts | awk '{print $1}') ; doceph orch daemon add osd $IP:/dev/sdb ; done for IP in $(grep cephstagedatanode/etc/hosts | awk '{print $1}') ; doceph orch daemon add osd $IP:/dev/sdc ; done --> ganesha nfs cluster ceph mgr module enable nfs ceph fs volume create vol1 ceph nfs cluster create nfs-cephfs "cephstagedatanode01,cephstagedatanode02,cephstagedatanode03" --ingress --virtual-ip 192.168.7.80 --ingress-mode default ceph nfs export create cephfs --cluster-id nfs-cephfs --pseudo-path /mnt --fsname vol1 --> nfs mount mount -t nfs -o nfsvers=4.1,proto=tcp 192.168.7.80:/mnt /mnt/ceph is my recipe correct? the cluster is set up by 3 mon/mgr nodes and 3 osd/nfs nodes, on the latters I installed one 3tb ssd, for the data, and one 300gb ssd for the journaling but my problems are : - Although I can mount the export I can't write on it - I can't understand how to use the sdc disks for journaling - I can't understand the concept of "pseudo path" here below you can find the json output of the exports --> check ceph nfs export ls nfs-cephfs ceph nfs export info nfs-cephfs /mnt ------------------------------------ json file --------- { "export_id": 1, "path": "/", "cluster_id": "nfs-cephfs", "pseudo": "/mnt", "access_type": "RW", "squash": "none", "security_label": true, "protocols": [ 4 ], "transports": [ "TCP" ], "fsal": { "name": "CEPH", "user_id": "nfs.nfs-cephfs.1", "fs_name": "vol1" }, "clients": [] } ------------------------------------ Thanks in advance Rob

1 day, 6 hours

5
5
0 0

Impact of large PG splits

by Eugen Block

Hi, I'm trying to estimate the possible impact when large PGs are splitted. Here's one example of such a PG: PG_STAT OBJECTS BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG UP 86.3ff 277708 414403098409 0 0 3092 3092 [187,166,122,226,171,234,177,163,155,34,81,239,101,13,117,8,57,111] Their main application is RGW on EC (currently 1024 PGs on 240 OSDs), 8TB HDDs backed by SSDs. There are 6 RGWs running behind HAProxies. It took me a while to convince them to do a PG split and now they're trying to assess how big the impact could be. The fullest OSD is already at 85% usage, the least filled one at 59%, so there is definitely room for a better balancing which, will be necessary until the new hardware arrives. The current distribution is around 100 PGs per OSD which usually would be fine, but since the PGs are that large only a few PGs difference have a huge impact on the OSD utilization. I'm targeting 2048 PGs for that pool for now, probably do another split when the new hardware has been integrated. Any comments are appreciated! Eugen

1 day, 13 hours

6
15
0 0

rbd-mirror get status updates quicker

by Stefan Kooman

Hi, We're testing with rbd-mirror (mode snapshot) and try to get status updates about snapshots as fast a possible. We want to use rbd-mirror as a migration tool between two clusters and keep downtime during migration as short as possible. Therefore we have tuned the following parameters and set them to 1 second (default 30 seconds): rbd_mirror_pool_replayers_refresh_interval rbd_mirror_image_state_check_interval rbd_mirror_sync_point_update_age However, on the destination cluster, the "last_update:" field is only updated every 30 seconds. Is this tunable? Goal is to determine when the last snapshot that is made on the source has made it to the target and a demote (source) and promote (target) can be initiated. Gr. Stefan

1 day, 17 hours

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users