Hi all,
I'm fairly new to Ceph and I'm understanding, day by day, why the
official support is so expensive :)
I'm setting up a Ceph NFS cluster; the recipe I followed can be found
here below.
#######################
--> cluster creation
cephadm bootstrap --mon-ip 10.20.20.81 --cluster-network 10.20.20.0/24 --fsid $FSID --initial-dashboard-user adm \
  --initial-dashboard-password 'Hi_guys' --dashboard-password-noupdate --allow-fqdn-hostname --ssl-dashboard-port 443 \
  --dashboard-crt /etc/ssl/wildcard.it/wildcard.it.crt --dashboard-key /etc/ssl/wildcard.it/wildcard.it.key \
  --allow-overwrite --cleanup-on-failure
cephadm shell --fsid $FSID -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
cephadm add-repo --release reef && cephadm install ceph-common
--> adding hosts and setting labels
for IP in $(grep ceph /etc/hosts | awk '{print $1}') ; do ssh-copy-id -f -i /etc/ceph/ceph.pub root@$IP ; done
ceph orch host add cephstage01 10.20.20.81 --labels _admin,mon,mgr,prometheus,grafana
ceph orch host add cephstage02 10.20.20.82 --labels _admin,mon,mgr,prometheus,grafana
ceph orch host add cephstage03 10.20.20.83 --labels _admin,mon,mgr,prometheus,grafana
ceph orch host add cephstagedatanode01 10.20.20.84 --labels osd,nfs,prometheus
ceph orch host add cephstagedatanode02 10.20.20.85 --labels osd,nfs,prometheus
ceph orch host add cephstagedatanode03 10.20.20.86 --labels osd,nfs,prometheus
--> network setup and daemon deployment
ceph config set mon public_network 10.20.20.0/24,192.168.7.0/24
ceph orch apply mon --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83"
ceph orch apply mgr --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83"
ceph orch apply prometheus --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83,cephstagedatanode01:10.20.20.84,cephstagedatanode02:10.20.20.85,cephstagedatanode03:10.20.20.86"
ceph orch apply grafana --placement="cephstage01:10.20.20.81,cephstage02:10.20.20.82,cephstage03:10.20.20.83,cephstagedatanode01:10.20.20.84,cephstagedatanode02:10.20.20.85,cephstagedatanode03:10.20.20.86"
ceph orch apply node-exporter
ceph orch apply alertmanager
ceph config set mgr mgr/cephadm/secure_monitoring_stack true
--> disks and osd setup
for IP in $(grep cephstagedatanode /etc/hosts | awk '{print $1}') ; do ssh root@$IP "hostname && wipefs -a -f /dev/sdb && wipefs -a -f /dev/sdc" ; done
ceph config set mgr mgr/cephadm/device_enhanced_scan true
for IP in $(grep cephstagedatanode /etc/hosts | awk '{print $1}') ; do ceph orch device ls --hostname=$IP --wide --refresh ; done
for IP in $(grep cephstagedatanode /etc/hosts | awk '{print $1}') ; do ceph orch device zap $IP /dev/sdb ; done
for IP in $(grep cephstagedatanode /etc/hosts | awk '{print $1}') ; do ceph orch device zap $IP /dev/sdc ; done
for IP in $(grep cephstagedatanode /etc/hosts | awk '{print $1}') ; do ceph orch daemon add osd $IP:/dev/sdb ; done
for IP in $(grep cephstagedatanode /etc/hosts | awk '{print $1}') ; do ceph orch daemon add osd $IP:/dev/sdc ; done
--> ganesha nfs cluster
ceph mgr module enable nfs
ceph fs volume create vol1
ceph nfs cluster create nfs-cephfs "cephstagedatanode01,cephstagedatanode02,cephstagedatanode03" --ingress --virtual-ip 192.168.7.80 --ingress-mode default
ceph nfs export create cephfs --cluster-id nfs-cephfs --pseudo-path /mnt --fsname vol1
--> nfs mount
mount -t nfs -o nfsvers=4.1,proto=tcp 192.168.7.80:/mnt /mnt/ceph
Is my recipe correct?
The cluster is made up of 3 mon/mgr nodes and 3 osd/nfs nodes; in each of
the latter I installed one 3 TB SSD for the data and one 300 GB SSD for
the journaling.
My problems are:
- Although I can mount the export, I can't write to it
- I can't understand how to use the sdc disks for journaling (see the
sketch right after this list)
- I can't understand the concept of "pseudo path" (see the note after
the JSON below)
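For the second point, a rough and untested sketch of a drivegroup-style
OSD service spec that would put the data on /dev/sdb and the RocksDB/WAL
on /dev/sdc instead of creating a standalone OSD on sdc; the service_id
and file name are arbitrary, the "osd" label is the one set in the recipe
above:
cat > osd_spec.yaml <<EOF
service_type: osd
service_id: data-on-sdb-db-on-sdc
placement:
  label: osd
spec:
  data_devices:
    paths:
      - /dev/sdb
  db_devices:
    paths:
      - /dev/sdc
EOF
ceph orch apply -i osd_spec.yaml --dry-run   # preview the proposed OSD layout before applying it for real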
Here below you can find the JSON output of the export.
--> check
ceph nfs export ls nfs-cephfs
ceph nfs export info nfs-cephfs /mnt
------------------------------------
json file
---------
{
  "export_id": 1,
  "path": "/",
  "cluster_id": "nfs-cephfs",
  "pseudo": "/mnt",
  "access_type": "RW",
  "squash": "none",
  "security_label": true,
  "protocols": [
    4
  ],
  "transports": [
    "TCP"
  ],
  "fsal": {
    "name": "CEPH",
    "user_id": "nfs.nfs-cephfs.1",
    "fs_name": "vol1"
  },
  "clients": []
}
------------------------------------
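On the "pseudo path" question, as far as I understand it: in the export
above, "path" is the directory inside the CephFS volume that is actually
exported (here "/", i.e. the root of vol1), while "pseudo" is the name
under which the export appears in the NFSv4 namespace, i.e. what clients
mount. A rough sketch of a second export that would make the difference
visible, assuming a subdirectory /projects already exists inside vol1:
ceph nfs export create cephfs --cluster-id nfs-cephfs --pseudo-path /projects --fsname vol1 --path /projects
mount -t nfs -o nfsvers=4.1,proto=tcp 192.168.7.80:/projects /mnt/projects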
Thanks in advance
Rob
Hi,
We are testing rbd-mirroring. There seems to be a permission error with
the rbd-mirror user. Using this user to query the mirror pool status gives:
failed to query services: (13) Permission denied
And results in the following output:
health: UNKNOWN
daemon health: UNKNOWN
image health: OK
images: 3 total
    2 replaying
    1 stopped
The command used was: rbd --id rbd-mirror mirror pool status rbd
So basically, the health and daemon health cannot be obtained due to
permission errors, but the image status can.
When the command is run with admin permissions the health and daemon
health are returned without issue.
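For reference, the caps of the two users can be compared like this
(client.rbd-mirror is the id used in the command above):
ceph auth get client.rbd-mirror
ceph auth get client.admin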
I tested this on Reef 18.2.2.
Is this expected behavior? If not, I will create a tracker ticket for it.
Gr. Stefan
Hi,
On quay.io I can find a lot of Grafana versions for Ceph (https://quay.io/repository/ceph/grafana?tab=tags). How can I find out which version should be used when I upgrade my cluster to 17.2.x? Can I simply take the latest Grafana version, or is there a specific Grafana version I need to use?
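For reference, the image that cephadm deploys appears to be controlled by
an mgr option; a sketch of how to check and override it (the option name
is from recent cephadm releases, and <tag> is just a placeholder, not a
recommendation):
ceph config get mgr mgr/cephadm/container_image_grafana
ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/grafana:<tag>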
Hi,
I'm trying to estimate the possible impact when large PGs are
split. Here's one example of such a PG:
PG_STAT  OBJECTS  BYTES         OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG  UP
86.3ff   277708   414403098409  0            0           3092  3092      [187,166,122,226,171,234,177,163,155,34,81,239,101,13,117,8,57,111]
Their main application is RGW on EC (currently 1024 PGs on 240 OSDs),
8TB HDDs backed by SSDs. There are 6 RGWs running behind HAProxies. It
took me a while to convince them to do a PG split and now they're
trying to assess how big the impact could be. The fullest OSD is
already at 85% usage, the least filled one at 59%, so there is
definitely room for better balancing, which will be necessary until
the new hardware arrives. The current distribution is around 100 PGs
per OSD, which usually would be fine, but since the PGs are that large,
a difference of only a few PGs has a huge impact on OSD utilization.
I'm targeting 2048 PGs for that pool for now, and will probably do
another split when the new hardware has been integrated.
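As a rough sketch, the split itself would be started with something like
the following (pool name shortened to <pool>; target_max_misplaced_ratio
is, as far as I know, the mgr knob that limits how much data is allowed
to be misplaced at once, default 0.05):
ceph osd pool set <pool> pg_num 2048
ceph config set mgr target_max_misplaced_ratio 0.05
ceph osd pool ls detail | grep <pool>    # watch pg_num approach pg_num_target over time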
Any comments are appreciated!
Eugen
Colleagues, thank you for the advice to check the operability of the MGRs. In fact, it is also strange: we checked our nodes for network issues (IP connectivity, sockets, ACLs, DNS) and found nothing wrong, but suddenly just restarting all MGRs solved the problem with the stale PGs and the hanging ceph commands!
So, we are back at the starting point: ceph is working, except that the MDS daemons crash. But now we see some additional errors in the MDS logs when trying to start the daemon:
dir 0x1000dd10fa0 object missing on disk; some files may be lost (/volumes/csi/csi-vol-2eb40f89-f2e1-11ee-b657-3aa98da4c4a6/1080803d-1277-4ad8-ae80-a004bd3a5699/gallery/pc-12083932925583528732)
dir 0x1000dd10f9d object missing on disk; some files may be lost (/volumes/csi/csi-vol-2eb40f89-f2e1-11ee-b657-3aa98da4c4a6/1080803d-1277-4ad8-ae80-a004bd3a5699/cadserver-filevault/project-files/661fb14d341d3746ea5c2a8f
I promised to create the bug report, so I will do that a bit later. But should I also try to do something more on my side? What I did exactly last time:
cephfs-journal-tool journal reset
cephfs-table-tool all reset session
cephfs-data-scan scan_extents
cephfs-data-scan scan_inodes
cephfs-data-scan scan_links
cephfs-data-scan cleanup
And one more question: is it possible to access the CephFS content directly, without an MDS?
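On that last question, as far as I understand it: the file data are plain
RADOS objects in the CephFS data pool, named <inode-in-hex>.<stripe-offset>,
so they can be read without an MDS, but the file names and the directory
tree only exist in the metadata managed by the MDS. A rough sketch,
assuming the data pool is called cephfs_data:
rados -p cephfs_data ls | head
rados -p cephfs_data get <inode-hex>.00000000 /tmp/first_chunk    # <inode-hex> is a placeholder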
Hi,
We're testing with rbd-mirror (snapshot mode) and are trying to get status
updates about snapshots as fast as possible. We want to use rbd-mirror as
a migration tool between two clusters and keep the downtime during the
migration as short as possible. Therefore we have tuned the following
parameters and set them to 1 second (default 30 seconds):
rbd_mirror_pool_replayers_refresh_interval
rbd_mirror_image_state_check_interval
rbd_mirror_sync_point_update_age
However, on the destination cluster, the "last_update:" field is only
updated every 30 seconds. Is this tunable?
The goal is to determine when the last snapshot made on the source
has made it to the target, so that a demote (source) and promote (target)
can be initiated.
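A sketch of what could be polled per image on the target instead of the
whole pool (pool/image names are placeholders, and the exact JSON layout
may differ between releases):
rbd mirror image status <pool>/<image> --format json
On the source, a mirror snapshot can also be triggered on demand instead
of waiting for the schedule:
rbd mirror image snapshot <pool>/<image>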
Gr. Stefan
On 4/26/24 15:47, Vahideh Alinouri wrote:
> The result of this command shows one of the servers in the cluster,
> but I have node-exporter daemons on all servers.
The default service specification looks like this:
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
If you apply this YAML code the orchestrator should deploy one
node-exporter daemon to each host of the cluster.
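A sketch of how this spec could be applied, assuming it is saved to a
file named node-exporter.yaml:
ceph orch apply -i node-exporter.yaml
or, equivalently for this simple case, directly on the command line:
ceph orch apply node-exporter '*'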
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hi,
Similar case as the previously fixed https://tracker.ceph.com/issues/48382 - https://github.com/ceph/ceph/pull/47308.
Confirmed on cephadm-deployed Ceph 18.2.2/17.2.7 with OpenStack Antelope/Yoga.
I'm getting a "404 NoSuchBucket" error with public buckets enabled via the Swift/Keystone integration - everything else works fine.
With rgw_swift_account_in_url = true and proper endpoints ("https://rgw.test/swift/v1/AUTH_%(project_id)s"),
ticking public access in Horizon properly sets the ACL on the bucket according to the swift client:
swift -v stat test-bucket
URL: https://rgw.test/swift/v1/AUTH_daksjhdkajdshda/testbucket
Auth Token:
Account: AUTH_daksjhdkajdshda
Container: testbucket
Objects: 1
Bytes: 1021036
Read ACL: .r:*,.rlistings
Write ACL:
Sync To:
Sync Key:
X-Timestamp: 1710947159.41219
X-Container-Bytes-Used-Actual: 1024000
X-Storage-Policy: default-placement
X-Storage-Class: STANDARD
Last-Modified: Thu, 21 Mar 2024 10:30:05 GMT
X-Trans-Id: tx00000092ac12312312312-1231231231-1701e5-default
X-Openstack-Request-Id: tx00000092ac12312312312-1231231231-1701e5-default
Accept-Ranges: bytes
Content-Type: text/plain; charset=utf-8
However, I am still getting the 404 NoSuchBucket error.
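A sketch of how the anonymous access could be tested, using the URL from
the stat output above (<object> is a placeholder for an object name in
the bucket):
curl -i https://rgw.test/swift/v1/AUTH_daksjhdkajdshda/testbucket/<object>
which returns 404 NoSuchBucket instead of the object.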
Could someone using the latest version of Ceph with Swift/Keystone integration please test public buckets? Thank you.
Best regards,
Bartosz Bezak