Hello folks!
I'm designing a new Ceph cluster from scratch and I want to increase CephFS
speed and decrease latency.
Usually I build with WAL+DB on NVMe in front of SAS/SATA SSDs, and I deploy
the MDS and MONs on the same servers.
This time an unusual idea came to mind, and on paper, with my limited
knowledge, I think it has great potential and will perform better.
I have 5 racks, and the 3rd "middle" rack is my storage and management rack.
- In RACK-3 I'm going to locate 8x 1U OSD servers (Spec: 2x E5-2690V4, 256GB,
4x 25G, 2x 1.6TB PCIe NVMe "MZ-PLK3T20", 8x 4TB SATA SSD)
- My CephFS kernel clients are 40x GPU nodes located in RACK-1, 2, 4, 5
With my current workflow, all the clients:
1- visit the rack data switch,
2- jump to the main VPC switch via 2x 100G,
3- talk to the MDS servers,
4- get the answer back over the same path,
5- to access data, follow the same hops and visit the OSDs every time.
If I deploy a separate metadata pool using 4x MDS servers at the top of
RACK-1, 2, 4, 5 (Spec: 2x E5-2690V4, 128GB, 2x 10G (public), 2x 25G (cluster),
2x 960GB U.2 NVMe "MZ-PLK3T20"),
then all the clients will send their requests directly to an in-rack MDS
server that is 1 hop away, and if a request is metadata-only, the MDS node
doesn't need to redirect it to the OSD nodes.
Also, locating MDS servers with a separate metadata pool across all the
racks will reduce the high load on the main VPC switch in RACK-3.
If I'm not missing anything, then only the recovery workload will suffer with
this topology.
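For reference, this is roughly how I'd steer the metadata pool onto the NVMe drives in those per-rack nodes; just a sketch, and the rule and pool names are placeholders:
# CRUSH rule that keeps metadata replicas on the nvme device class
ceph osd crush rule create-replicated meta-nvme default host nvme
# point the CephFS metadata pool at that rule
ceph osd pool set cephfs.myfs.meta crush_rule meta-nvme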
What do you think?
I deployed a Ceph Reef cluster using cephadm. When it comes to the ceph.conf file, which one should I be editing to make changes to the cluster - the one inside the Docker container or the local one on the Ceph monitors?
-- Michael
Team,
We were facing a CephFS volume mount issue, and ceph status was showing:
- MDS slow requests
- MDS behind on trimming
After restarting the MDS pods it was resolved,
but we wanted to know the root cause of this.
It started about two hours after one of the active MDS daemons crashed.
So, can an active MDS crash cause this issue?
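In case it's relevant, these are the kinds of checks we have in mind for next time; just a sketch, and the daemon name is only an example:
# journal segment count vs. the trimming target
ceph daemon mds.myfs-a perf dump mds_log
ceph config get mds mds_log_max_segments
# overall MDS state
ceph fs status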
Please provide your inputs, anyone.
What exactly does the osd pool repair function do?
Documentation is not clear.
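For reference, I mean the pool-level command, as opposed to the per-PG repair; the pool name below is only an example:
ceph osd pool repair mypool
# versus the per-PG form:
ceph pg repair <pgid>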
Kind regards,
AP
Hi ceph-users,
I currently use Ceph Octopus to provide CephFS & S3 storage for our app servers, deployed in containers by ceph-ansible. I'm planning an upgrade to get off Octopus, as it's EOL.
I'd love to go straight to Reef, but I vaguely remember reading a statement that an upgrade can only span two major versions. I've failed to find that statement again.
Is it possible to go directly from Octopus straight to Reef?
I think a sensible approach here is to first migrate our existing deployment to cephadm, and then use cephadm to upgrade. Any advice on this is very welcome.
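For what it's worth, the cephadm flow I have in mind after the migration is roughly the following, done in two hops rather than one jump (version numbers are only examples):
# first hop, e.g. to Quincy
ceph orch upgrade start --ceph-version 17.2.7
ceph orch upgrade status
# once healthy, the second hop to Reef
ceph orch upgrade start --ceph-version 18.2.1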
Many thanks,
Alex
Hello everybody,
I've suddenly hit a problem that is (probably) authorization-related while playing with cephx.
So, long story short:
1) Rolled out a completely new test cluster with cephadm, with only one node
2) According to the docs, I set this in /etc/ceph/ceph.conf:
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
3) restarted ceph.target
4) now even "ceph -s" cannot connect to RADOS, saying:
root@ceph1:/etc/ceph# ceph -s
2024-02-24T18:15:59.219+0000 7f7c10d65700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2024-02-24T18:15:59.219+0000 7f7c11d67700 0 librados: client.admin authentication error (13) Permission denied
[errno 13] RADOS permission denied (error connecting to the cluster)
5) I have ceph.client.admin.keyring in both /etc/ceph and /var/lib/ceph/$fsid/config
6) The monitor logs don't show any errors. It looks like the monitor keeps running normally and doesn't even know that something is wrong.
7) Tried to set /etc/ceph/ceph.conf back to
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
with no success
8) I have noticed that some process (I guess one of the processes in the containers?) always rewrites /etc/ceph/ceph.conf and /var/lib/ceph/$fsid/config/ceph.conf, whatever I write there. What process is it? And how do I set options if I want to keep them in the file?
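For reference, my understanding is that with cephadm such settings are meant to live in the cluster's central config rather than in the local file; a sketch, assuming the admin keyring works again:
# store auth settings centrally instead of in /etc/ceph/ceph.conf
ceph config set global auth_cluster_required cephx
ceph config set global auth_service_required cephx
ceph config set global auth_client_required cephx
# or import an existing file into the central config
ceph config assimilate-conf -i /etc/ceph/ceph.conf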
Ubuntu 20.04, Reef, 18.0.2
Thanks in advance.
I have just upgraded a cluster from 17.2.7 to 18.2.1.
Everything is working as expected, apart from the number of scrubs & deep scrubs bouncing all over the place every second.
I have the value set to 1 per OSD, but the cluster reckons one minute it's doing 60+ scrubs, and a second later this drops to 40, then back up to 70.
If I check the live Ceph logs I can see that every second multiple PGs are reported as starting either a scrub or a deep scrub. It doesn't look like these are actually running, as there is no negative effect on the cluster's performance.
Is this something to be expected off the back of the upgrade, and should it sort itself out?
A sample of the logs:
2024-02-24T00:41:20.055401+0000 osd.54 (osd.54) 3160 : cluster 0 12.9a deep-scrub starts
2024-02-24T00:41:19.658144+0000 osd.41 (osd.41) 4103 : cluster 0 12.cd deep-scrub starts
2024-02-24T00:41:19.823910+0000 osd.33 (osd.33) 5625 : cluster 0 12.ae deep-scrub starts
2024-02-24T00:41:19.846736+0000 osd.65 (osd.65) 3947 : cluster 0 12.53 deep-scrub starts
2024-02-24T00:41:20.007331+0000 osd.20 (osd.20) 7214 : cluster 0 12.142 scrub starts
2024-02-24T00:41:20.114748+0000 osd.10 (osd.10) 6538 : cluster 0 12.2c deep-scrub starts
2024-02-24T00:41:20.247205+0000 osd.36 (osd.36) 4789 : cluster 0 12.16f deep-scrub starts
2024-02-24T00:41:20.908051+0000 osd.68 (osd.68) 3869 : cluster 0 12.d7 deep-scrub starts
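For reference, a rough way to compare the log noise against what is actually running (just a sketch):
# per-OSD scrub concurrency limit
ceph config get osd osd_max_scrubs
# count PGs that are really in a scrubbing state right now
ceph pg dump pgs 2>/dev/null | grep -c scrubbing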
Hi,
I have one CephFS with one volume and subvolumes using erasure coding.
If I don't set any quota, when I run df on the client I get:
0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo 5,8P 78T 5,8P 2% /vo
The 78T seems to be the size used by Ceph on disk (on the hardware, I mean), and I find that very good.
But if I set a quota:
setfattr -n ceph.quota.max_bytes -v 109951162777600 vo
then on the same client I get:
0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo 100T 51T 50T 51% /vo
and those are the sizes of the data (I'm using erasure coding 4+2, so 51 * 1.5 ≈ 77 TB).
Is there any way to keep the first answer?
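For reference, the quota on the directory can be checked and removed like this (/vo as above); setting it to 0 removes it and brings back the first kind of output, but of course also drops the limit:
# show the current quota on the directory
getfattr -n ceph.quota.max_bytes /vo
# value 0 removes the quota, so df reports pool-wide space again
setfattr -n ceph.quota.max_bytes -v 0 /vo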
Regards
--
Albert SHIH 🦫 🐸
France
Local time:
Thu, 22 Feb 2024 08:44:17 CET