Hi All,
I'm trying to mount a Ceph Reef (v18.2.2 - latest version) RBD Image as
a 2nd HDD on a Rocky Linux v9.3 (latest version) host.
The EC pool has been created and initialised and the image has been
created.
The ceph-common package has been installed on the host.
The correct keyring has been added to the host (with a chmod of 600) and
the host has been configured with an rbdmap entry as follows:
`my_pool.meta/my_image
id=ceph_user,keyring=/etc/ceph/ceph.client.ceph_user.keyring`.
When running the rbdmap.service the image appears as both `/dev/rbd0`
and `/dev/rbd/my_pool.meta/my_image`, exactly as the Ceph Doco says it
should.
So everything *appears* AOK up to this point.
My question now is: Should I run `mkfs.xfs` on `/dev/rbd0` *before* or
*after* I try to mount the image (via fstab:
`/dev/rbd/my_pool.meta/my_image /mnt/my_image xfs noauto 0 0` - as
per the Ceph doco)?
The reason I ask is that I've tried this *both* ways and all I get is an
error message (sorry, I can't remember the exact message and I'm not
currently in front of the host to confirm it :-) - but from memory it
was something about not being able to recognise the first block - or
something like that).
So, I'm obviously doing something wrong, but I can't work out what
exactly (and the logs don't show any useful info).
Have I, for instance, got the process wrong / misunderstood the exact
process, or is there something else wrong?
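For reference, one of the two orderings I tried (mkfs before mount; names
as above, and from memory, so I may have a step wrong) was roughly:

systemctl start rbdmap.service            # image appears as /dev/rbd0
mkfs.xfs /dev/rbd/my_pool.meta/my_image   # create the filesystem on the mapped device
mount /mnt/my_image                       # uses the fstab entry above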
All comments/suggestions/etc greatly appreciated - thanks in advance
Cheers
Dulux-Oz
Hi,
The requirements are actually not high: 1. there should be a generally
known address for access. 2. it should be possible to reboot or shut down a
server without the RGW connections being down the entire time. A downtime
of a few seconds is OK.
Constant load balancing would be nice, but is not necessary. I have found
various approaches on the Internet - what is currently recommended for a
recent Ceph installation?
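For context, one approach I keep seeing mentioned for cephadm-managed
clusters is the orchestrator's ingress service (haproxy + keepalived in
front of RGW); a minimal sketch, assuming the RGW service is named
rgw.myrgw and 192.168.1.100 is a free address for the VIP:

cat > rgw-ingress.yaml <<EOF
service_type: ingress
service_id: rgw.myrgw
placement:
  count: 2
spec:
  backend_service: rgw.myrgw
  virtual_ip: 192.168.1.100/24
  frontend_port: 8080
  monitor_port: 1967
EOF
ceph orch apply -i rgw-ingress.yaml

Is that still the recommended way, or has something replaced it?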
Thanks,
Hi Team,
I have been working on Ceph to understand the full extent of its
capability.
*Ceph version*: 18.2.5
*OS*: AlmaLinux 8.8
I am working on a project which uses the Ceph Object Gateway (buckets).
I want the metrics for users and buckets to be shown in Prometheus.
But when I checked the current metrics being sent to Prometheus, none of
the required metrics showing the bucket/user usage of objects were there.
We are getting other metrics from Ceph.
Kindly refer to the first image:
[screenshot: metrics currently received from Ceph]
But the required bucket and user info is not present (kindly refer to the
second image):
[screenshot: metrics with bucket/user usage missing]
I found the following links, which indicate that a Ceph exporter needs to
be installed to expose those metrics:
1. https://github.com/SepehrImanian/s3-ceph-exporter
2. https://github.com/blemmenes/radosgw_usage_exporter
3. https://docs.ceph.com/en/latest/radosgw/metrics/
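If it helps, my (untested) understanding is that the usage-exporter
approach in link 2 also needs the RGW usage log enabled and a user with
read caps for usage/buckets/users; the uid below is just an example:

# enable the usage log so RGW records per-bucket/per-user usage
ceph config set client.rgw rgw_enable_usage_log true
# create a user for the exporter and grant it read caps
radosgw-admin user create --uid=prom-exporter --display-name="Prometheus usage exporter"
radosgw-admin caps add --uid=prom-exporter --caps="usage=read;buckets=read;users=read;metadata=read"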
Could anyone please help me with this?
Thanks and regards,
Kushagra Gupta
Hi,
I am running a Ceph cluster and configured RGW for S3, initially without SSL. The service worked nicely, and I then updated the service to use SSL certs signed by our own CA, just as I had already done for the dashboard itself. However, as soon as I applied the new config, the dashboard was no longer able to access and display the service, while the service itself still works, now using the supplied SSL certificate. The error shown is:
Error 500
The server encountered an unexpected condition which prevented it from fulfilling the request.
My guess is that the dashboard for some reason doesn't like the certificate the RGW service is providing, despite the fact that the service itself is using it without problems. Any hints on how to make the dashboard display the Object Gateway again?
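In case it's relevant: one thing I came across (untested) is that the dashboard verifies the RGW certificate itself, and can be told not to:

# tell the dashboard to skip verification of the RGW TLS certificate
ceph dashboard set-rgw-api-ssl-verify False
# fail over the active mgr so the dashboard picks up the change
ceph mgr fail

Is that the right knob here, or should the dashboard be trusting our internal CA some other way?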
Hi,
In the discussion after the Ceph Month talks yesterday, there was a bit
of chat about cephadm / containers / packages. IIRC, Sage observed that
a common reason in the recent user survey for not using cephadm was that
it only worked on containerised deployments. I think he then went on to
say that he hadn't heard any compelling reasons why not to use
containers, and suggested that resistance was essentially a user
education question[0].
I'd like to suggest, briefly, that:
* containerised deployments are more complex to manage, and this is not
simply a matter of familiarity
* reducing the complexity of systems makes admins' lives easier
* the trade-off of the pros and cons of containers vs packages is not
obvious, and will depend on deployment needs
* Ceph users will benefit from both approaches being supported into the
future
We make extensive use of containers at Sanger, particularly for
scientific workflows, and also for bundling some web apps (e.g.
Grafana). We've also looked at a number of container runtimes (Docker,
Singularity, Charliecloud). They do have advantages - it's easy to
distribute a complex userland in a way that will run on (almost) any
target distribution; rapid "cloud" deployment; some separation (via
namespaces) of network/users/processes.
For what I think of as a 'boring' Ceph deploy (i.e. install on a set of
dedicated hardware and then run for a long time), I'm not sure any of
these benefits are particularly relevant and/or compelling - Ceph
upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud
Archive) provide .debs of a couple of different Ceph releases per Ubuntu
LTS - meaning we can easily separate out OS upgrade from Ceph upgrade.
And upgrading the Ceph packages _doesn't_ restart the daemons[1],
meaning that we maintain control over restart order during an upgrade.
And while we might briefly install packages from a PPA or similar to
test a bugfix, we roll those (test-)cluster-wide, rather than trying to
run a mixed set of versions on a single cluster - and I understand this
single-version approach is best practice.
Deployment via containers does bring complexity; some examples we've
found at Sanger (not all Ceph-related, which we run from packages):
* you now have 2 process supervision points - dockerd and systemd
* docker updates (via distribution unattended-upgrades) have an
unfortunate habit of rudely restarting everything
* docker squats on a chunk of RFC 1918 space (and telling it not to can
be a bore), which coincides with our internal network...
* there is more friction if you need to look inside containers
(particularly if you have a lot running on a host and are trying to find
out what's going on)
* you typically need to be root to build docker containers (unlike packages)
* we already have package deployment infrastructure (which we'll need
regardless of deployment choice)
We also currently use systemd overrides to tweak some of the Ceph units
(e.g. to do some network sanity checks before bringing up an OSD), and
have some tools to pair OSD / journal / LVM / disk device up; I think
these would be more fiddly in a containerised deployment. I'd accept
that fixing these might just be a SMOP[2] on our part.
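(As a concrete illustration, such an override is just a systemd drop-in;
the check script named here is ours and purely illustrative:

systemctl edit ceph-osd@.service
# ...and in the override that opens, something like:
#   [Service]
#   ExecStartPre=/usr/local/sbin/osd-network-sanity-check

With the daemons inside containers, wiring that in is less direct.)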
Now none of this is show-stopping, and I am most definitely not saying
"don't ship containers". But I think there is added complexity to your
deployment from going the containers route, and that is not simply a
"learn how to use containers" learning curve. I do think it is
reasonable for an admin to want to reduce the complexity of what they're
dealing with - after all, much of my job is trying to automate or
simplify the management of complex systems!
I can see from a software maintainer's point of view that just building
one container and shipping it everywhere is easier than building
packages for a number of different distributions (one of my other hats
is a Debian developer, and I have a bunch of machinery for doing this
sort of thing). But it would be a bit unfortunate if the general thrust
of "let's make Ceph easier to set up and manage" was somewhat derailed
with "you must use containers, even if they make your life harder".
I'm not going to criticise anyone who decides to use a container-based
deployment (and I'm sure there are plenty of setups where it's an
obvious win), but if I were advising someone who wanted to set up and
use a 'boring' Ceph cluster for the medium term, I'd still advise on
using packages. I don't think this makes me a luddite :)
Regards, and apologies for the wall of text,
Matthew
[0] I think that's a fair summary!
[1] This hasn't always been true...
[2] Simple (sic.) Matter of Programming
Hi
We have this after adding some hosts and changing crush failure domain
to datacenter:
pgs: 1338512379/3162732055 objects misplaced (42.321%)
5970 active+remapped+backfill_wait
4853 active+clean
11 active+remapped+backfilling
We have 3 datacenters each with 6 hosts and ~400 HDD OSDs with DB/WAL on
NVMe. Using mclock with high_recovery_ops profile.
What is the bottleneck here? I would have expected a huge number of
simultaneous backfills. Backfill reservation logjam?
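For reference, what I plan to check next - my understanding (possibly
wrong) is that mclock ignores osd_max_backfills unless explicitly told
otherwise:

# effective backfill limit on one OSD
ceph config show osd.0 osd_max_backfills
# allow manual override of recovery/backfill limits under mclock
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 3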
Best regards,
Torkil
--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
Hi everyone.
My cluster is spamming me with messages like:
Mar 25 13:10:13 cthulhu2 ceph-mgr[2843]: mgr finish mon failed to return metadata for mds.cephfs.cthulhu2.dqahyt: (2) No such file or directory
Mar 25 13:10:13 cthulhu2 ceph-mgr[2843]: mgr finish mon failed to return metadata for mds.cephfs.cthulhu3.xvboir: (2) No such file or directory
Mar 25 13:10:13 cthulhu2 ceph-mgr[2843]: mgr finish mon failed to return metadata for mds.cephfs.cthulhu5.kwmyyg: (2) No such file or directory
I have 5 servers for the service (cthulhu1->5), and indeed querying the
metadata for cthulhu2 (or 3, or 5) fails:
root@cthulhu2:/etc/ceph# ceph mds metadata cephfs.cthulhu2.dqahyt
{}
Error ENOENT:
root@cthulhu2:
but it works for 1 or 4:
root@cthulhu2:/etc/ceph# ceph mds metadata cephfs.cthulhu1.sikvjf
{
"addr": "[v2:145.238.187.184:6800/1315478297,v1:145.238.187.184:6801/1315478297]",
"arch": "x86_64",
"ceph_release": "quincy",
"ceph_version": "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",
"ceph_version_short": "17.2.7",
"container_hostname": "cthulhu1",
"container_image": "quay.io/ceph/ceph@sha256:62465e744a80832bde6a57120d3ba076613e8a19884b274f9cc82580e249f6e1",
"cpu": "Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz",
"distro": "centos",
"distro_description": "CentOS Stream 8",
"distro_version": "8",
"hostname": "cthulhu1",
"kernel_description": "#1 SMP Debian 5.10.209-2 (2024-01-31)",
"kernel_version": "5.10.0-28-amd64",
"mem_swap_kb": "16777212",
"mem_total_kb": "263803496",
"os": "Linux"
}
root@cthulhu2:/etc/ceph#
I checked the caps and don't see anything special.
I also get these messages (I don't know if they're related):
Mar 25 13:18:38 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open from mds.cephfs.cthulhu2.dqahyt v2:145.238.187.185:6800/2763465960; not ready for session (expect reconnect)
Mar 25 13:18:38 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open from mds.cephfs.cthulhu3.xvboir v2:145.238.187.186:6800/1297104944; not ready for session (expect reconnect)
Mar 25 13:18:38 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open from mds.cephfs.cthulhu5.kwmyyg v2:145.238.187.188:6800/449122091; not ready for session (expect reconnect)
Mar 25 13:18:39 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open from mds.cephfs.cthulhu3.xvboir v2:145.238.187.186:6800/1297104944; not ready for session (expect reconnect)
Mar 25 13:18:39 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open from mds.cephfs.cthulhu2.dqahyt v2:145.238.187.185:6800/2763465960; not ready for session (expect reconnect)
Mar 25 13:18:39 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open from mds.cephfs.cthulhu5.kwmyyg v2:145.238.187.188:6800/449122091; not ready for session (expect reconnect)
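What I plan to try next, on the guess that the mgr simply has stale MDS
session info (daemon name taken from the log above):

# fail over to a standby mgr so sessions are rebuilt
ceph mgr fail
# and/or restart one of the affected MDS daemons so it re-registers
ceph orch daemon restart mds.cephfs.cthulhu2.dqahyt

Does that sound sensible, or is there something better to check first?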
Regards.
--
Albert SHIH 🦫 🐸
France
Local time:
Mon 25 Mar 2024 13:08:33 CET
Hey All,
We will be having a Ceph science/research/big cluster call on Wednesday
March 27th. If anyone wants to discuss something specific they can add
it to the pad linked below. If you have questions or comments you can
contact me.
This is an informal open call of community members mostly from
hpc/htc/research/big cluster environments (though anyone is welcome)
where we discuss whatever is on our minds regarding Ceph: updates,
outages, features, maintenance, etc. There is no set presenter, but I do
attempt to keep the conversation lively.
Pad URL:
https://pad.ceph.com/p/Ceph_Science_User_Group_20240327
Virtual event details:
March 27, 2024
14:00 UTC
3pm Central European
9am Central US
Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph YouTube channel.
To join the meeting on a computer or mobile phone:
https://meet.jit.si/ceph-science-wg
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS/TROPICS
Space Science & Engineering Center
University of Wisconsin-Madison
Hi,
My Ceph cluster has 9 nodes for the Ceph Object Store. Recently I experienced data loss: `s3cmd get xxx` replies 404 (NoSuchKey), yet I can still get the metadata info via `s3cmd ls xxx`. The RGW object is above 1 GB and consists of many multipart parts. Running `rados -p default.rgw.buckets.data stat <object>` shows that only the head object remains; all of the multipart and shadow parts are gone. The bucket only sees write and read operations, no deletes, and it has no lifecycle policy.
I found a similar problem in https://tracker.ceph.com/issues/47866, which was fixed in v16.0.0. This may be a new data loss problem, which is very serious for us.
ceph version: 16.2.5
#command info:
s3cmd ls s3://solr-scrapy.commoncrawl-warc/batch_2024031314/Scrapy/main/CC-MAIN-20200118052321-20200118080321-00547.warc.gz
2024-03-13 09:27 1208269953 s3://solr-scrapy.commoncrawl-warc/batch_2024031314/Scrapy/main/CC-MAIN-20200118052321-20200118080321-00547.warc.gz
s3cmd get s3://solr-scrapy.commoncrawl-warc/batch_2024031314/Scrapy/main/CC-MAIN-20200118052321-20200118080321-00547.warc.gz
download: 's3://solr-scrapy.commoncrawl-warc/batch_2024031314/Scrapy/main/CC-MAIN-20200118052321-20200118080321-00547.warc.gz' -> './CC-MAIN-20200118052321-20200118080321-
00547.warc.gz' [1 of 1] ERROR: Download of './CC-MAIN-20200118052321-20200118080321-00547.warc.gz' failed (Reason: 404 (NoSuchKey))
ERROR: S3 error: 404 (NoSuchKey)
# head exists with size 0; the multipart and shadow parts are lost
rados -p default.rgw.buckets.data stat df8c0fe6-01c8-4c07-b310-2d102356c004.76248.1__multipart_batch_2024031314/Scrapy/main/CC-MAIN-20200118052321-20200118080321-00547.warc.gz.2~C2M72EJLHrNe_fnHnifS4N7pw70hVmE.1
error stat-ing eck6m2.rgw.buckets.data/df8c0fe6-01c8-4c07-b310-2d102356c004.76248.1__multipart_batch_2024031314/Scrapy/main/CC-MAIN-20200118052321-20200118080321-00547.warc.gz.2~C2M72EJLHrNe_fnHnifS4N7pw70hVmE.1: (2) No such file or directory
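For completeness, this is how I am checking what remains for the object
(same bucket/key as above):

# what does RGW think the object's manifest looks like?
radosgw-admin object stat --bucket=solr-scrapy.commoncrawl-warc --object='batch_2024031314/Scrapy/main/CC-MAIN-20200118052321-20200118080321-00547.warc.gz'
# list any remaining rados pieces for this key
rados -p default.rgw.buckets.data ls | grep 'CC-MAIN-20200118052321-20200118080321-00547'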
thanks.
Hi guys,
recently I took over care of a Ceph cluster that is extremely
unbalanced. The cluster is running Quincy 17.2.7 (upgraded Nautilus ->
Octopus -> Quincy) and has 1428 OSDs (HDDs). We are running CephFS on it.
Crush failure domain is datacenter (there are 3), data pool is EC 3+3.
This cluster has had the balancer disabled for years and was "balanced"
manually by changing OSD crush weights. It is now a complete mess, and I
would like to set all OSD crush weights back to the same value (3.63898)
and enable the balancer with upmap.
From `ceph osd df` sorted from the least used to the most used OSDs
(middle rows elided):

ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP   META     AVAIL    %USE   VAR   PGS  STATUS
428  hdd    3.63898  1.00000   3.6 TiB  2.0 TiB  2.0 TiB  1 KiB  5.6 GiB  1.7 TiB  54.55  0.76   96  up
223  hdd    3.63898  1.00000   3.6 TiB  2.0 TiB  2.0 TiB  3 KiB  5.6 GiB  1.7 TiB  54.58  0.76   95  up
...
591  hdd    3.53999  1.00000   3.6 TiB  3.0 TiB  3.0 TiB  1 KiB  7.0 GiB  680 GiB  81.74  1.14  125  up
832  hdd    3.59999  1.00000   3.6 TiB  3.0 TiB  3.0 TiB  4 KiB  6.9 GiB  680 GiB  81.75  1.14  114  up
248  hdd    3.63898  1.00000   3.6 TiB  3.0 TiB  3.0 TiB  3 KiB  7.2 GiB  646 GiB  82.67  1.16  121  up
559  hdd    3.63799  1.00000   3.6 TiB  3.0 TiB  3.0 TiB  0 B    7.0 GiB  644 GiB  82.70  1.16  123  up

TOTAL: 5.1 PiB size, 3.7 PiB raw use, 3.7 PiB data, 2.9 MiB omap, 8.5 TiB meta, 1.5 PiB avail, 71.50 %USE
MIN/MAX VAR: 0.76/1.16  STDDEV: 5.97
crush rule:
{
    "rule_id": 10,
    "rule_name": "ec33hdd_rule",
    "type": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "choose_indep",
            "num": 3,
            "type": "datacenter"
        },
        {
            "op": "choose_indep",
            "num": 2,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
My question is: what would be the proper and safest way to make this
happen?
* should I first enable the balancer, let it do its work, and only then
change the OSD crush weights to be even?
* or the other way around - first make the crush weights even and then
enable the balancer?
* or is there another, safer way?
What are the ideal balancer settings for this? I'm expecting a large
data movement, and this is a production cluster.
I'm also afraid that during the balancing or the crush weight changes
some OSDs will become full. I've run into that already and had to move
some PGs manually to other OSDs in the same failure domain.
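For what it's worth, the sequence I currently have in mind (commands from
the docs; the throttle value is a guess) is:

# upmap needs all clients at luminous or newer
ceph osd set-require-min-compat-client luminous
# keep each balancer step small (max ~1% of objects misplaced at a time)
ceph config set mgr target_max_misplaced_ratio 0.01
# reset crush weights gradually, e.g. one OSD at a time
ceph osd crush reweight osd.591 3.63898
# then let the balancer take over
ceph balancer mode upmap
ceph balancer on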
I would appreciate any suggestion on that.
Thank you!