January 2024 - ceph-users

cephadm discovery service certificate absent after upgrade.

by Nicolas FOURNIL

Hello, I've just fresh upgrade from Quincy to Reef and my graphs are now blank... after investigations, it seems that discovery service is not working because of no certificate : # ceph orch sd dump cert Error EINVAL: No certificate found for service discovery Maybe an upgrade issue ? Is there a way to generate or replace the certificate properly ? Regards Nicolas F.

3 months, 3 weeks

2
7
0 0

TLS 1.2 for dashboard

by Sake Ceph

After upgrading to 17.2.7 our load balancers can't check the status of the manager nodes for the dashboard. After some troubleshooting I noticed only TLS 1.3 is availalbe for the dashboard. Looking at the source (quincy), TLS config got changed from 1.2 to 1.3. Searching in the tracker I found out that we are not the only one with troubles and there will be added an option to the dashboard config. Tracker ID 62940 got backports and the ones for reef and pacific already merged. But the pull request (63068) for Quincy is closed :( What to do? I hope this one can get merged for 17.2.8.

3 months, 3 weeks

2
5
0 0

Stupid question about ceph fs volume

by Albert Shih

Hi everyone, Stupid question about ceph fs volume create how can I specify the metadata pool and the data pool ? I was able to create a cephfs «manually» with something like ceph fs new vo cephfs_metadata cephfs_data but as I understand the documentation, with this method I need to deploy the mds, and the «new» way to do it is to use ceph fs volume. But with ceph fs volume I didn't find any documentation of how to set the metadata/data pool I also didn't find any way to change after the creation of the volume the pool. Thanks -- Albert SHIH 🦫 🐸 France Heure locale/Local time: mer. 24 janv. 2024 19:24:23 CET

3 months, 3 weeks

3
8
0 0

Ceph 16.2.14: ceph-mgr getting oom-killed

by Zakhar Kirpichenko

Hi, I'm facing a rather new issue with our Ceph cluster: from time to time ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over 100 GB RAM: [Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ +0.000010] oom_kill_process.cold+0xb/0x10 [ +0.000002] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ +0.000008] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167 [ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0 [ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB The cluster is stable and operating normally, there's nothing unusual going on before, during or after the kill, thus it's unclear what causes the mgr to balloon, use all RAM and get killed. Systemd logs aren't very helpful: they just show normal mgr operations until it fails to allocate memory and gets killed: https://pastebin.com/MLyw9iVi The mgr experienced this issue several times in the last 2 months, and the events don't appear to correlate with any other events in the cluster because basically nothing else happened at around those times. How can I investigate this and figure out what's causing the mgr to consume all memory and get killed? I would very much appreciate any advice! Best regards, Zakhar

3 months, 3 weeks

6
21
0 0

cephfs-top causes 16 mgr modules have recently crashed

by Özkan Göksu

Hello When I run cephfs-top it causes mgr module crash. Can you please tell me the reason? My environment: My ceph version 17.2.6 Operating System: Ubuntu 22.04.2 LTS Kernel: Linux 5.15.0-84-generic I created the cephfs-top user with the following command: ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' > /etc/ceph/ceph.client.fstop.keyring This is the crash report: root@ud-01:~# ceph crash info 2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801 { "backtrace": [ " File \"/usr/share/ceph/mgr/stats/module.py\", line 32, in notify\n self.fs_perf_stats.notify_cmd(notify_id)", " File \"/usr/share/ceph/mgr/stats/fs/perf_stats.py\", line 177, in notify_cmd\n metric_features = int(metadata[CLIENT_METADATA_KEY][\"metric_spec\"][\"metric_flags\"][\"feature_bits\"], 16)", "ValueError: invalid literal for int() with base 16: '0x'" ], "ceph_version": "17.2.6", "crash_id": "2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801", "entity_name": "mgr.ud-01.qycnol", "mgr_module": "stats", "mgr_module_caller": "ActivePyModule::notify", "mgr_python_exception": "ValueError", "os_id": "centos", "os_name": "CentOS Stream", "os_version": "8", "os_version_id": "8", "process_name": "ceph-mgr", "stack_sig": "971ae170f1fff7f7bc0b7ae86d164b2b0136a8bd5ca7956166ea5161e51ad42c", "timestamp": "2024-01-22T21:25:59.313305Z", "utsname_hostname": "ud-01", "utsname_machine": "x86_64", "utsname_release": "5.15.0-84-generic", "utsname_sysname": "Linux", "utsname_version": "#93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023" } Best regards.

3 months, 3 weeks

2
4
0 0

[DOC] Openstack with RBD DOC update?

by Murilo Morais

Good afternoon everybody! I have a question regarding the documentation... I was reviewing it and realized that the "vms" pool is not being used anywhere in the configs. The first mention of this pool was in commit 2eab1c1 and, in e9b13fa, the configuration section of nova.conf was removed, but the pool configuration remained there. Would it be correct to ignore all mentions of this pool (I don't see any use for it)? If so, it would be interesting to update the documentation. https://docs.ceph.com/en/latest/rbd/rbd-openstack/#create-a-pool

3 months, 3 weeks

5
7
0 0

CLT meeting notes January 24th 2024

by Adam King

- Build/package PRs- who to best review these? - Example: https://github.com/ceph/ceph/pull/55218 - Idea: create a GitHub team specifically for these types of PRs https://github.com/orgs/ceph/teams - Laura will try to organize people for the group - Pacific 16.2.15 status - Handful of PRs left in 16.2.15 tag https://github.com/ceph/ceph/pulls?q=is%3Apr+is%3Aopen+milestone%3Av16.2.15 that still need to be tested and merged - Yuri will begin testing RC after that

3 months, 3 weeks

1
0
0 0

Degraded PGs on EC pool when marking an OSD out

by Hector Martin

I'm having a bit of a weird issue with cluster rebalances with a new EC pool. I have a 3-machine cluster, each machine with 4 HDD OSDs (+1 SSD). Until now I've been using an erasure coded k=5 m=3 pool for most of my data. I've recently started to migrate to a k=5 m=4 pool, so I can configure the CRUSH rule to guarantee that data remains available if a whole host goes down (3 chunks per host, 9 total). I also moved the 5,3 pool to this setup, although by nature I know its PGs will become inactive if a host goes down (need at least k+1 OSDs to be up). I've only just started migrating data to the 5,4 pool, but I've noticed that any time I trigger any kind of backfilling (e.g. take one OSD out), a bunch of PGs in the 5,4 pool become degraded (instead of just misplaced/backfilling). This always seems to happen on that pool only, and the object count is a significant fraction of the total pool object count (it's not just "a few recently written objects while PGs were repeering" or anything like that, I know about that effect). Here are the pools: pool 13 'cephfs2_data_hec5.3' erasure profile ec5.3 size 8 min_size 6 crush_rule 7 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 14133 lfor 0/11307/11305 flags hashpspool,ec_overwrites,bulk stripe_width 20480 application cephfs pool 14 'cephfs2_data_hec5.4' erasure profile ec5.4 size 9 min_size 6 crush_rule 7 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 14509 lfor 0/0/14234 flags hashpspool,ec_overwrites,bulk stripe_width 20480 application cephfs EC profiles: # ceph osd erasure-code-profile get ec5.3 crush-device-class= crush-failure-domain=osd crush-root=default jerasure-per-chunk-alignment=false k=5 m=3 plugin=jerasure technique=reed_sol_van w=8 # ceph osd erasure-code-profile get ec5.4 crush-device-class= crush-failure-domain=osd crush-root=default jerasure-per-chunk-alignment=false k=5 m=4 plugin=jerasure technique=reed_sol_van w=8 They both use the same CRUSH rule, which is designed to select 9 OSDs balanced across the hosts (of which only 8 slots get used for the older 5,3 pool): rule hdd-ec-x3 { id 7 type erasure step set_chooseleaf_tries 5 step set_choose_tries 100 step take default class hdd step choose indep 3 type host step choose indep 3 type osd step emit } If I take out an OSD (14), I get something like this: health: HEALTH_WARN Degraded data redundancy: 37631/120155160 objects degraded (0.031%), 38 pgs degraded All the degraded PGs are in the 5,4 pool, and the total object count is around 50k, so this is *most* of the data in the pool becoming degraded just because I marked an OSD out (without stopping it). If I mark the OSD in again, the degraded state goes away. Example degraded PGs: # ceph pg dump | grep degraded dumped all 14.3c 812 0 838 0 0 11925027758 0 0 1088 0 1088 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:41.786745+0900 15440'1088 15486:10772 [18,17,16,1,3,2,11,13,12] 18 [18,17,16,1,3,2,11,NONE,12] 18 14537'432 2024-01-12T11:25:54.168048+0900 0'0 2024-01-08T15:18:21.654679+0900 0 2 periodic scrub scheduled @ 2024-01-21T08:00:23.572904+0900 241 0 14.3d 772 0 1602 0 0 11303280223 0 0 1283 0 1283 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:41.919971+0900 15470'1283 15486:13384 [18,17,16,3,1,0,13,11,12] 18 [18,17,16,3,1,0,NONE,NONE,12] 18 14990'771 2024-01-15T12:15:59.397469+0900 0'0 2024-01-08T15:18:21.654679+0900 0 3 periodic scrub scheduled @ 2024-01-23T15:56:58.912801+0900 534 0 14.3e 806 0 832 0 0 11843019697 0 0 1035 0 1035 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:42.297251+0900 15465'1035 15486:15423 [18,16,17,12,13,11,1,3,0] 18 [18,16,17,12,13,NONE,1,3,0] 18 14623'500 2024-01-13T08:54:55.709717+0900 0'0 2024-01-08T15:18:21.654679+0900 0 1 periodic scrub scheduled @ 2024-01-22T09:54:51.278368+0900 331 0 14.3f 782 0 813 0 0 11598393034 0 0 1083 0 1083 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:41.845173+0900 15465'1083 15486:18496 [17,18,16,3,0,1,11,12,13] 17 [17,18,16,3,0,1,11,NONE,13] 17 14990'800 2024-01-15T16:42:08.037844+0900 14990'800 2024-01-15T16:42:08.037844+0900 0 40 periodic scrub scheduled @ 2024-01-23T10:44:06.083985+0900 563 0 The first PG when I put the OSD back in: 14.3c 812 0 0 0 0 11925027758 0 0 1088 0 1088 active+clean 2024-01-19T18:07:18.079295+0900 15440'1088 15489:10792 [18,17,16,1,3,2,11,14,12] 18 [18,17,16,1,3,2,11,14,12] 18 14537'432 2024-01-12T11:25:54.168048+0900 0'0 2024-01-08T15:18:21.654679+0900 0 2 periodic scrub scheduled @ 2024-01-21T09:41:43.026836+0900 241 0 As far as I know PGs are not supposed to actually become *degraded* when merely moving data around without any OSDs going down. Am I doing something wrong here? Any idea why this is affecting one pool and not both, even though they are almost identical in setup? It's as if, for this one pool, marking an OSD out has the effect of making its data unavailable entirely, instead of merely backfill to other OSDs (the OSD shows up as NONE in the above dump). OSD tree: ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 89.13765 root default -13 29.76414 host flamingo 11 hdd 7.27739 osd.11 up 1.00000 1.00000 12 hdd 7.27739 osd.12 up 1.00000 1.00000 13 hdd 7.27739 osd.13 up 1.00000 1.00000 14 hdd 7.20000 osd.14 up 1.00000 1.00000 8 ssd 0.73198 osd.8 up 1.00000 1.00000 -10 29.84154 host heart 0 hdd 7.27739 osd.0 up 1.00000 1.00000 1 hdd 7.27739 osd.1 up 1.00000 1.00000 2 hdd 7.27739 osd.2 up 1.00000 1.00000 3 hdd 7.27739 osd.3 up 1.00000 1.00000 9 ssd 0.73198 osd.9 up 1.00000 1.00000 -3 0 host hub -7 29.53197 host soleil 15 hdd 7.20000 osd.15 up 0 1.00000 16 hdd 7.20000 osd.16 up 1.00000 1.00000 17 hdd 7.20000 osd.17 up 1.00000 1.00000 18 hdd 7.20000 osd.18 up 1.00000 1.00000 10 ssd 0.73198 osd.10 up 1.00000 1.00000 (I'm in the middle of doing some reprovisioning so 15 is out, this happens any time I take any OSD out) # ceph --version ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable) - Hector

3 months, 3 weeks

3
4
0 0

cephx client key rotation

by Peter Sabaini

Hi, this question has come up once in the past[0] afaict, but it was kind of inconclusive so I'm taking the liberty of bringing it up again. I'm looking into implementing a key rotation scheme for Ceph client keys. As it potentially takes some non-zero amount of time to update key material there might be a situation where keys have changed on the MON side but, still one of N clients might not have updated key material and try to auth with an obsolete key which naturally would fail. It would be great if we could have two keys active for an entity at the same time, but aiui that's not really possible, is that right? I'm wondering about ceph auth get-or-create-pending. Per the docs a pending key would become active on first use, so that if one of N clients uses it, this still leaves room for another client to race. What do people do to deal with this situation? [0] https://ceph-users.ceph.narkive.com/ObSMdmxX/rotating-cephx-keys

3 months, 3 weeks

1
0
0 0

Cephadm orchestrator and special label _admin in 17.2.7

by Robert Sander

Hi, According to the documentation¹ the special host label _admin instructs the cephadm orchestrator to place a valid ceph.conf and the ceph.client.admin.keyring into /etc/ceph of the host. I noticed that (at least) on 17.2.7 only the keyring file is placed in /etc/ceph, but not ceph.conf. Both files are placed into the /var/lib/ceph/<fsid>/config directory. Has something changed? ¹: https://docs.ceph.com/en/quincy/cephadm/host-management/#special-host-labels Regards -- Robert Sander Heinlein Consulting GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Amtsgericht Berlin-Charlottenburg - HRB 220009 B Geschäftsführer: Peer Heinlein - Sitz: Berlin

3 months, 3 weeks

4
8
0 0

2024

2023

2022

2021

2020

2019

ceph-users January 2024