Hello,
I've just done a fresh upgrade from Quincy to Reef and my graphs are now blank...
After investigating, it seems the service discovery endpoint is not working
because there is no certificate:
# ceph orch sd dump cert
Error EINVAL: No certificate found for service discovery
Maybe an upgrade issue?
Is there a way to generate or replace the certificate properly?
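So far the only things I could think of trying (assuming cephadm keeps this certificate in the config-key store and recreates it when the active mgr restarts, which I haven't confirmed):
ceph config-key ls | grep -i cert     (to see whether any certificate is stored at all)
ceph mgr fail                         (fail over to a standby mgr, hoping the new active one regenerates it)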
Regards
Nicolas F.
After upgrading to 17.2.7 our load balancers can't check the status of the manager nodes for the dashboard. After some troubleshooting I noticed only TLS 1.3 is available for the dashboard.
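A quick way to reproduce what the load balancer sees (dashboard-host:8443 is just a placeholder for your dashboard endpoint):
openssl s_client -connect dashboard-host:8443 -tls1_2 </dev/null   (handshake is refused)
openssl s_client -connect dashboard-host:8443 -tls1_3 </dev/null   (handshake succeeds)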
Looking at the source (quincy), the TLS config got changed from 1.2 to 1.3. Searching the tracker I found out that we are not the only ones having trouble, and that an option will be added to the dashboard config. Tracker ID 62940 has backports, and the ones for reef and pacific are already merged. But the pull request (63068) for Quincy is closed :(
What to do? I hope this one can get merged for 17.2.8.
Hi everyone,
Stupid question about
ceph fs volume create
how can I specify the metadata pool and the data pool?
I was able to create a CephFS "manually" with something like
ceph fs new vo cephfs_metadata cephfs_data
but as I understand the documentation, with this method I need to deploy
the MDS myself, and the "new" way to do it is to use ceph fs volume.
With ceph fs volume I didn't find any documentation on how to set the
metadata/data pools, and I also didn't find any way to change the pools
after the volume has been created.
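For reference, the manual sequence I mean is roughly the following (pool and volume names are just examples):
ceph osd pool create cephfs_metadata
ceph osd pool create cephfs_data
ceph fs new vo cephfs_metadata cephfs_data
ceph orch apply mds vo
and I'd like the same control over the pools when using ceph fs volume create.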
Thanks
--
Albert SHIH 🦫 🐸
France
Local time:
Wed 24 Jan 2024 19:24:23 CET
Hi,
I'm facing a rather new issue with our Ceph cluster: from time to time
ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
100 GB RAM:
[Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ +0.000010] oom_kill_process.cold+0xb/0x10
[ +0.000002] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ +0.000008] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
[ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0
[ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
The cluster is stable and operating normally, there's nothing unusual going
on before, during or after the kill, thus it's unclear what causes the mgr
to balloon, use all RAM and get killed. Systemd logs aren't very helpful:
they just show normal mgr operations until it fails to allocate memory and
gets killed: https://pastebin.com/MLyw9iVi
The mgr experienced this issue several times in the last 2 months, and the
events don't appear to correlate with any other events in the cluster
because basically nothing else happened at around those times. How can I
investigate this and figure out what's causing the mgr to consume all
memory and get killed?
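The only idea I have so far is to sample the mgr's memory usage over time to see when it starts ballooning and whether that lines up with any module activity, e.g. a crude loop on the mgr host (log path is just an example):
while true; do echo "$(date -Is) $(ps -o rss= -C ceph-mgr)" >> /var/log/ceph-mgr-rss.log; sleep 60; done
together with "ceph mgr module ls" to see which modules are enabled, but that still wouldn't tell me which allocation is growing.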
I would very much appreciate any advice!
Best regards,
Zakhar
Hello
When I run cephfs-top it causes an mgr module crash. Can you please tell me
the reason?
My environment:
Ceph version: 17.2.6
Operating System: Ubuntu 22.04.2 LTS
Kernel: Linux 5.15.0-84-generic
I created the cephfs-top user with the following command:
ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' > /etc/ceph/ceph.client.fstop.keyring
This is the crash report:
root@ud-01:~# ceph crash info
2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/stats/module.py\", line 32, in notify\n    self.fs_perf_stats.notify_cmd(notify_id)",
        "  File \"/usr/share/ceph/mgr/stats/fs/perf_stats.py\", line 177, in notify_cmd\n    metric_features = int(metadata[CLIENT_METADATA_KEY][\"metric_spec\"][\"metric_flags\"][\"feature_bits\"], 16)",
        "ValueError: invalid literal for int() with base 16: '0x'"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801",
    "entity_name": "mgr.ud-01.qycnol",
    "mgr_module": "stats",
    "mgr_module_caller": "ActivePyModule::notify",
    "mgr_python_exception": "ValueError",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "971ae170f1fff7f7bc0b7ae86d164b2b0136a8bd5ca7956166ea5161e51ad42c",
    "timestamp": "2024-01-22T21:25:59.313305Z",
    "utsname_hostname": "ud-01",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-84-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023"
}
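For what it's worth, the backtrace suggests one of the CephFS clients reports empty metric flags, so the stats module ends up calling int() on the literal string '0x', which fails exactly like in the crash report:
python3 -c "int('0x', 16)"
ValueError: invalid literal for int() with base 16: '0x'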
Best regards.
Good afternoon everybody!
I have a question regarding the documentation... I was reviewing it and
realized that the "vms" pool is not being used anywhere in the configs.
The first mention of this pool was in commit 2eab1c1 and, in e9b13fa, the
configuration section of nova.conf was removed, but the pool configuration
remained there.
Would it be correct to ignore all mentions of this pool (I don't see any
use for it)? If so, it would be good to update the documentation.
https://docs.ceph.com/en/latest/rbd/rbd-openstack/#create-a-pool
I'm having a bit of a weird issue with cluster rebalances with a new EC
pool. I have a 3-machine cluster, each machine with 4 HDD OSDs (+1 SSD).
Until now I've been using an erasure coded k=5 m=3 pool for most of my
data. I've recently started to migrate to a k=5 m=4 pool, so I can
configure the CRUSH rule to guarantee that data remains available if a
whole host goes down (3 chunks per host, 9 total). I also moved the 5,3
pool to this setup, although by nature I know its PGs will become
inactive if a host goes down (need at least k+1 OSDs to be up).
I've only just started migrating data to the 5,4 pool, but I've noticed
that any time I trigger any kind of backfilling (e.g. take one OSD out),
a bunch of PGs in the 5,4 pool become degraded (instead of just
misplaced/backfilling). This always seems to happen on that pool only,
and the object count is a significant fraction of the total pool object
count (it's not just "a few recently written objects while PGs were
repeering" or anything like that, I know about that effect).
Here are the pools:
pool 13 'cephfs2_data_hec5.3' erasure profile ec5.3 size 8 min_size 6 crush_rule 7 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 14133 lfor 0/11307/11305 flags hashpspool,ec_overwrites,bulk stripe_width 20480 application cephfs
pool 14 'cephfs2_data_hec5.4' erasure profile ec5.4 size 9 min_size 6 crush_rule 7 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 14509 lfor 0/0/14234 flags hashpspool,ec_overwrites,bulk stripe_width 20480 application cephfs
EC profiles:
# ceph osd erasure-code-profile get ec5.3
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=5
m=3
plugin=jerasure
technique=reed_sol_van
w=8
# ceph osd erasure-code-profile get ec5.4
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=5
m=4
plugin=jerasure
technique=reed_sol_van
w=8
They both use the same CRUSH rule, which is designed to select 9 OSDs
balanced across the hosts (of which only 8 slots get used for the older
5,3 pool):
rule hdd-ec-x3 {
        id 7
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 3 type host
        step choose indep 3 type osd
        step emit
}
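One way I've thought of to sanity-check whether this rule can still fill all 9 slots when a device drops out is to simulate it offline with crushtool (a sketch; rule id 7 and the 9 shards are taken from the rule above, and I believe --weight can be used to zero out a single OSD for the test):
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 7 --num-rep 9 --weight 14 0 --show-bad-mappings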
If I take out an OSD (14), I get something like this:
health: HEALTH_WARN
        Degraded data redundancy: 37631/120155160 objects degraded (0.031%), 38 pgs degraded
All the degraded PGs are in the 5,4 pool, and the total object count is
around 50k, so this is *most* of the data in the pool becoming degraded
just because I marked an OSD out (without stopping it). If I mark the
OSD in again, the degraded state goes away.
Example degraded PGs:
# ceph pg dump | grep degraded
dumped all
14.3c 812 0 838 0 0 11925027758 0 0 1088 0 1088 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:41.786745+0900 15440'1088 15486:10772 [18,17,16,1,3,2,11,13,12] 18 [18,17,16,1,3,2,11,NONE,12] 18 14537'432 2024-01-12T11:25:54.168048+0900 0'0 2024-01-08T15:18:21.654679+0900 0 2 periodic scrub scheduled @ 2024-01-21T08:00:23.572904+0900 241 0
14.3d 772 0 1602 0 0 11303280223 0 0 1283 0 1283 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:41.919971+0900 15470'1283 15486:13384 [18,17,16,3,1,0,13,11,12] 18 [18,17,16,3,1,0,NONE,NONE,12] 18 14990'771 2024-01-15T12:15:59.397469+0900 0'0 2024-01-08T15:18:21.654679+0900 0 3 periodic scrub scheduled @ 2024-01-23T15:56:58.912801+0900 534 0
14.3e 806 0 832 0 0 11843019697 0 0 1035 0 1035 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:42.297251+0900 15465'1035 15486:15423 [18,16,17,12,13,11,1,3,0] 18 [18,16,17,12,13,NONE,1,3,0] 18 14623'500 2024-01-13T08:54:55.709717+0900 0'0 2024-01-08T15:18:21.654679+0900 0 1 periodic scrub scheduled @ 2024-01-22T09:54:51.278368+0900 331 0
14.3f 782 0 813 0 0 11598393034 0 0 1083 0 1083 active+recovery_wait+undersized+degraded+remapped 2024-01-19T18:06:41.845173+0900 15465'1083 15486:18496 [17,18,16,3,0,1,11,12,13] 17 [17,18,16,3,0,1,11,NONE,13] 17 14990'800 2024-01-15T16:42:08.037844+0900 14990'800 2024-01-15T16:42:08.037844+0900 0 40 periodic scrub scheduled @ 2024-01-23T10:44:06.083985+0900 563 0
The first PG when I put the OSD back in:
14.3c 812 0 0 0 0 11925027758 0 0 1088 0 1088 active+clean 2024-01-19T18:07:18.079295+0900 15440'1088 15489:10792 [18,17,16,1,3,2,11,14,12] 18 [18,17,16,1,3,2,11,14,12] 18 14537'432 2024-01-12T11:25:54.168048+0900 0'0 2024-01-08T15:18:21.654679+0900 0 2 periodic scrub scheduled @ 2024-01-21T09:41:43.026836+0900 241 0
As far as I know PGs are not supposed to actually become *degraded* when
merely moving data around without any OSDs going down. Am I doing
something wrong here? Any idea why this is affecting one pool and not
both, even though they are almost identical in setup? It's as if, for
this one pool, marking an OSD out has the effect of making its data
unavailable entirely, instead of merely backfilling it to other OSDs (the OSD
shows up as NONE in the above dump).
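In case it helps, querying one of the affected PGs should show where the missing shard is and why recovery is waiting, e.g. (the jq filter is just for brevity):
ceph pg 14.3c query | jq '.state, .up, .acting, .recovery_state[0]'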
OSD tree:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 89.13765 root default
-13 29.76414 host flamingo
11 hdd 7.27739 osd.11 up 1.00000 1.00000
12 hdd 7.27739 osd.12 up 1.00000 1.00000
13 hdd 7.27739 osd.13 up 1.00000 1.00000
14 hdd 7.20000 osd.14 up 1.00000 1.00000
8 ssd 0.73198 osd.8 up 1.00000 1.00000
-10 29.84154 host heart
0 hdd 7.27739 osd.0 up 1.00000 1.00000
1 hdd 7.27739 osd.1 up 1.00000 1.00000
2 hdd 7.27739 osd.2 up 1.00000 1.00000
3 hdd 7.27739 osd.3 up 1.00000 1.00000
9 ssd 0.73198 osd.9 up 1.00000 1.00000
-3 0 host hub
-7 29.53197 host soleil
15 hdd 7.20000 osd.15 up 0 1.00000
16 hdd 7.20000 osd.16 up 1.00000 1.00000
17 hdd 7.20000 osd.17 up 1.00000 1.00000
18 hdd 7.20000 osd.18 up 1.00000 1.00000
10 ssd 0.73198 osd.10 up 1.00000 1.00000
(I'm in the middle of doing some reprovisioning, so osd.15 is out; this
happens any time I take any OSD out.)
# ceph --version
ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
- Hector
Hi,
this question has come up once in the past[0] afaict, but it was kind of inconclusive so I'm taking the liberty of bringing it up again.
I'm looking into implementing a key rotation scheme for Ceph client keys. As it can take a non-zero amount of time to update key material, there might be a situation where the key has already changed on the MON side but one of N clients has not yet received the new key material and tries to authenticate with the obsolete key, which would naturally fail.
It would be great if we could have two keys active for an entity at the same time, but aiui that's not really possible, is that right?
I'm wondering about ceph auth get-or-create-pending. Per the docs a pending key would become active on first use, so if only one of N clients uses it, this still leaves room for another client to race.
What do people do to deal with this situation?
[0] https://ceph-users.ceph.narkive.com/ObSMdmxX/rotating-cephx-keys
Hi,
According to the documentation¹ the special host label _admin instructs
the cephadm orchestrator to place a valid ceph.conf and the
ceph.client.admin.keyring into /etc/ceph of the host.
I noticed that (at least) on 17.2.7 only the keyring file is placed in
/etc/ceph, but not ceph.conf.
Both files are placed into the /var/lib/ceph/<fsid>/config directory.
Has something changed?
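For what it's worth, my understanding (option name from memory, so please double-check) is that cephadm only maintains /etc/ceph/ceph.conf on _admin hosts when it is allowed to manage that file:
ceph config get mgr mgr/cephadm/manage_etc_ceph_ceph_conf
ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
Could the default behaviour of that option have changed?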
¹:
https://docs.ceph.com/en/quincy/cephadm/host-management/#special-host-labels
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin