I have a ceph cluster installed using cephadm.
The cluster is up and running but I'm unable to get Keystone integration working with RADOSGW.
Is this a known issue?
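For context, I'm setting the usual Keystone-related rgw options; a sketch of what I'm configuring (all values here are placeholders, not our real endpoint or credentials):
ceph config set client.rgw rgw_keystone_url http://keystone.example:5000
ceph config set client.rgw rgw_keystone_api_version 3
ceph config set client.rgw rgw_keystone_admin_user rgw
ceph config set client.rgw rgw_keystone_admin_password secret
ceph config set client.rgw rgw_keystone_admin_domain default
ceph config set client.rgw rgw_keystone_admin_project service
ceph config set client.rgw rgw_keystone_accepted_roles 'member,admin'
ceph config set client.rgw rgw_s3_auth_use_keystone true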
Thanks, Fred.
Hi Team,
I'm writing to bring to your attention an issue we have encountered with the "mtime" (modification time) behavior for directories in the Ceph filesystem.
Upon observation, we have noticed that when the mtime of a directory (say, 'dir1') is explicitly changed in CephFS, subsequent additions of files or directories within 'dir1' fail to update the directory's mtime as expected.
This behavior appears to be specific to CephFS - we have reproduced this issue on both Quincy and Pacific. Similar steps work as expected in the ext4 filesystem amongst others.
Reproduction steps:
1. Create a directory - mkdir dir1
2. Modify mtime using the touch command - touch dir1
3. Create a file or directory inside of 'dir1' - mkdir dir1/dir2
Expected result:
mtime for dir1 should change to the time the file or directory was created in step 3
Actual result:
there was no change to the mtime for 'dir1'
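A minimal script that reproduces this (the CephFS mount point /mnt/cephfs is a placeholder):

import os
import time

base = "/mnt/cephfs/dir1"             # placeholder path on a CephFS mount
os.mkdir(base)                        # step 1: create the directory
os.utime(base)                        # step 2: explicitly set mtime, like `touch dir1`
before = os.stat(base).st_mtime
time.sleep(2)
os.mkdir(os.path.join(base, "dir2"))  # step 3: create a directory inside it
after = os.stat(base).st_mtime
print("mtime updated" if after > before else "mtime unchanged")

On ext4 this prints "mtime updated"; on the CephFS versions above it prints "mtime unchanged".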
Note: For more details, please see the attached logs.
Our questions are:
1. Is this expected behavior for CephFS?
2. If so, can you explain why the directory behavior is inconsistent depending on whether the mtime for the directory has previously been manually updated?
Best Regards,
Sandip Divekar
Component QA Lead SDET.
Hi,
I'm not able to find information about the used size of a storage class. I've looked at:
- bucket stats
- usage show
- user stats ...
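For completeness, the exact invocations were along these lines (bucket and uid are placeholders):
radosgw-admin bucket stats --bucket=<bucket>
radosgw-admin usage show --uid=<uid>
radosgw-admin user stats --uid=<uid>
None of these seem to break usage down per storage class.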
Does Radosgw support it? Thanks
Hi,
In Ceph Radosgw 15.2.17, I get this issue when trying to create a push endpoint to Kafka
Here is the push endpoint configuration (Python, using the boto3 SNS client):
import urllib.parse
endpoint_args = 'push-endpoint=kafka://abcef:123456@kafka.endpoint:9093&use-ssl=true&ca-location=/etc/ssl/certs/ca.crt'
attributes = {nvp[0]: nvp[1] for nvp in urllib.parse.parse_qsl(endpoint_args, keep_blank_values=True)}
response = snsclient.create_topic(Name=topic_name, Attributes=attributes)
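For completeness, snsclient above is an ordinary boto3 SNS client pointed at the RGW endpoint; a sketch with placeholder endpoint and credentials:

import boto3

snsclient = boto3.client('sns',
                         endpoint_url='http://rgw.example:8000',  # placeholder RGW endpoint
                         region_name='default',
                         aws_access_key_id='ACCESS_KEY',          # placeholder credentials
                         aws_secret_access_key='SECRET_KEY')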
When I put an object, the radosgw log shows this:
Kafka connect: failed to create producer: ssl.ca.location failed: crypto/x509/by_file.c:199: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib:
I have checked my ca.crt file and it is definitely in x509 format. If I use RGW v16.2.13, the producer is created successfully.
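For reference, the check on the CA file was a standard x509 dump, which parses it without error:
openssl x509 -in /etc/ssl/certs/ca.crt -noout -text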
Anyone have any ideas? Thanks
Dear Ceph users,
after an outage and recovery of one machine I have several PGs stuck in
active+recovering+undersized+degraded+remapped. Furthermore, many PGs
have not been (deep-)scrubbed in time. See below for status and health
details.
It's been like this for two days, with no recovery I/O being reported,
so I guess something is stuck in a bad state. I'd need some help in
understanding what's going on here and how to fix it.
Thanks,
Nicola
---------------------
# ceph -s
cluster:
id: b1029256-7bb3-11ec-a8ce-ac1f6b627b45
health: HEALTH_WARN
2 OSD(s) have spurious read errors
Degraded data redundancy: 7349/147534197 objects degraded (0.005%), 22 pgs degraded, 22 pgs undersized
332 pgs not deep-scrubbed in time
503 pgs not scrubbed in time
(muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)
services:
mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 2d)
mgr: bofur.tklnrn(active, since 32h), standbys: balin.hvunfe, aka.wzystq
mds: 2/2 daemons up, 1 standby
osd: 104 osds: 104 up (since 37h), 104 in (since 37h); 22 remapped pgs
data:
volumes: 1/1 healthy
pools: 3 pools, 529 pgs
objects: 18.53M objects, 40 TiB
usage: 54 TiB used, 142 TiB / 196 TiB avail
pgs: 7349/147534197 objects degraded (0.005%)
2715/147534197 objects misplaced (0.002%)
507 active+clean
20 active+recovering+undersized+degraded+remapped
2 active+recovery_wait+undersized+degraded+remapped
# ceph health detail
[WRN] PG_DEGRADED: Degraded data redundancy: 7349/147534197 objects degraded (0.005%), 22 pgs degraded, 22 pgs undersized
pg 3.2c is stuck undersized for 37h, current state active+recovery_wait+undersized+degraded+remapped, last acting [79,83,34,37,65,NONE,18,95]
pg 3.57 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,99,37,NONE,15,104,55,40]
pg 3.76 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,5,37,15,100,33,85,NONE]
pg 3.9c is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,86,88,NONE,11,69,20,10]
pg 3.106 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,15,89,NONE,36,32,23,64]
pg 3.107 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,NONE,64,20,61,92,104,43]
pg 3.10c is stuck undersized for 37h, current state active+recovery_wait+undersized+degraded+remapped, last acting [79,34,NONE,95,104,16,69,18]
pg 3.11e is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,89,64,46,32,NONE,40,15]
pg 3.14e is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,34,69,97,85,NONE,46,62]
pg 3.160 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,1,101,84,18,33,NONE,69]
pg 3.16a is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,16,59,103,13,38,49,NONE]
pg 3.16e is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,0,27,96,55,10,81,NONE]
pg 3.170 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [NONE,57,14,46,55,99,15,40]
pg 3.19b is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [NONE,79,59,8,32,17,7,90]
pg 3.1a0 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [NONE,79,26,50,104,24,97,40]
pg 3.1a5 is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,100,61,27,20,NONE,24,85]
pg 3.1a8 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,24,NONE,3,55,40,98,45]
pg 3.1aa is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,91,48,NONE,24,3,8,85]
pg 3.1af is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,NONE,90,33,104,69,26,8]
pg 3.1c1 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,95,NONE,53,54,27,18,85]
pg 3.1c4 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,69,56,84,95,8,NONE,4]
pg 3.1d5 is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,48,NONE,104,34,16,37,89]
[WRN] PG_NOT_DEEP_SCRUBBED: 332 pgs not deep-scrubbed in time
pg 3.1ff not deep-scrubbed since 2023-05-18T21:06:57.883787+0000
pg 3.1fe not deep-scrubbed since 2023-05-22T19:50:11.497538+0000
pg 3.1fd not deep-scrubbed since 2023-05-22T19:44:12.680598+0000
pg 3.1fc not deep-scrubbed since 2023-05-20T19:56:43.746580+0000
pg 3.1fb not deep-scrubbed since 2023-05-22T18:29:12.794152+0000
pg 3.1f9 not deep-scrubbed since 2023-05-19T08:19:16.636964+0000
pg 3.1f8 not deep-scrubbed since 2023-05-22T21:49:28.891350+0000
pg 3.1f5 not deep-scrubbed since 2023-05-18T21:18:19.636068+0000
pg 3.1f4 not deep-scrubbed since 2023-05-18T18:00:41.241562+0000
pg 3.1f3 not deep-scrubbed since 2023-05-21T01:36:32.735139+0000
pg 3.1f2 not deep-scrubbed since 2023-05-23T03:59:02.154966+0000
pg 3.1f1 not deep-scrubbed since 2023-05-22T21:47:46.419880+0000
pg 3.1f0 not deep-scrubbed since 2023-05-22T19:17:38.327356+0000
pg 3.1ef not deep-scrubbed since 2023-05-19T01:49:04.133392+0000
pg 3.1ee not deep-scrubbed since 2023-05-21T12:25:52.010406+0000
pg 3.1ed not deep-scrubbed since 2023-05-19T20:13:20.675257+0000
pg 3.1eb not deep-scrubbed since 2023-05-18T12:13:53.684650+0000
pg 3.1ea not deep-scrubbed since 2023-05-18T09:45:57.172578+0000
pg 3.1e9 not deep-scrubbed since 2023-05-23T00:26:18.621324+0000
pg 3.1e8 not deep-scrubbed since 2023-05-21T05:15:03.969687+0000
pg 3.1e4 not deep-scrubbed since 2023-05-21T16:21:11.738145+0000
pg 3.1e3 not deep-scrubbed since 2023-05-22T13:13:19.611165+0000
pg 3.1e0 not deep-scrubbed since 2023-05-21T17:43:36.545240+0000
pg 3.1de not deep-scrubbed since 2023-05-18T00:03:49.873073+0000
pg 3.1dd not deep-scrubbed since 2023-05-22T20:30:56.025015+0000
pg 3.1db not deep-scrubbed since 2023-05-22T18:12:44.615539+0000
pg 3.1da not deep-scrubbed since 2023-05-20T21:11:00.060022+0000
pg 3.1d9 not deep-scrubbed since 2023-05-22T19:02:03.292022+0000
pg 3.1d8 not deep-scrubbed since 2023-05-23T17:37:05.320161+0000
pg 3.1d6 not deep-scrubbed since 2023-05-19T15:19:58.293551+0000
pg 3.1d4 not deep-scrubbed since 2023-05-23T02:28:54.392188+0000
pg 3.1d3 not deep-scrubbed since 2023-05-18T06:02:14.181321+0000
pg 3.1d2 not deep-scrubbed since 2023-05-18T11:46:29.582700+0000
pg 3.1d1 not deep-scrubbed since 2023-05-19T08:31:54.033426+0000
pg 3.1cd not deep-scrubbed since 2023-05-21T08:52:41.817826+0000
pg 3.1cc not deep-scrubbed since 2023-05-22T22:51:02.466708+0000
pg 3.1c9 not deep-scrubbed since 2023-05-18T08:06:50.220587+0000
pg 3.1c7 not deep-scrubbed since 2023-05-22T17:07:35.346608+0000
pg 3.1c5 not deep-scrubbed since 2023-05-20T17:09:12.048012+0000
pg 3.1c1 not deep-scrubbed since 2023-05-21T11:39:47.640196+0000
pg 3.1c0 not deep-scrubbed since 2023-05-22T20:22:57.166475+0000
pg 3.1bf not deep-scrubbed since 2023-05-19T19:08:08.313143+0000
pg 3.1be not deep-scrubbed since 2023-05-21T12:28:17.345386+0000
pg 3.1bd not deep-scrubbed since 2023-05-18T19:19:29.002801+0000
pg 3.1bb not deep-scrubbed since 2023-05-19T07:15:53.508751+0000
pg 3.1b8 not deep-scrubbed since 2023-05-19T18:50:27.701909+0000
pg 3.1b6 not deep-scrubbed since 2023-05-19T03:30:55.707248+0000
pg 3.1b5 not deep-scrubbed since 2023-05-20T20:37:48.346272+0000
pg 3.1b4 not deep-scrubbed since 2023-05-23T02:11:04.833784+0000
pg 3.1b3 not deep-scrubbed since 2023-05-18T20:46:40.876590+0000
282 more pgs...
[WRN] PG_NOT_SCRUBBED: 503 pgs not scrubbed in time
pg 3.1ff not scrubbed since 2023-05-24T23:37:22.323516+0000
pg 3.1fe not scrubbed since 2023-05-25T02:01:18.754476+0000
pg 3.1fd not scrubbed since 2023-05-24T20:31:23.239794+0000
pg 3.1fc not scrubbed since 2023-05-25T00:42:05.670791+0000
pg 3.1fb not scrubbed since 2023-05-24T19:29:29.438626+0000
pg 3.1fa not scrubbed since 2023-05-24T21:50:04.911965+0000
pg 3.1f9 not scrubbed since 2023-05-25T20:44:49.010622+0000
pg 3.1f8 not scrubbed since 2023-05-24T18:17:49.471926+0000
pg 3.1f7 not scrubbed since 2023-05-24T17:27:43.545337+0000
pg 3.1f6 not scrubbed since 2023-05-24T22:16:04.008644+0000
pg 3.1f5 not scrubbed since 2023-05-24T20:14:01.159271+0000
pg 3.1f4 not scrubbed since 2023-05-24T16:20:29.746958+0000
pg 3.1f3 not scrubbed since 2023-05-25T00:45:49.464448+0000
pg 3.1f2 not scrubbed since 2023-05-24T17:37:58.701570+0000
pg 3.1f1 not scrubbed since 2023-05-24T20:21:46.824657+0000
pg 3.1f0 not scrubbed since 2023-05-25T00:59:02.693836+0000
pg 3.1ef not scrubbed since 2023-05-24T21:35:10.061965+0000
pg 3.1ee not scrubbed since 2023-05-24T17:13:37.835095+0000
pg 3.1ed not scrubbed since 2023-05-24T18:17:21.739348+0000
pg 3.1ec not scrubbed since 2023-05-24T17:54:23.365899+0000
pg 3.1eb not scrubbed since 2023-05-24T23:18:31.345229+0000
pg 3.1ea not scrubbed since 2023-05-25T00:25:06.747723+0000
pg 3.1e9 not scrubbed since 2023-05-25T19:27:39.496774+0000
pg 3.1e8 not scrubbed since 2023-05-25T01:31:11.083814+0000
pg 3.1e7 not scrubbed since 2023-05-25T01:43:43.116599+0000
pg 3.1e6 not scrubbed since 2023-05-24T18:26:39.778008+0000
pg 3.1e4 not scrubbed since 2023-05-24T22:18:59.986309+0000
pg 3.1e3 not scrubbed since 2023-05-24T14:34:52.095564+0000
pg 3.1e2 not scrubbed since 2023-05-24T23:56:04.083842+0000
pg 3.1e1 not scrubbed since 2023-05-25T02:00:18.766811+0000
pg 3.1e0 not scrubbed since 2023-05-25T02:01:42.094304+0000
pg 3.1df not scrubbed since 2023-05-24T19:41:59.890557+0000
pg 3.1de not scrubbed since 2023-05-24T23:57:49.463552+0000
pg 3.1dd not scrubbed since 2023-05-25T17:42:33.397660+0000
pg 3.1dc not scrubbed since 2023-05-24T17:34:43.656366+0000
pg 3.1db not scrubbed since 2023-05-24T21:48:10.126232+0000
pg 3.1da not scrubbed since 2023-05-24T17:54:43.136739+0000
pg 3.1d9 not scrubbed since 2023-05-24T20:22:14.256914+0000
pg 3.1d8 not scrubbed since 2023-05-24T23:34:56.555311+0000
pg 3.1d7 not scrubbed since 2023-05-25T18:08:08.689329+0000
pg 3.1d6 not scrubbed since 2023-05-24T20:23:30.301130+0000
pg 3.1d5 not scrubbed since 2023-05-25T20:30:25.691077+0000
pg 3.1d4 not scrubbed since 2023-05-24T21:21:46.923743+0000
pg 3.1d3 not scrubbed since 2023-05-24T18:12:50.468466+0000
pg 3.1d2 not scrubbed since 2023-05-24T20:33:32.376232+0000
pg 3.1d1 not scrubbed since 2023-05-24T20:32:55.981738+0000
pg 3.1d0 not scrubbed since 2023-05-24T18:16:51.195524+0000
pg 3.1cf not scrubbed since 2023-05-24T22:32:00.879058+0000
pg 3.1ce not scrubbed since 2023-05-25T02:46:02.834267+0000
pg 3.1cd not scrubbed since 2023-05-24T21:02:08.288116+0000
453 more pgs...
Dear all,
after the update to Ceph 16.2.13, the Prometheus exporter is wrongly exporting multiple metric HELP & TYPE lines for ceph_pg_objects_repaired:
[mon1] /root # curl -sS http://localhost:9283/metrics
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="34"} 0.0
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="33"} 0.0
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="32"} 0.0
[...]
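The duplication is easy to confirm programmatically; a quick sketch that counts repeated HELP/TYPE header lines (it assumes the default mgr/prometheus port used above):

import re
import urllib.request

text = urllib.request.urlopen("http://localhost:9283/metrics").read().decode()
counts = {}
for line in text.splitlines():
    m = re.match(r"# (HELP|TYPE) \S+", line)
    if m:
        counts[m.group(0)] = counts.get(m.group(0), 0) + 1
# the Prometheus text format allows each HELP/TYPE header at most once
print({k: v for k, v in counts.items() if v > 1})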
This trips up our exporter_exporter service, which then rejects exporting the ceph metrics. Is this a known issue? Will it be fixed in the next update?
Cheers,
Andreas
--
| Andreas Haupt | E-Mail: andreas.haupt(a)desy.de
| DESY Zeuthen | WWW: http://www.zeuthen.desy.de/~ahaupt
| Platanenallee 6 | Phone: +49/33762/7-7359
| D-15738 Zeuthen | Fax: +49/33762/7-7216
I am trying to debug an issue with ceph orch host add.
Is there a way to debug the specific ssh commands being issued, or to add debugging code to a python script?
There is nothing useful in my syslog or /var/log/ceph/cephadm.log.
Is there a way to get the command to log, or can someone point me in the direction of the source code so I can have a look?
I've run tcpdump on port 22 to listen for outgoing packets, and also for traffic going to the target IP, and there is nothing going out when I run ceph orch host add. If I run ssh inside the cephadm shell, then I see the packets go out and it works, as I document below.
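For reference, the capture was along these lines (interface and target IP are placeholders):
tcpdump -i any 'port 22 and host 103.XXX.YY.ZZ'
It stays silent during ceph orch host add, but shows traffic during a manual ssh from inside the shell.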
I was going to upgrade to Quincy from Pacific 16.2.5 and decided to migrate from ceph-deploy to cephadm first.
I initially had problems because I run ssh on a non-standard port; allowing port 22 has let me run the command below on every node except one:
ceph orch host add [short hostname] [ip address]
That one host fails, inexplicably, with the error:
Error EINVAL: Failed to connect to cephstorage-rs01 (103.XXX.YY.ZZ).
If I run cephadm shell (without --no-hosts, as that gives the error "unknown flag: --no-hosts"), ssh works as expected:
# cephadm shell
Inferring fsid 525ec8aa-b401-4ddf-aa8f-4493727dac02
Inferring config /var/lib/ceph/525ec8aa-b401-4ddf-aa8f-4493727dac02/mon.cephstorage-ig03/config
Using recent ceph image ceph/daemon-base@sha256:a038c6dc35064edff40bb7e824783f1bbd325c888e722ec5e814671406216ad5
root@cephstorage-ig03:/# ceph cephadm get-ssh-config > ssh_config
root@cephstorage-ig03:/# ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
root@cephstorage-ig03:/# chmod 0600 ~/cephadm_private_key
root@cephstorage-ig03:/# ssh -F ssh_config -i ~/cephadm_private_key root(a)103.XXX.YY.ZZ
Warning: Permanently added '103.XXX.YY.ZZ' (ECDSA) to the list of known hosts.
Welcome to XXXXX
I was mucking around with custom ssh-config files to get around the port issue, but it did not seem to work, so I reverted back to the vanilla version with: ceph cephadm clear-ssh-config
So when I am inside the shell it works, but it doesn't work via ceph orch host add.
There is one thing that is unusual that I think is worth mentioning.
When I was adding the servers with custom ssh config files, I had a bad entry in the hosts file for cephstorage-rs01 on that server, resolving to 127.0.0.1. When I added it, it said it had added the IP as 127.0.0.127:
# ceph orch host ls
HOST ADDR LABELS STATUS
...
cephstorage-rs01 127.0.0.127 Offline
...
I then ran
ceph orch host rm cephstorage-rs01
I have tried an iptables re-route, on the off chance that if there was some kind of host-to-IP cache it would route to localhost and tell me that the host name didn't match. That did not work.
Right now, I am a sad panda, as my ceph cluster is half transitioned. My next port of call is probably to adopt what I can into cephadm, make sure the cluster is OK, and then finally drop the problem node and re-add it.
Any help will be appreciated.
Regards,
David
Hi Marc,
I uploaded all scripts and a rudimentary readme to https://github.com/frans42/cephfs-bench . I hope it is sufficient to get started. I'm afraid it's very much tailored to our deployment and I can't make it fully configurable anytime soon. I hope it serves a purpose though - at least I discovered a few bugs with it.
We actually kept the benchmark running through an upgrade from mimic to octopus. It was quite interesting to see how certain performance properties change with that. This benchmark makes it possible to compare versions with live timings coming in.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Marc <Marc(a)f1-outsourcing.eu>
Sent: Monday, May 15, 2023 11:28 PM
To: Frank Schilder
Subject: RE: [ceph-users] Re: CEPH Version choice
> I planned to put it on-line. The hold-back is that the main test is
> untarring a nasty archive and this archive might contain personal
> information, so I can't just upload it as is. I can try to put together
> a similar archive from public sources. Please give me a bit of time. I'm
> also a bit under stress right now with our users being hit by an FS
> metadata corruption. That's also why I'm a bit trigger happy.
>
Ok thanks, very nice, no hurry!!!