I'm still trying to understand how the manager and dashboard connect to the different object gateways; I can't figure out how it works.
Initially, I wanted to have each gateway listen only on localhost, over plain HTTP:
[client.radosgw.<%= $id%>]
rgw_frontends = beast endpoint=127.0.0.1:9080
It fails, but in a strange way: the mgr was indeed connecting to port 9080, so it does read that line, but it used the host IP instead of 127.0.0.1.
I can easily understand why binding only to 127.0.0.1 was a bad idea, but I don't understand how it chooses the IP to connect to.
When looking at the configuration in the dashboard, I see a field called hostname. This value seems to be read-only, as setting it in ceph.conf to an obviously wrong value (www.google.com) does not change it.
Is that the field that is used? It matters because if I want SSL verification to work, I need to know which value (IP, hostname, or FQDN) to put in the Subject Alternative Name of the certificate.
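For reference, here is what I have been looking at so far; the dashboard set-rgw-api-* commands are my assumption about which knobs are relevant (names taken from 'ceph dashboard -h'), and 192.168.1.10 is just a placeholder:
$ ceph service dump | jq '.services.rgw.daemons'   # hostname/IP and port the mgr has learned for each rgw instance
$ ceph dashboard set-rgw-api-host 192.168.1.10     # override the host the dashboard connects to
$ ceph dashboard set-rgw-api-port 9080
$ ceph dashboard set-rgw-api-ssl-verify False      # only while testing certificates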
Hello,
I am playing around with a test Ceph 14.2.20 cluster. The cluster
consists of 4 VMs, each VM has 2 OSDs. The first three VMs vceph1,
vceph2 and vceph3 are monitors; vceph1 is also the mgr.
What I did was quite simple. The cluster starts out healthy (HEALTH_OK):
vceph2: systemctl stop ceph-osd@2
# let ceph repair until ceph -s reports the cluster is healthy again
vceph2: systemctl start ceph-osd@2 # @ 15:39:15, for the logs
# ceph -s reports that 8 OSDs are up and in, then rebalancing
# onto osd.2 starts
vceph2: ceph -s # hangs forever, also when executed on vceph3 or vceph4
# mon on vceph1 eats 100% CPU permanently, the other mons ~0% CPU
vceph1: systemctl stop ceph-mon@vceph1 # takes ~30 sec to terminate
vceph1: systemctl start ceph-mon@vceph1 # everything is OK again
I posted the mon-log to: https://cloud.uni-koblenz.de/s/t8tWjWFAobZb5Hy
Strangely enough, if I set "debug mon 20" before starting the experiment,
this bug does not show up. I also tried the very same procedure on the
same cluster updated to 15.2.11, but I was unable to reproduce the bug
on that Ceph version.
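For reference, one way to raise that debug level is via the admin socket on the mon host, which also works while 'ceph -s' hangs:
vceph1: ceph daemon mon.vceph1 config set debug_mon 20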
Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287
1001312
Once I activated the dashboard, I tried to import the certificates, but it fails:
$ ceph dashboard set-ssl-certificate-key -i /data/ceph/conf/ceph.key
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1337, in _handle_command
return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 389, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/dashboard/module.py", line 385, in set_ssl_certificate_key
self.set_store('key', inbuf.decode())
AttributeError: 'str' object has no attribute 'decode'
$ ceph dashboard set-ssl-certificate -i /data/ceph/conf/ceph.crt
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1337, in _handle_command
return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 389, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/dashboard/module.py", line 372, in set_ssl_certificate
self.set_store('crt', inbuf.decode())
AttributeError: 'str' object has no attribute 'decode'
They are both PEM encoded files:
file /data/ceph/conf/ceph.key /data/ceph/conf/ceph.crt
/data/ceph/conf/ceph.key: PEM RSA private key
/data/ceph/conf/ceph.crt: PEM certificate
What format do these commands expect?
The error happens on CentOS 8.3.2011 with ceph-mgr-16.2.1-0.el8.x86_64, downloaded directly from the Ceph repositories.
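The traceback suggests the mgr hands the module 'inbuf' as a Python 3 str, so the module's inbuf.decode() call fails regardless of the file format. As a possible workaround (my assumption: set_store('crt')/set_store('key') map to the config-key names below, as in the older dashboard docs; untested, please verify before relying on it):
$ ceph config-key set mgr/dashboard/crt -i /data/ceph/conf/ceph.crt
$ ceph config-key set mgr/dashboard/key -i /data/ceph/conf/ceph.key
$ ceph mgr module disable dashboard && ceph mgr module enable dashboard   # restart the dashboard to pick up the certificate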
Hello,
I have a small Octopus cluster (3 mon/mgr nodes, 3 osd nodes) installed with cephadm and hence running inside podman containers on Ubuntu 20.04.
I want to use CephFS, so I created an fs volume and saw that two MDS containers were automatically deployed on two of my OSD nodes. I then read in the documentation (https://docs.ceph.com/en/latest/cephfs/) that the MDS first writes to a journal before writing into the metadata pool, so I was wondering: where exactly is this MDS journal located?
Because my MDS service is running inside a container, does this mean the journal gets written inside the container itself?
I am asking because my containers run from the OS hard disk, which is not as performant as the OSD hard disks. So if the journal really is written inside the container, would it be better for performance to give the MDS a dedicated hard disk? Is this correct?
And if so, how do I tell cephadm that my MDS container should write its journal to another location?
Finally, I read (https://docs.ceph.com/en/latest/start/hardware-recommendations/) that the Ceph MDS service only needs 1 MB of storage for its journal. Is that really all?
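In case it helps to frame the question, a quick check along these lines should show whether the journal lives in RADOS rather than inside the container (the pool and volume name "myfs" are placeholders for whatever the fs volume is called):
rados -p cephfs.myfs.meta ls | grep '^200\.' | head   # rank-0 journal objects are named 200.<offset> if the journal is in the metadata pool
cephfs-journal-tool --rank myfs:0 journal inspect     # inspect the journal of MDS rank 0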
Best regards,
Mabi
Hello all,

I was running 15.2.8 via cephadm on Docker, Ubuntu 20.04. I just attempted to upgrade to 16.2.1 via the automated method. It successfully upgraded the mon/mgr/mds daemons and some OSDs, but it then failed on an OSD and hasn't been able to get past it, even after stopping and restarting the upgrade.

It reported the following: "message": "Error: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.35 on host sn-s01 failed."

If I run 'ceph health detail' I get lots of the following error throughout the report: "ValueError: not enough values to unpack (expected 2, got 1)"

From googling, it looks like I am hitting something along the lines of https://158.69.68.89/issues/48924 and https://tracker.ceph.com/issues/49522.

What do I need to do to either get around this bug, or manually upgrade the remaining OSDs to 16.2.1? The cluster is currently working, but the last OSD it failed to upgrade is offline (I guess because no image is attached to it now, as it failed to pull it), and I have a cluster with OSDs on both 15.2.8 and 16.2.1.

Thanks
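For reference, these are the orchestrator commands I know of for this situation (the image name in the last line is my guess at the 16.2.1 tag, and I'm not sure a manual redeploy is the right approach, hence the question):
ceph orch upgrade status                       # shows which daemon the upgrade is stuck on
ceph orch upgrade stop
ceph orch upgrade start --ceph-version 16.2.1  # restart the automated upgrade
ceph orch daemon redeploy osd.35 docker.io/ceph/ceph:v16.2.1   # manually redeploy the stuck OSD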
Hi there,
I think I found a bug in the radosgw-admin bucket radoslist command. I'm not 100% sure, so I would like to check here first before I file a bug report.
I have a bucket called bucket3. If I do a multipart upload and stop it halfway, then start a new upload with the same name and abort that one as well, I get duplicate lines in the radoslist output.
For example:
root@alpha:~# radosgw-admin bucket radoslist --bucket bucket3
root@alpha:~#
root@alpha:~# s3cmd put -P 100MB.bin s3://bucket3/multipart-obj-fail
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 1 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 35.14 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 2 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 33.59 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 3 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 33.88 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 4 of 7, 15MB] [1 of 1]
65536 of 15728640 0% in 0s 806.10 KB/s^CERROR:
Upload of '100MB.bin' part 4 failed. Use
/usr/bin/s3cmd abortmp s3://bucket3/multipart-obj-fail 2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX
to abort the upload, or
/usr/bin/s3cmd --upload-id 2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX put ...
to continue the upload.
If I now run a radoslist it looks fine:
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_3
However, if I do a second upload with the same name:
root@alpha:~# s3cmd put -P 100MB.bin s3://bucket3/multipart-obj-fail
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 1 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 23.18 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 2 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 1s 13.10 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 3 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 29.93 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 4 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 39.53 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 5 of 7, 15MB] [1 of 1]
^CERROR:
Upload of '100MB.bin' part 5 failed. Use
/usr/bin/s3cmd abortmp s3://bucket3/multipart-obj-fail 2~yS7Tzru_FSP6rkg4yeO28os207nDtTw
to abort the upload, or
/usr/bin/s3cmd --upload-id 2~yS7Tzru_FSP6rkg4yeO28os207nDtTw put ...
to continue the upload.
See ya!
Note that the upload IDs are unique. Run 1: 2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX, run 2: 2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.
But if we look at the radoslist output again:
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4_3
The duplicates are (output from radoslist | sort | uniq -c):
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
If I do yet another upload and stop it, I get a new upload ID (2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy, per the output below), but again with duplicate entries (output from radoslist | sort | uniq -c):
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.4
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
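For anyone who wants to reproduce the check, the duplicate lines can be pulled out directly by adding uniq's -d flag:
radosgw-admin bucket radoslist --bucket bucket3 | sort | uniq -cd   # print only duplicated lines, with their counts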
I'm running Ceph Octopus deployed by cephadm:
root@alpha:~# ceph versions
{
"overall": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 12
}
}
Is this expected behaviour, or should I file a bug report for this?
Kind regards,
Rob
Hi all,
this is a follow-up on "reboot breaks OSDs converted from ceph-disk to ceph-volume simple".
I converted a number of ceph-disk OSDs to ceph-volume using "simple scan" and "simple activate". Somewhere along the way, the OSDs' metadata gets mangled, and the prominent symptom is that the "block" symlink changes from a part-uuid target to an unstable device-name target, like:
before conversion:
block -> /dev/disk/by-partuuid/9123be91-7620-495a-a9b7-cc85b1de24b7
after conversion:
block -> /dev/sdj2
This is a huge problem, as the "after conversion" device names are unstable. I now have a cluster whose servers I cannot reboot because of this problem. OSDs whose devices get randomly re-assigned will refuse to start with:
2021-03-02 15:56:21.709 7fb7c2549b80 -1 OSD id 241 != my id 248
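For reference, these are the places where the device path shows up after conversion (the JSON location is my understanding of the ceph-volume "simple" defaults, and OSD 241 is just the example from the error above):
ls -l /var/lib/ceph/osd/ceph-241/block        # the symlink that now points at /dev/sdj2 instead of the by-partuuid path
grep -A3 '"block"' /etc/ceph/osd/241-*.json   # metadata written by "simple scan" and consumed by "simple activate"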
Please help me with getting out of this mess.
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
I’m currently deploying a new cluster for cold storage with rgw.
Is there actually a more elegant method to get the bucket data onto an erasure-coded pool, other than moving the pool afterwards or creating the buckets.data pool prior to data upload?
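For context, the "create the pool up front" approach I'm referring to looks roughly like this (pool, profile and placement names are placeholders from my test setup):
ceph osd pool create default.rgw.buckets.data 128 128 erasure my-ec-profile
ceph osd pool application enable default.rgw.buckets.data rgw
radosgw-admin zone placement modify --rgw-zone default --placement-id default-placement --data-pool default.rgw.buckets.data
radosgw-admin period update --commit   # only if a realm/period is configured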
Thanks,
Marco Savoca
Does anyone else receive unsolicited replies from sender "Chip Cox <chip@softiron.com>" to e-mails posted on this list?
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi Team,
I was setting up a Ceph cluster with:
- Node details: 3 Mon, 2 MDS, 2 Mgr, 2 RGW
- Deployment type: active/standby
- Testing mode: failover of the MDS node
- Setup: Octopus (15.2.7)
- OS: CentOS 8.3
- Hardware: HP
- RAM: 128 GB on each node
- OSD: 2 (1 TB each)
- Operation: normal I/O with a mkdir every 1 second
Test case: power off the active MDS node so that failover happens.
Observation:
We have observed that whenever the active MDS node goes down, it takes
around 40 seconds to activate the standby MDS node.
On further checking the logs around the handover, we have seen the delay
break down as follows:
1. A ~10 second delay, after which the mon calls a new monitor election:
   [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 calling
   monitor election
2. A ~5 second delay in which the new monitor leader is elected:
   [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 is new
   leader, mons cephnode1,cephnode3 in quorum (ranks 0,2)
3. The additional beacon grace time the system waits before it activates
   the standby MDS node (a delay of approx. 19 seconds):
   defaults: sudo ceph config get mon mds_beacon_grace -> 15.000000
             sudo ceph config get mon mds_beacon_interval -> 5.000000
   [log] 2021-04-30T18:23:10.136+0530 7f4e3925c700 1
   mon.cephnode2@1(leader).mds e776 no beacon from mds.0.771 (gid:
   639443 addr: [v2:
   10.0.4.10:6800/2172152716,v1:10.0.4.10:6801/2172152716] state:
   up:active) since 18.7951
4. In total, it takes around 40 seconds to hand over and activate the
   standby node.
Query:
1. Can these variables be tuned? We have tried, but are not aware of the
   overall impact of these changes on the Ceph cluster.
   - By tuning these values we could get down to a minimum of 12 seconds
     before the standby node becomes active.
   - Values used to reach that time (see the sketch below):
     - mon_election_timeout (default 5): configured as 1
     - mon_lease (default 5): configured as 2
     - mds_beacon_grace (default 15): configured as 5
     - mds_beacon_interval (default 5): configured as 1
We need to tune this setup to get the failover duration as low as 5-7
seconds.
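For reference, this is how we applied those values, using the "config set" counterpart of the "config get" commands shown above (the numbers are our test settings, not recommendations; their impact on mon quorum stability is exactly what we are unsure about):
sudo ceph config set mon mon_election_timeout 1   # default 5
sudo ceph config set mon mon_lease 2              # default 5
sudo ceph config set mon mds_beacon_grace 5       # default 15
sudo ceph config set mon mds_beacon_interval 1    # default 5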
Please share your inputs and suggestions. My setup is ready and we are
already testing multiple scenarios so that we can achieve the minimum
failover duration.
--
~ Lokendra
www.inertiaspeaks.com | www.inertiagroups.com
skype: lokendrarathour