I'm still trying to understand how the manager and dashboard connect to the different object gateways; I can't figure out how it works.
Initially, I wanted to have each gateway listen only on localhost, over plain HTTP:
[client.radosgw.<%= $id%>]
rgw_frontends = beast endpoint=127.0.0.1:9080
It fails, but in a strange way: the mgr was indeed connecting to port 9080, so it does read that line, but it used the host IP instead of 127.0.0.1.
I can easily understand why binding only to 127.0.0.1 was a bad idea, but I don't understand how it chooses the IP to connect to.
When looking at the configuration in the dashboard, I see a field called hostname. This value seems to be read-only, as setting it in ceph.conf to an obviously wrong value (www.google.com) does not change it.
Is that the field that is used? It matters because if I want SSL verification to work, I need to know which value (IP, hostname, or FQDN) to put in the Subject Alternative Name of the certificate.
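For reference, here is what I have been looking at so far; the dashboard set-rgw-api-* commands are my assumption about which knobs are relevant (names taken from 'ceph dashboard -h'), and 192.168.1.10 is just a placeholder:
$ ceph service dump | jq '.services.rgw.daemons'   # hostname/IP and port the mgr has learned for each rgw instance
$ ceph dashboard set-rgw-api-host 192.168.1.10     # override the host the dashboard connects to
$ ceph dashboard set-rgw-api-port 9080
$ ceph dashboard set-rgw-api-ssl-verify False      # only while testing certificates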
Hello,
I am playing around with a test Ceph 14.2.20 cluster. The cluster
consists of 4 VMs, each VM has 2 OSDs. The first three VMs vceph1,
vceph2 and vceph3 are monitors; vceph1 is also the mgr.
What I did was quite simple. The cluster starts out healthy (HEALTH_OK):
vceph2: systemctl stop ceph-osd@2
# let ceph repair until ceph -s reports the cluster is healthy again
vceph2: systemctl start ceph-osd@2 # @ 15:39:15, for the logs
# ceph -s reports that 8 OSDs are up and in, then rebalancing
# onto osd.2 starts
vceph2: ceph -s # hangs forever, also when executed on vceph3 or vceph4
# mon on vceph1 eats 100% CPU permanently, the other mons ~0% CPU
vceph1: systemctl stop ceph-mon@vceph1 # takes ~30 sec to terminate
vceph1: systemctl start ceph-mon@vceph1 # everything is OK again
I posted the mon-log to: https://cloud.uni-koblenz.de/s/t8tWjWFAobZb5Hy
Strangely enough, if I set "debug mon 20" before starting the experiment,
this bug does not show up. I also tried the very same procedure on the
same cluster updated to 15.2.11, but I was unable to reproduce the bug
on that Ceph version.
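For reference, one way to raise that debug level is via the admin socket on the mon host, which also works while 'ceph -s' hangs:
vceph1: ceph daemon mon.vceph1 config set debug_mon 20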
Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287
1001312
Once I activated the dashboard, I tried to import the certificates, but it fails:
$ ceph dashboard set-ssl-certificate-key -i /data/ceph/conf/ceph.key
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1337, in _handle_command
return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 389, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/dashboard/module.py", line 385, in set_ssl_certificate_key
self.set_store('key', inbuf.decode())
AttributeError: 'str' object has no attribute 'decode'
$ ceph dashboard set-ssl-certificate -i /data/ceph/conf/ceph.crt
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1337, in _handle_command
return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 389, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/dashboard/module.py", line 372, in set_ssl_certificate
self.set_store('crt', inbuf.decode())
AttributeError: 'str' object has no attribute 'decode'
They are both PEM encoded files:
file /data/ceph/conf/ceph.key /data/ceph/conf/ceph.crt
/data/ceph/conf/ceph.key: PEM RSA private key
/data/ceph/conf/ceph.crt: PEM certificate
What format do these commands expect?
The error happens on CentOS 8.3.2011 with ceph-mgr-16.2.1-0.el8.x86_64, downloaded directly from the Ceph repositories.
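The traceback suggests the mgr hands the module 'inbuf' as a Python 3 str, so the module's inbuf.decode() call fails regardless of the file format. As a possible workaround (my assumption: set_store('crt')/set_store('key') map to the config-key names below, as in the older dashboard docs; untested, please verify before relying on it):
$ ceph config-key set mgr/dashboard/crt -i /data/ceph/conf/ceph.crt
$ ceph config-key set mgr/dashboard/key -i /data/ceph/conf/ceph.key
$ ceph mgr module disable dashboard && ceph mgr module enable dashboard   # restart the dashboard to pick up the certificate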
Hello,
I have a small Octopus cluster (3 mon/mgr nodes, 3 osd nodes) installed with cephadm and hence running inside podman containers on Ubuntu 20.04.
I want to use CephFS, so I created an fs volume and saw that two MDS containers were automatically deployed on two of my OSD nodes. I then read in the documentation (https://docs.ceph.com/en/latest/cephfs/) that the MDS first writes to a journal before writing into the metadata pool, so I was wondering: where exactly is this MDS journal located?
Because my MDS service is running inside a container, does this mean the journal gets written inside the container itself?
I am asking because my containers run from the OS hard disk, which is not as performant as the OSD hard disks. So if the journal really is written inside the container, would it be better for performance to give the MDS a dedicated hard disk? Is this correct?
And if so, how do I tell cephadm that my MDS container should write its journal to another location?
Finally, I read (https://docs.ceph.com/en/latest/start/hardware-recommendations/) that the Ceph MDS service only needs 1 MB of storage for its journal. Is that really all?
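In case it helps to frame the question, a quick check along these lines should show whether the journal lives in RADOS rather than inside the container (the pool and volume name "myfs" are placeholders for whatever the fs volume is called):
rados -p cephfs.myfs.meta ls | grep '^200\.' | head   # rank-0 journal objects are named 200.<offset> if the journal is in the metadata pool
cephfs-journal-tool --rank myfs:0 journal inspect     # inspect the journal of MDS rank 0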
Best regards,
Mabi
Hello all,

I was running 15.2.8 via cephadm on Docker, Ubuntu 20.04. I just attempted to upgrade to 16.2.1 via the automated method. It successfully upgraded the mon/mgr/mds daemons and some OSDs, but it then failed on an OSD and hasn't been able to get past it, even after stopping and restarting the upgrade.

It reported the following: "message": "Error: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.35 on host sn-s01 failed."

If I run 'ceph health detail' I get lots of the following error throughout the report: "ValueError: not enough values to unpack (expected 2, got 1)"

From googling, it looks like I am hitting something along the lines of https://158.69.68.89/issues/48924 and https://tracker.ceph.com/issues/49522.

What do I need to do to either get around this bug, or manually upgrade the remaining OSDs to 16.2.1? The cluster is currently working, but the last OSD it failed to upgrade is offline (I guess because no image is attached to it now, as it failed to pull it), and I have a cluster with OSDs on both 15.2.8 and 16.2.1.

Thanks
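For reference, these are the orchestrator commands I know of for this situation (the image name in the last line is my guess at the 16.2.1 tag, and I'm not sure a manual redeploy is the right approach, hence the question):
ceph orch upgrade status                       # shows which daemon the upgrade is stuck on
ceph orch upgrade stop
ceph orch upgrade start --ceph-version 16.2.1  # restart the automated upgrade
ceph orch daemon redeploy osd.35 docker.io/ceph/ceph:v16.2.1   # manually redeploy the stuck OSD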
Hi there,
I think I found a bug in the radosgw-admin bucket radoslist command. I'm not 100% sure, so I would like to check here first before I file a bug report.
I have a bucket called bucket3. If I do a multipart upload and stop it halfway, then start a new upload with the same name and abort that one as well, I get duplicate lines in the radoslist output.
For example:
root@alpha:~# radosgw-admin bucket radoslist --bucket bucket3
root@alpha:~#
root@alpha:~# s3cmd put -P 100MB.bin s3://bucket3/multipart-obj-fail
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 1 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 35.14 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 2 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 33.59 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 3 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 33.88 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 4 of 7, 15MB] [1 of 1]
65536 of 15728640 0% in 0s 806.10 KB/s^CERROR:
Upload of '100MB.bin' part 4 failed. Use
/usr/bin/s3cmd abortmp s3://bucket3/multipart-obj-fail 2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX
to abort the upload, or
/usr/bin/s3cmd --upload-id 2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX put ...
to continue the upload.
If I now run a radoslist it looks fine:
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_3
However, if I do a second upload with the same name:
root@alpha:~# s3cmd put -P 100MB.bin s3://bucket3/multipart-obj-fail
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 1 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 23.18 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 2 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 1s 13.10 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 3 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 29.93 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 4 of 7, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 39.53 MB/s done
upload: '100MB.bin' -> 's3://bucket3/multipart-obj-fail' [part 5 of 7, 15MB] [1 of 1]
^CERROR:
Upload of '100MB.bin' part 5 failed. Use
/usr/bin/s3cmd abortmp s3://bucket3/multipart-obj-fail 2~yS7Tzru_FSP6rkg4yeO28os207nDtTw
to abort the upload, or
/usr/bin/s3cmd --upload-id 2~yS7Tzru_FSP6rkg4yeO28os207nDtTw put ...
to continue the upload.
See ya!
Note that the upload IDs are unique. Run 1: 2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX, run 2: 2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.
But if we look at the radoslist output again:
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3_3
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4_1
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4_2
646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__shadow_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4_3
The duplicates are (output from radoslist | sort | uniq -c):
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
If I do yet another upload and stop it, I get a new upload ID (2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy, per the output below), but again with duplicate entries (output from radoslist | sort | uniq -c):
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~0GMvfYOGO5yhFppCWjYrUqBAOqQscoX.4
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~iNQha2hAznnKdSLukUAbJT1-4nXoWUy.4
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.1
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.2
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.3
2 646e346c-2355-49df-973f-d8ac2c6349f9.74148.1__multipart_multipart-obj-fail.2~yS7Tzru_FSP6rkg4yeO28os207nDtTw.4
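For anyone who wants to reproduce the check, the duplicate lines can be pulled out directly by adding uniq's -d flag:
radosgw-admin bucket radoslist --bucket bucket3 | sort | uniq -cd   # print only duplicated lines, with their counts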
I'm running Ceph Octopus deployed by cephadm:
root@alpha:~# ceph versions
{
"overall": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 12
}
}
Is this expected behaviour, or should I file a bug report for this?
Kind regards,
Rob
Hi all,
this is a follow-up on "reboot breaks OSDs converted from ceph-disk to ceph-volume simple".
I converted a number of ceph-disk OSDs to ceph-volume using "simple scan" and "simple activate". Somewhere along the way, the OSDs' metadata gets mangled, and the prominent symptom is that the "block" symlink changes from a part-uuid target to an unstable device-name target, like:
before conversion:
block -> /dev/disk/by-partuuid/9123be91-7620-495a-a9b7-cc85b1de24b7
after conversion:
block -> /dev/sdj2
This is a huge problem, as the "after conversion" device names are unstable. I now have a cluster whose servers I cannot reboot because of this problem. OSDs whose devices get randomly re-assigned will refuse to start with:
2021-03-02 15:56:21.709 7fb7c2549b80 -1 OSD id 241 != my id 248
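For reference, these are the places where the device path shows up after conversion (the JSON location is my understanding of the ceph-volume "simple" defaults, and OSD 241 is just the example from the error above):
ls -l /var/lib/ceph/osd/ceph-241/block        # the symlink that now points at /dev/sdj2 instead of the by-partuuid path
grep -A3 '"block"' /etc/ceph/osd/241-*.json   # metadata written by "simple scan" and consumed by "simple activate"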
Please help me with getting out of this mess.
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
I’m currently deploying a new cluster for cold storage with rgw.
Is there actually a more elegant method to get the bucket data onto an erasure-coded pool, other than moving the pool afterwards or creating the buckets.data pool prior to data upload?
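For context, the "create the pool up front" approach I'm referring to looks roughly like this (pool, profile and placement names are placeholders from my test setup):
ceph osd pool create default.rgw.buckets.data 128 128 erasure my-ec-profile
ceph osd pool application enable default.rgw.buckets.data rgw
radosgw-admin zone placement modify --rgw-zone default --placement-id default-placement --data-pool default.rgw.buckets.data
radosgw-admin period update --commit   # only if a realm/period is configured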
Thanks,
Marco Savoca
Does anyone else receive unsolicited replies from sender "Chip Cox <chip@softiron.com>" to e-mails posted on this list?
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi Team,
I was setting up a Ceph cluster with:
- Node details: 3 Mon, 2 MDS, 2 Mgr, 2 RGW
- Deployment type: active/standby
- Testing mode: failover of the MDS node
- Setup: Octopus (15.2.7)
- OS: CentOS 8.3
- Hardware: HP
- RAM: 128 GB on each node
- OSD: 2 (1 TB each)
- Operation: normal I/O with a mkdir every 1 second
Test case: power off the active MDS node so that failover happens.
Observation:
We have observed that whenever the active MDS node goes down, it takes
around 40 seconds to activate the standby MDS node.
On further checking the logs around the handover, we have seen the delay
break down as follows:
1. A ~10 second delay, after which the mon calls a new monitor election:
   [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 calling
   monitor election
2. A ~5 second delay in which the new monitor leader is elected:
   [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 is new
   leader, mons cephnode1,cephnode3 in quorum (ranks 0,2)
3. The additional beacon grace time the system waits before it activates
   the standby MDS node (a delay of approx. 19 seconds):
   defaults: sudo ceph config get mon mds_beacon_grace -> 15.000000
             sudo ceph config get mon mds_beacon_interval -> 5.000000
   [log] 2021-04-30T18:23:10.136+0530 7f4e3925c700 1
   mon.cephnode2@1(leader).mds e776 no beacon from mds.0.771 (gid:
   639443 addr: [v2:
   10.0.4.10:6800/2172152716,v1:10.0.4.10:6801/2172152716] state:
   up:active) since 18.7951
4. In total, it takes around 40 seconds to hand over and activate the
   standby node.
Query:
1. Can these variables be tuned? We have tried, but are not aware of the
   overall impact of these changes on the Ceph cluster.
   - By tuning these values we could get down to a minimum of 12 seconds
     before the standby node becomes active.
   - Values used to reach that time (see the sketch below):
     - mon_election_timeout (default 5): configured as 1
     - mon_lease (default 5): configured as 2
     - mds_beacon_grace (default 15): configured as 5
     - mds_beacon_interval (default 5): configured as 1
We need to tune this setup to get the failover duration as low as 5-7
seconds.
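For reference, this is how we applied those values, using the "config set" counterpart of the "config get" commands shown above (the numbers are our test settings, not recommendations; their impact on mon quorum stability is exactly what we are unsure about):
sudo ceph config set mon mon_election_timeout 1   # default 5
sudo ceph config set mon mon_lease 2              # default 5
sudo ceph config set mon mds_beacon_grace 5       # default 15
sudo ceph config set mon mds_beacon_interval 1    # default 5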
Please share your inputs and suggestions. My setup is ready and we are
already testing multiple scenarios so that we can achieve the minimum
failover duration.
--
~ Lokendra
www.inertiaspeaks.com | www.inertiagroups.com
skype: lokendrarathour