April 2021 - ceph-users - lists.ceph.io

by Mark Johnson

Really not sure where to go with this one. Firstly, a description of my cluster. Yes, I know there are a lot of "not ideals" here but this is what I inherited. The cluster is running Jewel and has two storage/mon nodes and an additional mon only node, with a pool size of 2. Today, we had a some power issues in the data centre and we very ungracefully lost both storage servers at the same time. Node 1 came back online before node 2 but I could see there were a few OSDs that were down. When node 2 came back, I started trying to get OSDs up. Each node has 14 OSDs and I managed to get all OSDs up and in on node 2, but one of the OSDs on node 1 keeps starting and crashing and just won't stay up. I'm not finding the OSD log output to be much use. Current health status looks like this: # ceph health HEALTH_ERR 26 pgs are stuck inactive for more than 300 seconds; 26 pgs down; 26 pgs peering; 26 pgs stuck inactive; 26 pgs stuck unclean; 5 requests are blocked > 32 sec # ceph status cluster e2391bbf-15e0-405f-af12-943610cb4909 health HEALTH_ERR 26 pgs are stuck inactive for more than 300 seconds 26 pgs down 26 pgs peering 26 pgs stuck inactive 26 pgs stuck unclean 5 requests are blocked > 32 sec Any clues as to what I should be looking for or what sort of action I should be taking to troubleshoot this? Unfortunately, I'm a complete novice with Ceph. Here's a snippet from the OSD log that means little to me... --- begin dump of recent events --- 0> 2021-04-16 12:25:10.169340 7f2e23921ac0 -1 *** Caught signal (Aborted) ** in thread 7f2e23921ac0 thread_name:ceph-osd ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e) 1: (()+0x9f1c2a) [0x7f2e24330c2a] 2: (()+0xf5d0) [0x7f2e21ee95d0] 3: (gsignal()+0x37) [0x7f2e2049f207] 4: (abort()+0x148) [0x7f2e204a08f8] 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f2e2442fd47] 6: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x90c) [0x7f2e2417bc7c] 7: (JournalingObjectStore::journal_replay(unsigned long)+0x1ee) [0x7f2e240c8dce] 8: (FileStore::mount()+0x3cd6) [0x7f2e240a0546] 9: (OSD::init()+0x27d) [0x7f2e23d5828d] 10: (main()+0x2c18) [0x7f2e23c71088] 11: (__libc_start_main()+0xf5) [0x7f2e2048b3d5] 12: (()+0x3c8847) [0x7f2e23d07847] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. Thanks in advance, Mark

3 years, 1 month

2
5
0 0

what-does-nosuchkey-error-mean-while-subscribing-for-notification-in-ceph

by Szabo, Istvan (Agoda)

Hi, I am trying to follow this url https://docs.ceph.com/en/latest/radosgw/s3/bucketops/#create-notification to create a publisher for my bucket into a topic. My curl: curl -v -H 'Date: Fri, 16 Apr 2021 05:21:14 +0000' -H 'Authorization: AWS accessid:secretkey' -L -H 'content-type: text/xml' -H 'Content-MD5: pBRX39Oo7aAUYbilIYMoAw==' -T notif.xml http://ceph:8080/vig-test?notification and it returns me this error <?xml version="1.0" encoding="UTF-8"?> <Error> <Code>NoSuchKey</Code> <BucketName>vig-test</BucketName> <RequestId>tx0000000000000016ac570-0060791ecb-1c7e96b-hkg</RequestId> <HostId>1c7e96b-hkg-data</HostId> </Error> Does anybody know what does this error mean in Ceph? How can I proceed? Thank you ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 1 month

3
3
0 0

ceph/ceph-grafana docker image for arm64 missing

by mabi

Hello, I want to deploy a new ceph Octopus cluster using cephadm on arm64 architecture but unfortunately the ceph/ceph-grafana docker image for arm64 is missing. Is this mailing list the right place to report this? or where should I report that? Best regards, Mabi

3 years, 1 month

1
0
1 0

ceph-iscsi issue after upgrading from nautilus to octopus

by icy chan

Hi, I had several clusters running as nautilus and pending upgrading to octopus. I am now testing the upgrade steps for ceph cluster from nautilus to octopus using cephadm adopt in lab referred to below link: - https://docs.ceph.com/en/octopus/cephadm/adoption/ Lab environment: 3 all-in-one nodes. OS: CentOS 7.9.2009 with podman 1.6.4. After the adoption, ceph health keep warns about tcme-runner not managed by cephadm. # ceph health detail HEALTH_WARN 12 stray daemon(s) not managed by cephadm; 1 pool(s) have no replicas configured [WRN] CEPHADM_STRAY_DAEMON: 12 stray daemon(s) not managed by cephadm stray daemon tcmu-runner.ceph-aio1:iSCSI/iscsi_image_01 on host ceph-aio1 not managed by cephadm stray daemon tcmu-runner.ceph-aio1:iSCSI/iscsi_image_02 on host ceph-aio1 not managed by cephadm stray daemon tcmu-runner.ceph-aio1:iSCSI/iscsi_image_03 on host ceph-aio1 not managed by cephadm stray daemon tcmu-runner.ceph-aio1:iSCSI/iscsi_image_test on host ceph-aio1 not managed by cephadm stray daemon tcmu-runner.ceph-aio2:iSCSI/iscsi_image_01 on host ceph-aio2 not managed by cephadm stray daemon tcmu-runner.ceph-aio2:iSCSI/iscsi_image_02 on host ceph-aio2 not managed by cephadm stray daemon tcmu-runner.ceph-aio2:iSCSI/iscsi_image_03 on host ceph-aio2 not managed by cephadm stray daemon tcmu-runner.ceph-aio2:iSCSI/iscsi_image_test on host ceph-aio2 not managed by cephadm stray daemon tcmu-runner.ceph-aio3:iSCSI/iscsi_image_01 on host ceph-aio3 not managed by cephadm stray daemon tcmu-runner.ceph-aio3:iSCSI/iscsi_image_02 on host ceph-aio3 not managed by cephadm stray daemon tcmu-runner.ceph-aio3:iSCSI/iscsi_image_03 on host ceph-aio3 not managed by cephadm stray daemon tcmu-runner.ceph-aio3:iSCSI/iscsi_image_test on host ceph-aio3 not managed by cephadm And tcmu-runner is still running with the old version. # ceph versions { "mon": { "ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable)": 3 }, "mgr": { "ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable)": 1 }, "osd": { "ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable)": 9 }, "mds": {}, "tcmu-runner": { "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 12 }, "overall": { "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 12, "ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable)": 13 } } I didn't find any ceph-iscsi related upgrade steps from the above reference link. Can anyone here point me to the right direction of ceph-iscsi version upgrade? Thanks. Regs, Icy

3 years, 1 month

1
0
0 0

Fresh install of Ceph using Ansible

by Jared Jacob

I am looking to rebuild my ceph cluster using ansible.What is the best way to start this process?

3 years, 1 month

2
1
0 0

How to handle bluestore fragmentation

by David Caro

Reading the thread "s3 requires twice the space it should use", Boris pointed out that the fragmentation for the osds is around 0.8-0.9: > On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens <bb(a)kervyn.de> wrote: >> I also checked the fragmentation on the bluestore OSDs and it is around >> 0.80 - 0.89 on most OSDs. yikes. >> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block >> { >> "fragmentation_rating": 0.85906054329923576 >> } And that made me wonder what is the current recommended (and not recommended) way to handle and reduce the fragmentation of the existing OSDs. Reading around I would think of tweaking the min_alloc_size_{ssd,hdd} and redeploying those OSDs, but I was unable to find much else, I wonder what do people do? ps. There was another thread that got no replies asking something similar (and a bunch of other things): https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/3PITWZRNX7…

3 years, 1 month

1
0
0 0

DocuBetter Meeting This Week -- 1630 UTC

by John Zachary Dover

This week's meeting will focus on the ongoing rewrite of the cephadm documentation and the upcoming Google Season of Docs project. Meeting: https://bluejeans.com/908675367 Etherpad: https://pad.ceph.com/p/Ceph_Documentation

3 years, 1 month

2
1
0 0

Re: s3 requires twice the space it should use

by Boris Behrens

So, I need to live with it? A value of zero leads to use the default? [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size { "bluestore_min_alloc_size": "0" } I also checked the fragmentation on the bluestore OSDs and it is around 0.80 - 0.89 on most OSDs. yikes. [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block { "fragmentation_rating": 0.85906054329923576 } The problem I currently have is, that I barely keep up with adding OSD disks. Am Do., 15. Apr. 2021 um 16:18 Uhr schrieb Amit Ghadge <amitg.b14(a)gmail.com >: > size_kb_actual are actually bucket object size but on OSD level the > bluestore_min_alloc_size default 64KB and SSD are 16KB > > > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/a… > > -AmitG > > On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens <bb(a)kervyn.de> wrote: > >> Hi, >> >> maybe it is just a problem in my understanding, but it looks like our s3 >> requires twice the space it should use. >> >> I ran "radosgw-admin bucket stats", and added all "size_kb_actual" values >> up and divided to TB (/1024/1024/1024). >> The resulting space is 135,1636733 TB. When I tripple it because of >> replication I end up with around 405TB which is nearly half the space of >> what ceph df tells me. >> >> Hope someone can help me. >> >> ceph df shows >> RAW STORAGE: >> CLASS SIZE AVAIL USED RAW USED %RAW USED >> hdd 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 >> TOTAL 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 >> >> POOLS: >> POOL ID PGS STORED >> OBJECTS >> USED %USED MAX AVAIL >> rbd 0 64 0 B >> 0 >> 0 B 0 18 TiB >> .rgw.root 1 64 99 KiB >> 119 >> 99 KiB 0 18 TiB >> eu-central-1.rgw.control 2 64 0 B >> 8 >> 0 B 0 18 TiB >> eu-central-1.rgw.data.root 3 64 1.0 MiB >> 3.15k >> 1.0 MiB 0 18 TiB >> eu-central-1.rgw.gc 4 64 71 MiB >> 32 >> 71 MiB 0 18 TiB >> eu-central-1.rgw.log 5 64 267 MiB >> 564 >> 267 MiB 0 18 TiB >> eu-central-1.rgw.users.uid 6 64 2.8 MiB >> 6.91k >> 2.8 MiB 0 18 TiB >> eu-central-1.rgw.users.keys 7 64 263 KiB >> 6.73k >> 263 KiB 0 18 TiB >> eu-central-1.rgw.meta 8 64 384 KiB >> 1k >> 384 KiB 0 18 TiB >> eu-central-1.rgw.users.email 9 64 40 B >> 1 >> 40 B 0 18 TiB >> eu-central-1.rgw.buckets.index 10 64 10 GiB >> 67.61k >> 10 GiB 0.02 18 TiB >> eu-central-1.rgw.buckets.data 11 2048 264 TiB >> 138.31M >> 264 TiB 83.37 18 TiB >> eu-central-1.rgw.buckets.non-ec 12 64 297 MiB >> 11.32k >> 297 MiB 0 18 TiB >> eu-central-1.rgw.usage 13 64 536 MiB >> 32 >> 536 MiB 0 18 TiB >> eu-msg-1.rgw.control 56 64 0 B >> 8 >> 0 B 0 18 TiB >> eu-msg-1.rgw.data.root 57 64 72 KiB >> 227 >> 72 KiB 0 18 TiB >> eu-msg-1.rgw.gc 58 64 300 KiB >> 32 >> 300 KiB 0 18 TiB >> eu-msg-1.rgw.log 59 64 835 KiB >> 242 >> 835 KiB 0 18 TiB >> eu-msg-1.rgw.users.uid 60 64 56 KiB >> 104 >> 56 KiB 0 18 TiB >> eu-msg-1.rgw.usage 61 64 37 MiB >> 25 >> 37 MiB 0 18 TiB >> eu-msg-1.rgw.users.keys 62 64 3.8 KiB >> 97 >> 3.8 KiB 0 18 TiB >> eu-msg-1.rgw.meta 63 64 607 KiB >> 1.60k >> 607 KiB 0 18 TiB >> eu-msg-1.rgw.buckets.index 64 64 71 MiB >> 119 >> 71 MiB 0 18 TiB >> eu-msg-1.rgw.users.email 65 64 0 B >> 0 >> 0 B 0 18 TiB >> eu-msg-1.rgw.buckets.data 66 64 2.9 TiB >> 1.16M >> 2.9 TiB 5.30 18 TiB >> eu-msg-1.rgw.buckets.non-ec 67 64 2.2 MiB >> 354 >> 2.2 MiB 0 18 TiB >> default.rgw.control 69 32 0 B >> 8 >> 0 B 0 18 TiB >> default.rgw.data.root 70 32 0 B >> 0 >> 0 B 0 18 TiB >> default.rgw.gc 71 32 0 B >> 0 >> 0 B 0 18 TiB >> default.rgw.log 72 32 0 B >> 0 >> 0 B 0 18 TiB >> default.rgw.users.uid 73 32 0 B >> 0 >> 0 B 0 18 TiB >> fra-1.rgw.control 74 32 0 B >> 8 >> 0 B 0 18 TiB >> fra-1.rgw.meta 75 32 0 B >> 0 >> 0 B 0 18 TiB >> fra-1.rgw.log 76 32 50 B >> 28 >> 50 B 0 18 TiB >> >> >> -- >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im >> groÃƒ¼en Saal. >> _______________________________________________ >> ceph-users mailing list -- ceph-users(a)ceph.io >> To unsubscribe send an email to ceph-users-leave(a)ceph.io >> > -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groÃƒ¼en Saal.

3 years, 1 month

1
0
0 0

s3 requires twice the space it should use

by Boris Behrens

Hi, maybe it is just a problem in my understanding, but it looks like our s3 requires twice the space it should use. I ran "radosgw-admin bucket stats", and added all "size_kb_actual" values up and divided to TB (/1024/1024/1024). The resulting space is 135,1636733 TB. When I tripple it because of replication I end up with around 405TB which is nearly half the space of what ceph df tells me. Hope someone can help me. ceph df shows RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 TOTAL 1009 TiB 189 TiB 820 TiB 820 TiB 81.26 POOLS: POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL rbd 0 64 0 B 0 0 B 0 18 TiB .rgw.root 1 64 99 KiB 119 99 KiB 0 18 TiB eu-central-1.rgw.control 2 64 0 B 8 0 B 0 18 TiB eu-central-1.rgw.data.root 3 64 1.0 MiB 3.15k 1.0 MiB 0 18 TiB eu-central-1.rgw.gc 4 64 71 MiB 32 71 MiB 0 18 TiB eu-central-1.rgw.log 5 64 267 MiB 564 267 MiB 0 18 TiB eu-central-1.rgw.users.uid 6 64 2.8 MiB 6.91k 2.8 MiB 0 18 TiB eu-central-1.rgw.users.keys 7 64 263 KiB 6.73k 263 KiB 0 18 TiB eu-central-1.rgw.meta 8 64 384 KiB 1k 384 KiB 0 18 TiB eu-central-1.rgw.users.email 9 64 40 B 1 40 B 0 18 TiB eu-central-1.rgw.buckets.index 10 64 10 GiB 67.61k 10 GiB 0.02 18 TiB eu-central-1.rgw.buckets.data 11 2048 264 TiB 138.31M 264 TiB 83.37 18 TiB eu-central-1.rgw.buckets.non-ec 12 64 297 MiB 11.32k 297 MiB 0 18 TiB eu-central-1.rgw.usage 13 64 536 MiB 32 536 MiB 0 18 TiB eu-msg-1.rgw.control 56 64 0 B 8 0 B 0 18 TiB eu-msg-1.rgw.data.root 57 64 72 KiB 227 72 KiB 0 18 TiB eu-msg-1.rgw.gc 58 64 300 KiB 32 300 KiB 0 18 TiB eu-msg-1.rgw.log 59 64 835 KiB 242 835 KiB 0 18 TiB eu-msg-1.rgw.users.uid 60 64 56 KiB 104 56 KiB 0 18 TiB eu-msg-1.rgw.usage 61 64 37 MiB 25 37 MiB 0 18 TiB eu-msg-1.rgw.users.keys 62 64 3.8 KiB 97 3.8 KiB 0 18 TiB eu-msg-1.rgw.meta 63 64 607 KiB 1.60k 607 KiB 0 18 TiB eu-msg-1.rgw.buckets.index 64 64 71 MiB 119 71 MiB 0 18 TiB eu-msg-1.rgw.users.email 65 64 0 B 0 0 B 0 18 TiB eu-msg-1.rgw.buckets.data 66 64 2.9 TiB 1.16M 2.9 TiB 5.30 18 TiB eu-msg-1.rgw.buckets.non-ec 67 64 2.2 MiB 354 2.2 MiB 0 18 TiB default.rgw.control 69 32 0 B 8 0 B 0 18 TiB default.rgw.data.root 70 32 0 B 0 0 B 0 18 TiB default.rgw.gc 71 32 0 B 0 0 B 0 18 TiB default.rgw.log 72 32 0 B 0 0 B 0 18 TiB default.rgw.users.uid 73 32 0 B 0 0 B 0 18 TiB fra-1.rgw.control 74 32 0 B 8 0 B 0 18 TiB fra-1.rgw.meta 75 32 0 B 0 0 B 0 18 TiB fra-1.rgw.log 76 32 50 B 28 50 B 0 18 TiB -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groÃƒ¼en Saal.

3 years, 1 month

1
0
0 0

Cephadm upgrade to Pacific problem

by Radoslav Milanov

Hello, Cluster is 3 nodes Debian 10. Started cephadm upgrade on healthy 15.2.10 cluster. Managers were upgraded fine then first monitor went down for upgrade and never came back. Researching at the unit files container fails to run because of an error: root@host1:/var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/mon.host1# cat unit.run set -e /usr/bin/install -d -m0770 -o 167 -g 167 /var/run/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6 # mon.host1 ! /usr/bin/docker rm -f ceph-97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6-mon.host1 2> /dev/null /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph-mon --privileged --group-add=disk --init --name ceph-97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6-mon.host1 -e CONTAINER_IMAGE=ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a -e NODE_NAME=host1 -e CEPH_USE_RANDOM_NONCE=1 -v /var/run/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6:/var/run/ceph:z -v /var/log/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6:/var/log/ceph:z -v /var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/mon.host1:/var/lib/ceph/mon/ceph-host1:z -v /var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/mon.host1/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a -n mon.host1 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true '--default-log-stderr-prefix=debug ' --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true root@host1:/var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/mon.host1# /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph-mon --privileged --group-add=disk --init --name ceph-97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6-mon.host1 -e CONTAINER_IMAGE=ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a -e NODE_NAME=host1 -e CEPH_USE_RANDOM_NONCE=1 -v /var/run/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6:/var/run/ceph:z -v /var/log/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6:/var/log/ceph:z -v /var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/mon.host1:/var/lib/ceph/mon/ceph-host1:z -v /var/lib/ceph/97d9f40e-9d33-11eb-8e3f-1c34da4b9fb6/mon.host1/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a -n mon.host1 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true '--default-log-stderr-prefix=debug ' --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true /usr/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "exec: \"/dev/init\": stat /dev/init: no such file or directory": unknown. Any suggestions how to resolve that ? Thank you.

3 years, 1 month

3
5
0 0

2024

2023

2022

2021

2020

2019

ceph-users April 2021