April 2023 - ceph-users - lists.ceph.io

by E Taka

I'm using a dockerized Ceph 17.2.6 under Ubuntu 22.04. Presumably I'm missing a very basic thing, since this seems a very simple question: how can I call cephfs-top in my environment? It is not inckuded in the Docker Image which is accessed by "cephadm shell". And calling the version found in the source code always fails with "[errno 13] RADOS permission denied", even when using "--cluster" with the correct ID, "--conffile" and "--id". The auth user client.fstop exists, and "ceph fs perf stats" runs. What am I missing? Thanks!

1 year

2
1
0 0

Ceph stretch mode / POOL_BACKFILLFULL

by Kilian Ries

Hi, we run a ceph cluster in stretch mode with one pool. We know about this bug: https://tracker.ceph.com/issues/56650 https://github.com/ceph/ceph/pull/47189 Can anyone tell me what happens when a pool gets to 100% full? At the moment raw OSD usage is about 54% but ceph throws me a "POOL_BACKFILLFULL" error: $ ceph df --- RAW STORAGE --- CLASS SIZE AVAIL USED RAW USED %RAW USED ssd 63 TiB 29 TiB 34 TiB 34 TiB 54.19 TOTAL 63 TiB 29 TiB 34 TiB 34 TiB 54.19 --- POOLS --- POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL .mgr 1 1 415 MiB 105 1.2 GiB 0.04 1.1 TiB vm_stretch_live 2 64 15 TiB 4.02M 34 TiB 95.53 406 GiB So the pool warning / calculation is just a bug, because it thinks its 50% of the total size. I know ceph will stop IO / set OSDs to read only if the hit a "backfillfull_ratio" ... but what will happen if the pool gets to 100% full ? Will IO still be possible? No limits / quotas are set on the pool ... Thanks Regards, Kilian

1 year

2
2
0 0

Ceph 16.2.12, particular OSD shows higher latency than others

by Zakhar Kirpichenko

Hi, I have a Ceph 16.2.12 cluster with uniform hardware, same drive make/model, etc. A particular OSD is showing higher latency than usual in `ceph osd perf`, usually mid to high tens of milliseconds while other OSDs show low single digits, although its drive's I/O stats don't look different from those of other drives. The workload is mainly random 4K reads and writes, the cluster is being used as Openstack VM storage. Is there a way to trace, which particular PG, pool and disk image or object cause this OSD's excessive latency? Is there a way to tell Ceph to I would appreciate any advice or pointers. Best regards, Zakhar

1 year

3
9
1 0

Massive OMAP remediation

by Ben.Zieglmeier

Hi All, We have a RGW cluster running Luminous (12.2.11) that has one object with an extremely large OMAP database in the index pool. Listomapkeys on the object returned 390 Million keys to start. Through bilog trim commands, we’ve whittled that down to about 360 Million. This is a bucket index for a regrettably unsharded bucket. There are only about 37K objects actually in the bucket, but through years of neglect, the bilog grown completely out of control. We’ve hit some major problems trying to deal with this particular OMAP object. We just crashed 4 OSDs when a bilog trim caused enough churn to knock one of the OSDs housing this PG out of the cluster temporarily. The OSD disks are 6.4TB NVMe, but are split into 4 partitions, each housing their own OSD daemon (collocated journal). We want to be rid of this large OMAP object, but are running out of options to deal with it. Reshard outright does not seem like a viable option, as we believe the deletion would deadlock OSDs can could cause impact. Continuing to run `bilog trim` 1000 records at a time has been what we’ve done, but this also seems to be creating impacts to performance/stability. We are seeking options to remove this problematic object without creating additional problems. It is quite likely this bucket is abandoned, so we could remove the data, but I fear even the deletion of such a large OMAP could bring OSDs down and cause potential for metadata loss (the other bucket indexes on that same PG). Any insight available would be highly appreciated. Thanks.

1 year

3
3
0 0

Bucket notification

by Szabo, Istvan (Agoda)

Hi, I'm trying to set a kafka endpoint for bucket object create operation notifications but the notification is not created in kafka endpoint. Settings seems to be fine because I can upload to the bucket objects when these settings are applied: NotificationConfiguration> <TopicConfiguration> <Id>bulknotif</Id> <Topic>arn:aws:sns:default::butcen</Topic> <Event>s3:ObjectCreated:*</Event> <Event>s3:ObjectRemoved:*</Event> </TopicConfiguration> </NotificationConfiguration> but it simply not created any message in kafka. This is my topic creation post request: https://xxx.local/? Action=CreateTopic& Name=butcen& kafka-ack-level=broker& use-ssl=true& push-endpoint=kafka://ceph:pw@xxx.local:9093 Am I missing something or it's definitely kafka issue? Thank you ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

1 year

2
2
0 0

Bucket empty after resharding on multisite environment

by Boris Behrens

Hi, I just resharded a bucket on an octopus multisite environment from 11 to 101. I did it on the master zone and it went through very fast. But now the index is empty. The files are still there when doing a radosgw-admin bucket radoslist --bucket-id Do I just need to wait or do I need to recover that somehow?

1 year

1
2
0 0

MDS recovery

by jack＠translucencebio.com

Hi All, We have a CephFS cluster running Octopus with three control nodes each running an MDS, Monitor, and Manager on Ubuntu 20.04. The OS drive on one of these nodes failed recently and we had to do a fresh install, but made the mistake of installing Ubuntu 22.04 where Octopus is not available. We tried to force apt to use the Ubuntu 20.04 repo when installing Ceph so that it would install Octopus, but for some reason Quincy was still installed. We re-integrated this node and it seemed to work fine for about a week until our cluster reported damage to an MDS daemon and placed our filesystem into a degraded state. cluster: id: 692905c0-f271-4cd8-9e43-1c32ef8abd13 health: HEALTH_ERR mons are allowing insecure global_id reclaim 1 filesystem is degraded 1 filesystem is offline 1 mds daemon damaged noout flag(s) set 161 scrub errors Possible data damage: 24 pgs inconsistent 8 pgs not deep-scrubbed in time 4 pgs not scrubbed in time 6 daemons have recently crashed services: mon: 3 daemons, quorum database-0,file-server,webhost (age 12d) mgr: database-0(active, since 4w), standbys: webhost, file-server mds: cephfs:0/1 3 up:standby, 1 damaged osd: 91 osds: 90 up (since 32h), 90 in (since 5M) flags noout task status: data: pools: 7 pools, 633 pgs objects: 169.18M objects, 640 TiB usage: 883 TiB used, 251 TiB / 1.1 PiB avail pgs: 605 active+clean 23 active+clean+inconsistent 4 active+clean+scrubbing+deep 1 active+clean+scrubbing+deep+inconsistent We are not sure if the Quincy/Octopus version mismatch is the problem, but we are in the process of downgrading this node now to ensure all nodes are running Octopus. Before doing that, we ran the following commands to try and recover: $ cephfs-journal-tool --rank=cephfs:all journal export backup.bin $ sudo cephfs-journal-tool --rank=cephfs:all event recover_dentries summary: Events by type: OPEN: 29589 PURGED: 1 SESSION: 16 SESSIONS: 4 SUBTREEMAP: 127 UPDATE: 70438 Errors: 0 $ cephfs-journal-tool --rank=cephfs:0 journal reset: old journal was 170234219175~232148677 new journal start will be 170469097472 (2729620 bytes past old end) writing journal head writing EResetJournal entry done $ cephfs-table-tool all reset session All of our MDS daemons are down and fail to restart with the following errors: -3> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc 0x1000053af79 not in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128] -2> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc [0x1000053af7a~0x1eb,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2], only [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2] is in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128] -1> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 /build/ceph-15.2.15/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7f0465069700 time 2023-04-20T10:25:15.076784-0700 /build/ceph-15.2.15/src/mds/journal.cc: 1513: FAILED ceph_assert(inotablev == mds->inotable->get_version()) ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x7f04717a3be1] 2: (()+0x26ade9) [0x7f04717a3de9] 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2] 4: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2] 5: (MDLog::_replay_thread()+0x90c) [0x560feac393ac] 6: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821] 7: (()+0x8609) [0x7f0471318609] 8: (clone()+0x43) [0x7f0470ee9163] 0> 2023-04-20T10:25:15.076-0700 7f0465069700 -1 *** Caught signal (Aborted) ** in thread 7f0465069700 thread_name:md_log_replay ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable) 1: (()+0x143c0) [0x7f04713243c0] 2: (gsignal()+0xcb) [0x7f0470e0d03b] 3: (abort()+0x12b) [0x7f0470dec859] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7f04717a3c3c] 5: (()+0x26ade9) [0x7f04717a3de9] 6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2] 7: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2] 8: (MDLog::_replay_thread()+0x90c) [0x560feac393ac] 9: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821] 10: (()+0x8609) [0x7f0471318609] 11: (clone()+0x43) [0x7f0470ee9163] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. At this point, we decided it's best to ask for some guidance before issuing any other recovery commands. Can anyone advise what we should do?

1 year

2
1
0 0

How to find the bucket name from Radosgw log?

by viplanghe6＠gmail.com

I find a log like this, and I thought the bucket name should be "photos": ....[2023-04-19 15:48:47.0.5541s] "GET /photos/shares/.... But I can not find it: radosgw-admin bucket stats --bucket photos failure: 2023-04-19 15:48:53.969 7f69dce49a80 0 could not get bucket info for bucket=photos (2002) Unknown error 2002 How does this happen? Thanks

1 year

3
2
0 0

Re: Disks are filling up

by Omar Siam

No this is a cephadm setup. Not rook. In the last days it is still deep scrubbing and filling up. We have to do something about it as it now impacts our K8s cluster (very slow cephfs access) and we are running out of (allocated) diskspace again. Some more details now that I had a few more days to think about our particular setup: * This is a setup with ESXi/vSphere virtualization. The ceph nodes are just some VMs. We don't have access to the bare servers or even direct access to the HDDs/SSDs ceph runs on. * The setup is "asymmetric": there are 2 nodes on SSDs and one on HDDs (they are all RAIDx with hardware controllers, but we have no say in this). I labeled all OSDs as HDDs (even when VMWare reported SSD). * We looked at the OSDs device usage and it is 100% (from the VMs point of view) for the HDDs (20% on everage for the SSD nodes). My suspision is: * deep scrubbing means every new write goes to unallocated space, no more overwrite/deleting while deep scrubbing. I didn't find it in the docs. Maybe I missed it, maybe that is common wisdom among the initiated. * We write more new data per second to cephfs than can be scrubbed so scrubbing never ends and the PGs fill up. We now ordered SSDs for the HDD only node to prevent this in the future. Meanwhile we need to do something so we think about moving the data in cephfs to a new PG that does not need deep scrubbing at the moment. Also we think about moving the OSD from the physical host that only has HDDs to one with SSDs ruining redundancy for a short while and hoping for the best Am 26.04.2023 um 02:28 schrieb A Asraoui: > > Omar, glad to see cephfs with kubernetes up and running.. did you guys > use rook to deploy this ?? > > Abdelillah > On Mon, Apr 24, 2023 at 6:56 AM Omar Siam <Omar.Siam(a)oeaw.ac.at> wrote: > > Hi list, > > we created a cluster for using cephfs with a kubernetes cluster. > Since a > few weeks now the cluster keeps filling up at an alarming rate > (100 GB per day). > This is while the most relevant pg is deep scrubbing and was > interupted > a few times. > > We use about 150G (du using the mounted filesystem) on the cephfs > filesystem and try not to use snapshots (.snap directories "exist" > but > are empty). > We do not understand why the pgs get bigger and bigger while cephfs > stays about the same size (overwrites on files certainly happen). > I suspect some snapshots mechanism. Any ideas how to debug this to > stop it? > > Maybe we should try to speed up the deep scrubbing somehow? > Best regards -- Mag. Ing. Omar Siam Austrian Center for Digital Humanities and Cultural Heritage Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences Stellvertretende Behindertenvertrauensperson | Deputy representative for disabled persons Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria T: +43 1 51581-7295 omar.siam(a)oeaw.ac.at |www.oeaw.ac.at/acdh

1 year

1
0
0 0

Veeam backups to radosgw seem to be very slow

by Boris Behrens

We have a customer that tries to use veeam with our rgw objectstorage and it seems to be blazingly slow. What also seems to be strange, that veeam sometimes show "bucket does not exist" or "permission denied". I've tested parallel and everything seems to work fine from the s3cmd/aws cli standpoint. Does anyone here ever experienced veeam problems with rgw? Cheers Boris

1 year

5
5
0 0

2024

2023

2022

2021

2020

2019

ceph-users April 2023