I'm using a dockerized Ceph 17.2.6 under Ubuntu 22.04.
Presumably I'm missing a very basic thing, since this seems a very simple
question: how can I call cephfs-top in my environment? It is not included
in the Docker image that is accessed via "cephadm shell".
And calling the version found in the source code always fails with "[errno
13] RADOS permission denied", even when using "--cluster" with the correct
ID, "--conffile" and "--id".
The auth user client.fstop exists, and "ceph fs perf stats" runs.
What am I missing?
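For reference, this is roughly how I'm invoking it (the paths below are from my setup; client.fstop was created with the capabilities suggested in the cephfs-top documentation):
$ ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r'
$ cephfs-top --cluster ceph --id fstop --conffile /etc/ceph/ceph.conf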
Thanks!
Hi,
we run a ceph cluster in stretch mode with one pool. We know about this bug:
https://tracker.ceph.com/issues/56650
https://github.com/ceph/ceph/pull/47189
Can anyone tell me what happens when a pool gets to 100% full? At the moment raw OSD usage is about 54% but ceph throws me a "POOL_BACKFILLFULL" error:
$ ceph df
--- RAW STORAGE ---
CLASS    SIZE    AVAIL   USED    RAW USED  %RAW USED
ssd      63 TiB  29 TiB  34 TiB  34 TiB    54.19
TOTAL    63 TiB  29 TiB  34 TiB  34 TiB    54.19

--- POOLS ---
POOL             ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr              1    1  415 MiB      105  1.2 GiB   0.04    1.1 TiB
vm_stretch_live   2   64   15 TiB    4.02M   34 TiB  95.53    406 GiB
So the pool warning / calculation is just a bug, because it thinks it's at 50% of the total size. I know Ceph will stop IO / set OSDs to read-only once they hit the "backfillfull_ratio" ... but what will happen if the pool gets to 100% full?
Will IO still be possible?
No limits / quotas are set on the pool ...
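For context, this is how I would check the cluster-wide full thresholds (just a sketch; the values shown are the Ceph defaults, not necessarily what our cluster uses):
$ ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85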
Thanks
Regards,
Kilian
Hi,
I have a Ceph 16.2.12 cluster with uniform hardware, same drive make/model,
etc. A particular OSD is showing higher latency than usual in `ceph osd
perf`, usually mid to high tens of milliseconds while other OSDs show low
single digits, although its drive's I/O stats don't look different from
those of other drives. The workload is mainly random 4K reads and writes;
the cluster is being used as OpenStack VM storage.
Is there a way to trace which particular PG, pool, and disk image or object
causes this OSD's excessive latency? Is there a way to tell Ceph to
I would appreciate any advice or pointers.
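So far the closest I have gotten is dumping the recent slow ops on the suspect OSD, since each op description at least contains the PG id (osd.12 below is just a placeholder for the slow OSD):
$ ceph tell osd.12 dump_historic_ops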
Best regards,
Zakhar
Hi All,
We have an RGW cluster running Luminous (12.2.11) that has one object with an extremely large OMAP database in the index pool. Listomapkeys on the object returned 390 million keys to start. Through bilog trim commands, we've whittled that down to about 360 million. This is a bucket index for a regrettably unsharded bucket. There are only about 37K objects actually in the bucket, but through years of neglect the bilog has grown completely out of control. We've hit some major problems trying to deal with this particular OMAP object. We just crashed 4 OSDs when a bilog trim caused enough churn to knock one of the OSDs housing this PG out of the cluster temporarily. The OSD disks are 6.4 TB NVMe, but are split into 4 partitions, each housing its own OSD daemon (co-located journal).
We want to be rid of this large OMAP object, but are running out of options to deal with it. Resharding outright does not seem like a viable option, as we believe the deletion would deadlock OSDs and could cause impact. Continuing to run `bilog trim` 1000 records at a time is what we've done so far, but this also seems to be impacting performance/stability. We are seeking options to remove this problematic object without creating additional problems. It is quite likely this bucket is abandoned, so we could remove the data, but I fear even the deletion of such a large OMAP object could bring OSDs down and create potential for metadata loss (the other bucket indexes on that same PG).
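(For reference, this is how we counted the keys; the index pool name and bucket-id below are placeholders for our actual values:)
$ rados -p <index-pool> listomapkeys .dir.<bucket-id> | wc -l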
Any insight available would be highly appreciated.
Thanks.
Hi,
I'm trying to set up a Kafka endpoint for bucket object-create notifications, but no notification ever arrives at the Kafka endpoint.
The settings seem to be fine, because I can upload objects to the bucket when these settings are applied:
<NotificationConfiguration>
  <TopicConfiguration>
    <Id>bulknotif</Id>
    <Topic>arn:aws:sns:default::butcen</Topic>
    <Event>s3:ObjectCreated:*</Event>
    <Event>s3:ObjectRemoved:*</Event>
  </TopicConfiguration>
</NotificationConfiguration>
but it simply does not create any message in Kafka.
This is my topic creation post request:
https://xxx.local/?
Action=CreateTopic&
Name=butcen&
kafka-ack-level=broker&
use-ssl=true&
push-endpoint=kafka://ceph:pw@xxx.local:9093
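To rule out the topic itself, I have also been checking it on the RGW admin side, roughly like this (the topic name is mine, and I am not 100% sure of the exact option spelling on every release):
$ radosgw-admin topic list
$ radosgw-admin topic get --topic=butcen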
Am I missing something, or is it definitely a Kafka issue?
Thank you
Hi,
I just resharded a bucket on an Octopus multisite environment from 11 to
101 shards.
I did it on the master zone and it went through very fast.
But now the index is empty.
The files are still there when doing a radosgw-admin bucket radoslist
--bucket-id
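(For comparison, I also looked at the object count the index itself reports, something like this, with the bucket name as a placeholder:)
$ radosgw-admin bucket stats --bucket=<bucket-name> | grep num_objects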
Do I just need to wait or do I need to recover that somehow?
Hi All,
We have a CephFS cluster running Octopus with three control nodes each running an MDS, Monitor, and Manager on Ubuntu 20.04. The OS drive on one of these nodes failed recently and we had to do a fresh install, but made the mistake of installing Ubuntu 22.04 where Octopus is not available. We tried to force apt to use the Ubuntu 20.04 repo when installing Ceph so that it would install Octopus, but for some reason Quincy was still installed. We re-integrated this node and it seemed to work fine for about a week until our cluster reported damage to an MDS daemon and placed our filesystem into a degraded state.
  cluster:
    id:     692905c0-f271-4cd8-9e43-1c32ef8abd13
    health: HEALTH_ERR
            mons are allowing insecure global_id reclaim
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            noout flag(s) set
            161 scrub errors
            Possible data damage: 24 pgs inconsistent
            8 pgs not deep-scrubbed in time
            4 pgs not scrubbed in time
            6 daemons have recently crashed

  services:
    mon: 3 daemons, quorum database-0,file-server,webhost (age 12d)
    mgr: database-0(active, since 4w), standbys: webhost, file-server
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 91 osds: 90 up (since 32h), 90 in (since 5M)
         flags noout

  task status:

  data:
    pools:   7 pools, 633 pgs
    objects: 169.18M objects, 640 TiB
    usage:   883 TiB used, 251 TiB / 1.1 PiB avail
    pgs:     605 active+clean
             23  active+clean+inconsistent
             4   active+clean+scrubbing+deep
             1   active+clean+scrubbing+deep+inconsistent
We are not sure if the Quincy/Octopus version mismatch is the problem, but we are in the process of downgrading this node now to ensure all nodes are running Octopus. Before doing that, we ran the following commands to try and recover:
$ cephfs-journal-tool --rank=cephfs:all journal export backup.bin
$ sudo cephfs-journal-tool --rank=cephfs:all event recover_dentries summary:
Events by type:
OPEN: 29589
PURGED: 1
SESSION: 16
SESSIONS: 4
SUBTREEMAP: 127
UPDATE: 70438
Errors: 0
$ cephfs-journal-tool --rank=cephfs:0 journal reset:
old journal was 170234219175~232148677
new journal start will be 170469097472 (2729620 bytes past old end)
writing journal head
writing EResetJournal entry
done
$ cephfs-table-tool all reset session
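We still have the journal export from the first step, so if needed we believe we could re-import it with something like the following, though we have not run it:
$ cephfs-journal-tool --rank=cephfs:0 journal import backup.bin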
All of our MDS daemons are down and fail to restart with the following errors:
-3> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc 0x1000053af79 not in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
-2> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] : journal replay alloc [0x1000053af7a~0x1eb,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2], only [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2] is in free [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
-1> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 /build/ceph-15.2.15/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7f0465069700 time 2023-04-20T10:25:15.076784-0700
/build/ceph-15.2.15/src/mds/journal.cc: 1513: FAILED ceph_assert(inotablev == mds->inotable->get_version())
ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x7f04717a3be1]
2: (()+0x26ade9) [0x7f04717a3de9]
3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2]
4: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
5: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
6: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
7: (()+0x8609) [0x7f0471318609]
8: (clone()+0x43) [0x7f0470ee9163]
0> 2023-04-20T10:25:15.076-0700 7f0465069700 -1 *** Caught signal (Aborted) **
in thread 7f0465069700 thread_name:md_log_replay
ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
1: (()+0x143c0) [0x7f04713243c0]
2: (gsignal()+0xcb) [0x7f0470e0d03b]
3: (abort()+0x12b) [0x7f0470dec859]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7f04717a3c3c]
5: (()+0x26ade9) [0x7f04717a3de9]
6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) [0x560feaca36f2]
7: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
8: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
9: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
10: (()+0x8609) [0x7f0471318609]
11: (clone()+0x43) [0x7f0470ee9163]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
At this point, we decided it's best to ask for some guidance before issuing any other recovery commands.
Can anyone advise what we should do?
I found a log entry like this, and I thought the bucket name should be "photos":
....[2023-04-19 15:48:47.0.5541s] "GET /photos/shares/....
But I cannot find it:
radosgw-admin bucket stats --bucket photos
failure: 2023-04-19 15:48:53.969 7f69dce49a80 0 could not get bucket info for bucket=photos
(2002) Unknown error 2002
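I also tried listing all buckets to see whether it exists under a different tenant (just a guess on my part):
$ radosgw-admin bucket list
$ radosgw-admin metadata list bucket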
How does this happen? Thanks
No, this is a cephadm setup, not Rook.
Over the last few days it has still been deep scrubbing and filling up. We have to
do something about it, as it now impacts our K8s cluster (very slow
cephfs access) and we are running out of (allocated) disk space again.
Some more details, now that I have had a few more days to think about our
particular setup:
* This is a setup with ESXi/vSphere virtualization. The ceph nodes are
just some VMs. We don't have access to the bare servers or even direct
access to the HDDs/SSDs ceph runs on.
* The setup is "asymmetric": there are 2 nodes on SSDs and one on HDDs
(they are all RAIDx with hardware controllers, but we have no say in
this). I labeled all OSDs as HDDs (even when VMware reported SSD).
* We looked at the OSDs' device usage and it is 100% (from the VMs' point
of view) for the HDDs (20% on average for the SSD nodes).
My suspicion is:
* Deep scrubbing means every new write goes to unallocated space, with no
more overwriting/deleting while deep scrubbing runs. I didn't find this in the
docs. Maybe I missed it, maybe that is common wisdom among the initiated.
* We write more new data per second to cephfs than can be scrubbed so
scrubbing never ends and the PGs fill up.
We have now ordered SSDs for the HDD-only node to prevent this in the future.
Meanwhile we need to do something, so we are thinking about moving the data in
cephfs to a new PG that does not need deep scrubbing at the moment.
We are also thinking about moving the OSDs from the physical host that only has
HDDs to one with SSDs, sacrificing redundancy for a short while and hoping
for the best.
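For now we are watching the scrub progress and considering raising the scrub limits, roughly like this (a sketch; the values are guesses, not something we have tested yet):
$ ceph pg dump pgs_brief | grep scrubbing
$ ceph config set osd osd_max_scrubs 2
$ ceph config set osd osd_scrub_sleep 0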
On 26.04.2023 at 02:28, A Asraoui wrote:
>
> Omar, glad to see cephfs with kubernetes up and running.. did you guys
> use rook to deploy this ??
>
> Abdelillah
> On Mon, Apr 24, 2023 at 6:56 AM Omar Siam <Omar.Siam@oeaw.ac.at> wrote:
>
> Hi list,
>
> we created a cluster for using cephfs with a kubernetes cluster. For a
> few weeks now the cluster has kept filling up at an alarming rate
> (100 GB per day).
> This is while the most relevant pg is deep scrubbing and was interrupted
> a few times.
>
> We use about 150G (du using the mounted filesystem) on the cephfs
> filesystem and try not to use snapshots (.snap directories "exist"
> but
> are empty).
> We do not understand why the pgs get bigger and bigger while cephfs
> stays about the same size (overwrites on files certainly happen).
> I suspect some snapshot mechanism. Any ideas on how to debug this to
> stop it?
>
> Maybe we should try to speed up the deep scrubbing somehow?
>
Best regards
--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for disabled persons
Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.siam@oeaw.ac.at | www.oeaw.ac.at/acdh
We have a customer that is trying to use Veeam with our RGW object storage, and
it seems to be extremely slow.
What also seems strange is that Veeam sometimes shows "bucket does not
exist" or "permission denied".
I've tested in parallel and everything seems to work fine from the s3cmd/aws
CLI standpoint.
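For reference, these are the kinds of checks that work fine for me from the CLI (the bucket name and endpoint are just examples):
$ s3cmd ls s3://veeam-test-bucket
$ aws --endpoint-url https://rgw.example.local s3 cp testfile s3://veeam-test-bucket/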
Has anyone here ever experienced Veeam problems with RGW?
Cheers
Boris