Hi,
There is an operation "radosgw-admin bi purge" that removes all bucket
index objects for one bucket in the RADOS Gateway.
What is the undo operation for this?
After this operation the bucket can no longer be listed or removed.
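To frame the question, this is roughly the sequence (the bucket name is just a placeholder). "bucket check --fix" is the only rebuild-style command I am aware of, but whether it can actually recover from a full "bi purge" is exactly what I am unsure about:

  # state before: index objects exist, listing works
  radosgw-admin bucket list --bucket=<bucketname>

  # the destructive operation in question
  radosgw-admin bi purge --bucket=<bucketname>

  # afterwards listing fails; the closest thing to a rebuild I know of,
  # though it is unclear to me whether it undoes a full index purge:
  radosgw-admin bucket check --bucket=<bucketname> --fix --check-objects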
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin
Hi Ceph Users,
The User + Dev Meeting is happening this Thursday, March 16th at 10am
EDT (see extra meeting details below). If you have any topics you'd like to discuss,
please add them to the etherpad:
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
One of the topics we wish to discuss is whether any users would be willing
to help with early Reef testing after the RC comes out.
Thanks,
Laura Flores
Meeting link:
https://meet.jit.si/ceph-user-dev-monthly
Time conversions:
UTC: Thursday, March 16, 14:00 UTC
Mountain View, CA, US: Thursday, March 16, 7:00 PDT
Phoenix, AZ, US: Thursday, March 16, 7:00 MST
Denver, CO, US: Thursday, March 16, 8:00 MDT
Huntsville, AL, US: Thursday, March 16, 9:00 CDT
Raleigh, NC, US: Thursday, March 16, 10:00 EDT
London, England: Thursday, March 16, 14:00 GMT
Paris, France: Thursday, March 16, 15:00 CET
Helsinki, Finland: Thursday, March 16, 16:00 EET
Tel Aviv, Israel: Thursday, March 16, 16:00 IST
Pune, India: Thursday, March 16, 19:30 IST
Brisbane, Australia: Friday, March 17, 0:00 AEST
Singapore, Asia: Thursday, March 16, 22:00 +08
Auckland, New Zealand: Friday, March 17, 3:00 NZDT
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores@ibm.com | lflores@redhat.com
M: +17087388804
Hi:
I encountered a problem when installing cephadm on Huawei Cloud EulerOS. When I enter the following command, it raises an error. What should I do?
>> ./cephadm add-repo --release quincy
<< ERROR: Distro hce version 2.0 not supported
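As far as I can tell, cephadm's add-repo only recognizes a fixed list of distro IDs, and "hce" is not one of them. Would it be a reasonable workaround to write the repo file by hand and install the packages directly? This assumes HCE 2.0 can consume EL8 RPMs, which I have not verified:

  # hypothetical workaround, only valid if HCE 2.0 is EL8-compatible
  cat > /etc/yum.repos.d/ceph.repo <<'EOF'
  [ceph]
  name=Ceph packages for $basearch
  baseurl=https://download.ceph.com/rpm-quincy/el8/$basearch
  enabled=1
  gpgcheck=1
  gpgkey=https://download.ceph.com/keys/release.asc

  [ceph-noarch]
  name=Ceph noarch packages
  baseurl=https://download.ceph.com/rpm-quincy/el8/noarch
  enabled=1
  gpgcheck=1
  gpgkey=https://download.ceph.com/keys/release.asc
  EOF
  yum install -y cephadm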
Hi,
Doing some lab tests to understand why Ceph isn't working for us,
and here's the first puzzle:
Setup: a completely fresh Quincy cluster, 64-core EPYC 7713, 2 NVMe drives
> ceph osd crush rule create-replicated osd default osd ssd
> ceph osd pool create rbd replicated osd --size 2
> dd if=/dev/rbd0 of=/tmp/testfile status=progress bs=4M count=1000
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 7.0152 s, 598 MB/s
> dd of=/dev/rbd0 if=/tmp/testfile status=progress bs=4M count=1000
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 3.82156 s, 1.1 GB/s
Write performance is about 1/3 of raw NVMe, which I suppose is expected (not
very good, though).
But why is read performance so bad?
top shows only one core being utilized, at about 40% CPU.
It can't be the network either, since this is all on localhost.
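For comparison, this is what I plan to try next, since a single dd is one synchronous stream and probably keeps very little queue depth on the OSDs (image name and sizes below are just what I'm using in the lab):

  # bypass the page cache / readahead so dd measures the device, not the kernel cache
  dd if=/dev/rbd0 of=/dev/null iflag=direct status=progress bs=4M count=1000

  # drive the image with multiple threads instead of one stream
  rbd bench --io-type read --io-size 4M --io-threads 16 --io-total 4G --io-pattern seq rbd/<image>

  # pool-level baseline with parallel readers (seed objects first, then read them back)
  rados bench -p rbd 30 write --no-cleanup -t 16
  rados bench -p rbd 30 seq -t 16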
thanks
Arvid
--
+4916093821054
Hi all,
osd_heartbeat_grace = 20 and osd_pool_default_read_lease_ratio = 0.8 by
default, so a PG will wait up to 16s (20 * 0.8) in the worst case when an OSD
restarts. This wait time is too long; client I/O stalls of that length are
not acceptable. I think lowering osd_pool_default_read_lease_ratio is a good
approach. Does anyone have good suggestions for reducing the PG wait time?
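For reference, this is what I am considering; the 0.4 value is only an example I have not validated:

  # read lease = osd_heartbeat_grace * osd_pool_default_read_lease_ratio = 20 * 0.8 = 16s
  # lowering the ratio shortens the lease, e.g. 20 * 0.4 = 8s
  ceph config set global osd_pool_default_read_lease_ratio 0.4

  # verify the value the OSDs will pick up
  ceph config get osd osd_pool_default_read_lease_ratio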
Best Regards
Yite Gu
Hi,
I ended up with the whole set of OSDs needed to bring back the original Ceph
cluster. I managed to get the cluster running again. However, its status is
as below:
bash-4.4$ ceph -s
  cluster:
    id:     3f271841-6188-47c1-b3fd-90fd4f978c76
    health: HEALTH_WARN
            7 daemons have recently crashed
            4 slow ops, oldest one blocked for 35077 sec, daemons [mon.a,mon.b] have slow ops.

  services:
    mon: 3 daemons, quorum a,b,d (age 9h)
    mgr: b (active, since 14h), standbys: a
    osd: 4 osds: 0 up, 4 in (since 9h)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
All OSDs are down.
I checked the OSD logs and attached them to this message.
Please help; I wonder if it's possible to get the cluster back. I have a
backup of the monitors' data, but I have not restored it so far.
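These are the first diagnostic commands I have been running, in case the output helps (just the standard commands; ceph daemon has to be run on the node hosting the respective mon):

  ceph health detail          # which daemons crashed, which ops are slow
  ceph crash ls               # list the 7 recent crashes
  ceph crash info <crash-id>  # backtrace for a specific crash
  ceph osd tree               # confirm which OSDs are down and where
  ceph daemon mon.a ops       # the slow ops currently stuck on mon.a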
Thanks,
Ben
I have a large number of misplaced objects, and I have already set all the relevant OSD settings to “1”:
sudo ceph tell osd.\* injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1 --osd_recovery_op_priority=1'
How can I slow it down even more? The cluster is too large, it’s impacting other network traffic 😉
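In case it helps, these are the other knobs I'm aware of beyond the three above; the sleep values are only illustrative, and I'd be glad to hear which approach people prefer:

  # stop backfill / rebalance entirely while other traffic is critical
  ceph osd set nobackfill
  ceph osd set norebalance
  # ... and later
  ceph osd unset nobackfill
  ceph osd unset norebalance

  # or throttle instead of stopping: insert a per-op sleep (example values)
  ceph config set osd osd_recovery_sleep_hdd 0.5
  ceph config set osd osd_recovery_sleep_ssd 0.1
  ceph config set osd osd_recovery_sleep_hybrid 0.25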
Hi,
we've observed HTTP 500 errors when uploading files to a single bucket, but
the problem went away after around 2 hours.
We checked the logs and saw the following error messages:
2023-03-08T17:55:58.778+0000 7f8062f15700  0 WARNING: set_req_state_err err_no=125 resorting to 500
2023-03-08T17:55:58.778+0000 7f8062f15700  0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Bad file descriptor
2023-03-08T17:55:58.778+0000 7f8062f15700  1 ====== req done req=0x7f81d0189700 op status=-125 http_status=500 latency=65003730017ns ======
2023-03-08T17:55:58.778+0000 7f8062f15700  1 beast: 0x7f81d0189700: IPADDRESS - - [2023-03-08T17:55:58.778961+0000] "PUT /BUCKET/OBJECT HTTP/1.1" 500 57 - "aws-sdk-php/3.257.11 OS/Linux/5.15.0-60-generic lang/php/8.2.3 GuzzleHttp/7" -
It only happened to a single bucket over a period of 1-2 hours (around 300
requests).
During the same time we had >20k PUT requests that were working fine on other
buckets.
This error also seems to happen to other buckets, but only very sporadically.
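For context, err_no=125 is ECANCELED, and the ~65s latency looks like the request was aborted before RGW could send its response header. We are planning to check whether that bucket was being resharded in that window; these are just the standard checks, nothing confirmed yet:

  radosgw-admin reshard list                   # was a reshard queued/running?
  radosgw-admin bucket stats --bucket=BUCKET   # num_shards vs. object count
  radosgw-admin bucket limit check --uid=USER  # per-shard fill level / reshard pressure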
Has someone encountered this issue or knows what it could be?
Cheers
Boris
Hi,
I am trying to deploy Ceph Quincy using ceph-ansible on Rocky9. I am having
some problems and I don't know where to search for the reason.
PS : I did the same deployment on Rocky8 using ceph-ansible for the Pacific
version on the same hardware and it worked perfectly.
I have 3 controller nodes (mon, mgr, mds and rgw) and 27 OSD nodes with
4 NVMe disks (OSDs) each.
I am using a 10Gb network with jumbo frames.
The deployment starts with no issues: the 3 monitors are created correctly,
then the 3 managers, and after that the OSDs are prepared and formatted. Up
to here everything works fine. But when the "wait for all osd to be up" task
is launched, which starts all OSD containers on all OSD nodes, things go
south: the monitors fall out of quorum, ceph -s takes a long time to respond,
not all OSDs get activated, and the deployment fails in the end.
cluster 2023-03-06T12:00:26.431947+0100 mon.controllera (mon.0) 3864 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum controllera,controllerc
cluster 2023-03-06T12:00:26.431953+0100 mon.controllera (mon.0) 3865 : cluster [WRN] mon.controllerb (rank 1) addr [v2:20.1.0.27:3300/0,v1:20.1.0.27:6789/0] is down (out of quorum)
The monitor container on 2 of my controller nodes stays at 100% CPU
utilization.
CONTAINER ID   NAME                   CPU %    MEM USAGE / LIMIT     MEM %   NET I/O   BLOCK I/O        PIDS
068e4e55f299   ceph-mon-controllera   99.91%   58.12MiB / 376.1GiB   0.02%   0B / 0B   122MB / 85.3MB   28   <-----------------
87730f89420d   ceph-mgr-controllera   0.32%    408.2MiB / 376.1GiB   0.11%   0B / 0B   181MB / 0B       35
Could this be a resource problem, i.e. the monitor containers not having
enough resources (CPU, RAM, etc.) to handle all the OSDs being started at
once? If yes, how can I find out?
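These are the checks I intend to run on the busy monitor to see what it is actually doing; I'm assuming the containers run under podman here (that's what ceph-ansible set up on my nodes), so please correct me if a different approach is better:

  # what is mon.controllera busy with?
  podman exec ceph-mon-controllera ceph daemon mon.controllera ops         # in-flight / slow ops
  podman exec ceph-mon-controllera ceph daemon mon.controllera sessions    # how many OSDs/clients are connected
  podman exec ceph-mon-controllera ceph daemon mon.controllera perf dump   # counters (elections, paxos, etc.)
  podman exec ceph-mon-controllera ceph daemon mon.controllera mon_status  # rank, quorum state, election epoch

  # is the mon store growing or sitting on a slow device during the OSD storm?
  podman exec ceph-mon-controllera sh -c 'du -sh /var/lib/ceph/mon/*/store.db'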
thanks in advance.
Regards.