Hello,
I have a problem where old versions of S3 objects are not being deleted. Can anyone advise why? I'm using Ceph 14.2.9.
I expect old versions of S3 objects to be deleted after 3 days as per my lifecycle config on the bucket:
{
    "Rules": [
        {
            "Status": "Enabled",
            "Prefix": "",
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 3
            },
            "Expiration": {
                "ExpiredObjectDeleteMarker": true
            },
            "ID": "S3 scsdata bucket: Tidy up old versions"
        }
    ]
}
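(For reference, this is roughly how a rule like this gets applied and read back; the file name below is illustrative, not my exact command history:)

# apply the lifecycle document to the bucket
aws s3api --endpoint http://127.3.3.3:7480 put-bucket-lifecycle-configuration \
    --bucket hera-scsdata --lifecycle-configuration file://lifecycle.json

# read it back to confirm RGW stored it
aws s3api --endpoint http://127.3.3.3:7480 get-bucket-lifecycle-configuration \
    --bucket hera-scsdata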
I have an object with 3 versions, shown below (all of them much older than 3 days):
[root@hera hera_sdc] /usr/bin> aws s3api --endpoint http://127.3.3.3:7480 list-object-versions --bucket hera-scsdata --key 84/46/2020060508501821902143658709-Subscriber | grep -B 4 -A 6 84/46/2020060508501821902143658709-Subscriber
"LastModified": "2020-06-05T08:58:19.644Z",
"VersionId": "FUdZIehBu3sgRbNJSmZwj3VHWs1ednH",
"ETag": "\"a18286c50a7323efe58497eb97d6dc9d\"",
"StorageClass": "STANDARD",
"Key": "84/46/2020060508501821902143658709-Subscriber",
"Owner": {
"DisplayName": "hera EAS S3 user",
"ID": "hera"
},
"IsLatest": true,
"Size": 4440
--
"LastModified": "2020-06-05T08:58:17.943Z",
"VersionId": "JVKGMJQS-l7xKQuqdfn4QsEY5WLEosj",
"ETag": "\"87e9953af436b702afb80d457f1d73cb\"",
"StorageClass": "STANDARD",
"Key": "84/46/2020060508501821902143658709-Subscriber",
"Owner": {
"DisplayName": "hera EAS S3 user",
"ID": "hera"
},
"IsLatest": false,
"Size": 4408
--
"LastModified": "2020-06-05T08:50:19.167Z",
"VersionId": "-RSNSCDvGj83f4DZ11s8YZ2KaxT8T.a",
"ETag": "\"a68ec68ce825e009ee9a70cfdae9c794\"",
"StorageClass": "STANDARD",
"Key": "84/46/2020060508501821902143658709-Subscriber",
"Owner": {
"DisplayName": "hera EAS S3 user",
"ID": "hera"
},
"IsLatest": false,
"Size": 4256
--
],
"NextKeyMarker": "85/49/20200604163626B4C712312312302641-Subscriber",
"MaxKeys": 1000,
"Prefix": "",
"KeyMarker": "84/46/2020060508501821902143658709-Subscriber",
"DeleteMarkers": [
{
"Owner": {
"DisplayName": "hera EAS S3 user",
"ID": "hera"
},
So those old versions still being present seems to conflict with the lifecycle config I have set?
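In case it helps with diagnosis, my understanding is that the lifecycle state can also be inspected from the gateway side; this is a sketch of what I would check there, assuming radosgw-admin access on the RGW host:

# list the buckets RGW tracks for lifecycle and their processing state
radosgw-admin lc list

# trigger a lifecycle pass by hand instead of waiting for the schedule
radosgw-admin lc process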
Thanks,
Alex
Thanks, Ricardo, for the clarification.
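For the archives: until v15.2.5 lands, I'll fall back to a manual push along these lines (just a sketch; the host names are placeholders):

# generate a minimal ceph.conf from the cluster's config database
ceph config generate-minimal-conf > /etc/ceph/ceph.conf

# copy it to the other hosts by hand
for h in host1 host2 host3; do
    scp /etc/ceph/ceph.conf root@${h}:/etc/ceph/ceph.conf
done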
Regards.
On Mon, Jul 27, 2020 at 2:50 PM Ricardo Marques <RiMarques(a)suse.com> wrote:
> Hi Cem,
>
> Since https://github.com/ceph/ceph/pull/35576 you will be able to tell
> cephadm to keep your `/etc/ceph/ceph.conf` updated on all hosts by running:
>
> # ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
>
> But this feature has not been released yet, so you will have to wait for
> v15.2.5.
>
>
> Ricardo Marques
>
> ------------------------------
> *From:* Cem Zafer <cemzafer(a)gmail.com>
> *Sent:* Monday, June 29, 2020 6:37 AM
> *To:* ceph-users(a)ceph.io <ceph-users(a)ceph.io>
> *Subject:* [ceph-users] Push config to all hosts
>
> Hi,
> What is the best method(s) to push ceph.conf to all hosts in octopus
> (15.x)?
> Thanks.
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
Hi all,
I'm currently experiencing some strange behavior in our cluster: the dashboard's object gateway "Buckets" submenu is broken and I'm getting 503 errors (however, "Users" and "Daemons" work flawlessly). Looking into the mgr log gives me the following error:
2020-07-24T12:38:12.695+0200 7f42150f3700 0 [dashboard ERROR rest_client] RGW REST API failed GET req status: 404
2020-07-24T12:38:12.843+0200 7f42130ef700 0 [dashboard ERROR request] [10.1.0.133:38454] [GET] [500] [0.039s] [eugen] [513.0B] /api/rgw/bucket/91e22800581543b5be4654f7b9b0c7cc_10
2020-07-24T12:38:12.843+0200 7f42130ef700 0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "00a3c78a-d96f-4f8b-b3c4-f24eac99f4a1"} ']
That led me to look into the buckets, and I got this list with weird bucket names (some even contain underscores, which aren't actually allowed):
root@ceph1-40-10~# radosgw-admin buckets list
[
"12345",
"deployment",
"91e22800581543b5be4654f7b9b0c7cc_5", "91e22800581543b5be4654f7b9b0c7cc_6", "91e22800581543b5be4654f7b9b0c7cc_11", "91e22800581543b5be4654f7b9b0c7cc_8", "91e22800581543b5be4654f7b9b0c7cc_3", "91e22800581543b5be4654f7b9b0c7cc_17",
[...]
]
If I try to perform some operations on these buckets, I get an error:
root@ceph1-40-10~# radosgw-admin bucket rm --bucket=91e22800581543b5be4654f7b9b0c7cc_15 --purge-objects
2020-07-28T12:40:05.392+0200 7f500024a080 -1 ERROR: unable to remove bucket(2) No such file or directory
I'm able to change the owner and even rename these buckets, but anything else ends in the same error as above. This also seems to break the bucket functionality in the dashboard.
My questions are: what are these buckets, where do they come from, and what is stored in them? And why can't I perform any operations on them?
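For what it's worth, this is what I plan to look at next (a sketch only; the bucket name is copied from the list above and output is omitted):

# per-bucket stats: owner, bucket id, marker, usage
radosgw-admin bucket stats --bucket=91e22800581543b5be4654f7b9b0c7cc_5

# raw metadata entries, to see whether stale bucket instances are involved
radosgw-admin metadata list bucket.instance | grep 91e22800581543b5be4654f7b9b0c7cc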
Any ideas?
Best,
eugen
Hello, I use Ceph with Proxmox, release 14.2.9, with BlueStore OSDs.
I had a problem in a replica 2 pool (I know, it's dangerous) with an unexpected clone.
I have deleted the RBD image that used the damaged object (along with its snapshot), and now Ceph can't trim and clean up.
How can I restore a clean, healthy status?
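(For context, this is roughly how I've been trying to locate the broken snapset; the PG id is a placeholder:)

# find the inconsistent PG(s)
ceph health detail | grep inconsistent

# dump the snapset errors (e.g. unexpected clones) for that PG
rados list-inconsistent-snapset <pgid> --format=json-pretty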
Regards, Fabrizio
Hello,
I get:
[WRN] CEPHADM_HOST_CHECK_FAILED: 6 hosts fail cephadm check
host ceph01 failed check: Failed to connect to ceph01 (ceph01).
Check that the host is reachable and accepts connections using the
cephadm SSH key
you may want to run:
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph01
host ceph02 failed check: Failed to connect to ceph02 (10.10.1.2).
Check that the host is reachable and accepts connections using the
cephadm SSH key
you may want to run:
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph02
host ceph03 failed check: Failed to connect to ceph03 (10.10.1.3).
Check that the host is reachable and accepts connections using the
cephadm SSH key
you may want to run:
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph03
host ceph04 failed check: Failed to connect to ceph04 (10.10.1.4).
Check that the host is reachable and accepts connections using the
cephadm SSH key
you may want to run:
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph04
host ceph05 failed check: Failed to connect to ceph05 (10.10.1.5).
Check that the host is reachable and accepts connections using the
cephadm SSH key
you may want to run:
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph05
host ceph06 failed check: Failed to connect to ceph06 (10.10.1.6).
Check that the host is reachable and accepts connections using the
cephadm SSH key
On ceph01 I run:
ceph cephadm get-ssh-config > /tmp/ceph.conf
ceph config-key get mgr/cephadm/ssh_identity_key > /tmp/ceph.key
chmod 600 /tmp/ceph.key
ssh -F /tmp/ceph.conf -i /tmp/ceph.key root@ceph01 (which works)
So I cannot understand the errors above.
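One thing I have not tried yet is re-pushing the cephadm SSH key to the hosts; as far as I understand the docs, that would look roughly like this:

# export the public key cephadm uses for its host checks
ceph cephadm get-pub-key > ~/ceph.pub

# install it for root on each host (repeat for ceph02..ceph06)
ssh-copy-id -f -i ~/ceph.pub root@ceph02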
root@ceph01:~# ceph versions
{
"mon": {
"ceph version 15.2.1
(9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3
},
"mgr": {
"ceph version 15.2.1
(9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3
},
"osd": {
"ceph version 15.2.1
(9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 56
},
"mds": {
"ceph version 15.2.1
(9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 1
},
"overall": {
"ceph version 15.2.1
(9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 63
}
}
root@ceph01:~# dpkg -l |grep ceph
ii ceph-base 15.2.4-1~bpo10+1
amd64 common ceph daemon libraries and management tools
ii ceph-common 15.2.4-1~bpo10+1
amd64 common utilities to mount and interact with a ceph
storage cluster
ii ceph-deploy 2.0.1
all Ceph-deploy is an easy to use configuration tool
ii ceph-fuse 15.2.4-1~bpo10+1
amd64 FUSE-based client for the Ceph distributed file system
ii ceph-grafana-dashboards 15.2.4-1~bpo10+1
all grafana dashboards for the ceph dashboard
ii ceph-mds 15.2.4-1~bpo10+1
amd64 metadata server for the ceph distributed file system
ii ceph-mgr 15.2.4-1~bpo10+1
amd64 manager for the ceph distributed storage system
ii ceph-mgr-cephadm 15.2.4-1~bpo10+1
all cephadm orchestrator module for ceph-mgr
ii ceph-mgr-dashboard 15.2.4-1~bpo10+1
all dashboard module for ceph-mgr
ii ceph-mgr-diskprediction-cloud 15.2.4-1~bpo10+1
all diskprediction-cloud module for ceph-mgr
ii ceph-mgr-diskprediction-local 15.2.4-1~bpo10+1
all diskprediction-local module for ceph-mgr
ii ceph-mgr-k8sevents 15.2.4-1~bpo10+1
all kubernetes events module for ceph-mgr
ii ceph-mgr-modules-core 15.2.4-1~bpo10+1
all ceph manager modules which are always enabled
ii ceph-mon 15.2.4-1~bpo10+1
amd64 monitor server for the ceph storage system
ii ceph-osd 15.2.4-1~bpo10+1
amd64 OSD server for the ceph storage system
ii cephadm 15.2.4-1~bpo10+1
amd64 cephadm utility to bootstrap ceph daemons with systemd
and containers
ii libcephfs1 10.2.11-2
amd64 Ceph distributed file system client library
ii libcephfs2 15.2.4-1~bpo10+1
amd64 Ceph distributed file system client library
ii python-ceph-argparse 14.2.8-1
all Python 2 utility libraries for Ceph CLI
ii python3-ceph-argparse 15.2.4-1~bpo10+1
all Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 15.2.4-1~bpo10+1
all Python 3 utility libraries for Ceph
ii python3-cephfs 15.2.4-1~bpo10+1
amd64 Python 3 libraries for the Ceph libcephfs library
root@ceph01:~# ceph -s
cluster:
id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
health: HEALTH_WARN
6 hosts fail cephadm check
failed to probe daemons or devices
7 nearfull osd(s)
Reduced data availability: 1 pg inactive
Low space hindering backfill (add storage if this doesn't
resolve itself): 26 pgs backfill_toofull
Degraded data redundancy: 202495/33226941 objects degraded
(0.609%), 26 pgs degraded, 26 pgs undersized
3 pool(s) nearfull
services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 39m)
mgr: ceph02(active, since 77m), standbys: ceph03, ceph01
mds: 2 up:standby
osd: 61 osds: 56 up (since 41m), 55 in (since 41m); 27 remapped pgs
data:
pools: 3 pools, 2049 pgs
objects: 11.08M objects, 37 TiB
usage: 113 TiB used, 28 TiB / 141 TiB avail
pgs: 0.049% pgs not active
202495/33226941 objects degraded (0.609%)
9238/33226941 objects misplaced (0.028%)
1025 active+clean
887 active+clean+snaptrim_wait
110 active+clean+snaptrim
25 active+undersized+degraded+remapped+backfill_toofull
1 undersized+degraded+remapped+backfill_toofull+peered
1 active+remapped+backfilling
io:
client: 1.0 KiB/s rd, 140 KiB/s wr, 0 op/s rd, 1 op/s wr
recovery: 30 MiB/s, 8 objects/s
I already restarted the mgr on ceph02.
Thanks,
Michael
Hi,
We've let our Ceph pool (Octopus) get into a bad state, at around 90% full:
# ceph health
> HEALTH_ERR 1/4 mons down, quorum
> angussyd-kvm01,angussyd-kvm02,angussyd-kvm03; 3 backfillfull osd(s); 1 full
> osd(s); 14 nearfull osd(s); Low space hindering backfill (add storage if
> this doesn't resolve itself): 580 pgs backfill_toofull; Degraded data
> redundancy: 1860769/9916650 objects degraded (18.764%), 597 pgs degraded,
> 580 pgs undersized; 323 pgs not deep-scrubbed in time; 189 pgs not scrubbed
> in time; Full OSDs blocking recovery: 17 pgs recovery_toofull; 4 pool(s)
> full; 1 pools have too many placement groups
At this point, even trying to run "rbd rm" or "rbd du" seems to time out.
(I am, however, able to run "rbd ls -l", which shows me the RBD image sizes; I
assume that's before taking thin provisioning into account.)
Is there any way to rescue this pool? Or at least some way to force delete
some of the large images?
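(The only idea I've had so far is to buy some temporary headroom so the deletes can proceed, roughly as below, but I haven't dared run it without a second opinion. The defaults are 0.95 full and 0.90 backfillfull as far as I know.)

# temporarily raise the OSD fullness thresholds
ceph osd set-full-ratio 0.96
ceph osd set-backfillfull-ratio 0.92

# then retry the removal and revert the ratios once space is reclaimed
rbd rm <pool>/<large-image>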
Regards,
Victor
Hello,
I've created an Octopus 15.2.4 cluster with 3 monitors and 3 OSDs (6 hosts
in total, all ESXi VMs). It lived through a couple of reboots without
problems, then I reconfigured the main host a bit: I set iptables-legacy as
the current option in update-alternatives (this is a Debian 10 system),
applied a basic iptables ruleset, and restarted Docker.
After that, the cluster became unresponsive (any ceph command hangs
indefinitely). I can still use the admin socket to manipulate the config,
though. Setting debug_ms to 5, I see this in the logs (timestamps cut for
readability):
7f4096f41700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
[v2:<mon2_ip>:3300/0,v1:<mon2_ip>:6789/0] conn(0x55c21b975800
0x55c21ab45180 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rx=0 tx=
0).send_message enqueueing message m=0x55c21bd84a00 type=67 mon_probe(probe
e30397f0-cc32-11ea-8c8e-000c29469cd5 name mon1 mon_release octopus) v7
7f4098744700 1 -- >>
[v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008]
conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE
l=0).process reconnect failed to v2:81.200.2.152:6800/561959008
7f4098744700 2 -- >>
[v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008]
conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE
l=0).process connection refused!
and this:
7f4098744700 2 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0
l=1 rx=0 tx=0)._fault on lossy channel, failing
7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0
l=1 rx=0 tx=0).stop
7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0
l=1 rx=0 tx=0).reset_recv_state
7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0
l=1 rx=0 tx=0).reset_security
7f409373a700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0
tx=0).accept
7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=BANNER_ACCEPTING pgs=0
cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0
cs=0 l=0 rx=0 tx=0).handle_hello received hello: peer_type=8
peer_addr_for_me=v2:<mon1_ip>:3300/0
7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >>
conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0
cs=0 l=0 rx=0 tx=0).handle_hello getsockname says I am <mon1_ip>:3300 when
talking to v2:<mon1_ip>:49012/0
7f4098744700 1 mon.mon1@0(probing) e5 handle_auth_request failed to assign
global_id
Config (the result of ceph --admin-daemon
/run/ceph/e30397f0-cc32-11ea-8c8e-000c29469cd5/ceph-mon.mon1.asok config
show):
https://pastebin.com/kifMXs9H
I can connect to ports 3300 and 6789 with telnet; 6800 and 6801 return
'connection refused'.
Setting all iptables policies to ACCEPT didn't change anything.
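For completeness, the next checks I plan to run on mon1 are roughly these (a sketch, nothing more):

# confirm which ceph daemons are actually listening, and on which ports
ss -tlnp | grep ceph

# confirm the mon/mgr containers are still up after the docker restart
docker ps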
Where should I start digging to fix this problem? I'd like to at least
understand why this happened before putting the cluster into production.
Any help is appreciated.