July 2020 - ceph-users - lists.ceph.io

S3 bucket lifecycle not deleting old objects

by Alex Hussein-Kershaw

Hello, I have a problem that old versions of S3 objects are not being deleted. Can anyone advise as to why? I'm using Ceph 14.2.9. I expect old versions of S3 objects to be deleted after 3 days as per my lifecycle config on the bucket: { "Rules": [ { "Status": "Enabled", "Prefix": "", "NoncurrentVersionExpiration": { "NoncurrentDays": 3 }, "Expiration": { "ExpiredObjectDeleteMarker": true }, "ID": "S3 scsdata bucket: Tidy up old versions" } ] } I have an object with 3 versions below (and is much older than 3 days): [root@hera hera_sdc] /usr/bin> aws s3api --endpoint http://127.3.3.3:7480 list-object-versions --bucket hera-scsdata --key 84/46/2020060508501821902143658709-Subscriber | grep -B 4 -A 6 84/46/2020060508501821902143658709-Subscriber "LastModified": "2020-06-05T08:58:19.644Z", "VersionId": "FUdZIehBu3sgRbNJSmZwj3VHWs1ednH", "ETag": "\"a18286c50a7323efe58497eb97d6dc9d\"", "StorageClass": "STANDARD", "Key": "84/46/2020060508501821902143658709-Subscriber", "Owner": { "DisplayName": "hera EAS S3 user", "ID": "hera" }, "IsLatest": true, "Size": 4440 -- "LastModified": "2020-06-05T08:58:17.943Z", "VersionId": "JVKGMJQS-l7xKQuqdfn4QsEY5WLEosj", "ETag": "\"87e9953af436b702afb80d457f1d73cb\"", "StorageClass": "STANDARD", "Key": "84/46/2020060508501821902143658709-Subscriber", "Owner": { "DisplayName": "hera EAS S3 user", "ID": "hera" }, "IsLatest": false, "Size": 4408 -- "LastModified": "2020-06-05T08:50:19.167Z", "VersionId": "-RSNSCDvGj83f4DZ11s8YZ2KaxT8T.a", "ETag": "\"a68ec68ce825e009ee9a70cfdae9c794\"", "StorageClass": "STANDARD", "Key": "84/46/2020060508501821902143658709-Subscriber", "Owner": { "DisplayName": "hera EAS S3 user", "ID": "hera" }, "IsLatest": false, "Size": 4256 -- ], "NextKeyMarker": "85/49/20200604163626B4C712312312302641-Subscriber", "MaxKeys": 1000, "Prefix": "", "KeyMarker": "84/46/2020060508501821902143658709-Subscriber", "DeleteMarkers": [ { "Owner": { "DisplayName": "hera EAS S3 user", "ID": "hera" }, So those objects still being present seems to be in conflict with the config I have set? Thanks, Alex

3 years, 9 months

1
0
0 0

Re: Push config to all hosts

by Cem Zafer

Thanks Ricardo for clarification. Regards. On Mon, Jul 27, 2020 at 2:50 PM Ricardo Marques <RiMarques(a)suse.com> wrote: > Hi Cem, > > Since https://github.com/ceph/ceph/pull/35576 you will be able to tell > cephadm to keep your `/etc/ceph/ceph.conf` updated in all hosts by runnig: > > # ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true > > But this feature was not released yet, so you will have to wait for > v15.2.5. > > > Ricardo Marques > > ------------------------------ > *From:* Cem Zafer <cemzafer(a)gmail.com> > *Sent:* Monday, June 29, 2020 6:37 AM > *To:* ceph-users(a)ceph.io <ceph-users(a)ceph.io> > *Subject:* [ceph-users] Push config to all hosts > > Hi, > What is the best method(s) to push ceph.conf to all hosts in octopus > (15.x)? > Thanks. > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > >

3 years, 9 months

1
0
0 0

Weird buckets in a new cluster causing broken dashboard functionality

by Eugen König

Hi all, I'm currently experience some strange behavior in our cluster: the dashboards object gateway "buckets" submenu is broken and I'm getting 503 errors (however, "Users" and "Daemons" work flawlessly). Looking into the mgr log gives me following error: 2020-07-24T12:38:12.695+0200 7f42150f3700 0 [dashboard ERROR rest_client] RGW REST API failed GET req status: 404 2020-07-24T12:38:12.843+0200 7f42130ef700 0 [dashboard ERROR request] [10.1.0.133:38454] [GET] [500] [0.039s] [eugen] [513.0B] /api/rgw/bucket/91e22800581543b5be4654f7b9b0c7cc_102020-07-24T12:38:12.843+0200 7f42130ef700 0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "00a3c78a-d96f-4f8b-b3c4-f24eac99f4a1"} '] So it took me to the point to look into buckets and I got this list with weird bucket names (even with dash, but it's actually not allowed): root@ceph1-40-10~# radosgw-admin buckets list [ "12345", "deployment", "91e22800581543b5be4654f7b9b0c7cc_5", "91e22800581543b5be4654f7b9b0c7cc_6", "91e22800581543b5be4654f7b9b0c7cc_11", "91e22800581543b5be4654f7b9b0c7cc_8", "91e22800581543b5be4654f7b9b0c7cc_3", "91e22800581543b5be4654f7b9b0c7cc_17", [...] ] If I try to perform some operations on these buckets, I get an error: root@ceph1-40-10~# radosgw-admin bucket rm --bucket=91e22800581543b5be4654f7b9b0c7cc_15 --purge-objects 2020-07-28T12:40:05.392+0200 7f500024a080 -1 ERROR: unable to remove bucket(2) No such file or directory I'm able to change the owner and even rename it, but anything else is ending in the same error above. This also seems to break the buckets functionality in the dashboard. My question is: what are those buckets, where do they come from and what is stored in there? And why one is not able to perform any ops on them? Any ideas? Best, eugen

3 years, 9 months

1
0
0 0

Re: please help me fix iSCSI Targets not available

by David Thuong

i user ceph octopus v15

3 years, 9 months

1
0
0 0

HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent, 1 pg snaptrim_error

by Fabrizio Cuseo

Hello, I use ceph with proxmox, release 14.2.9, with bluestore OSD. I had a problem in a replica 2 pool (i know, is dangerous), with an unexpected clone. I have deleted the RBD image that used the damaged object (with his snapshot), and now ceph can't trim and clean. How can I restore the clean and health status ? Regards, Fabrizio

3 years, 9 months

2
1
0 0

Re: Cluster became unresponsive: e5 handle_auth_request failed to assign global_id

by Илья Борисович Волошин

No, they are stored locally on ESXi data storage on top of hardware RAID5 built with SAS/SATA (different hardware on hosts). Also, I've tried going back to the snapshot taken just after all monitors and OSDs were added to cluster. The host boots fine and is working as it should, however, after the next reboot this problem appears (no changes to configuration were made). And another thing - even if docker container for mgr is running and gives no errors in logs neither inside the container nor on the parent host the mgr doesn't bind to any ports it should: 6800, 6801 and 8443 for dashboard. Not sure if it is the reason or the consequence of this problem. вт, 28 июл. 2020 г. в 11:37, Anthony D'Atri <anthony.datri(a)gmail.com>: > Are your mon DBs on SSDs? > > > On Jul 27, 2020, at 7:28 AM, Илья Борисович Волошин < > i.voloshin(a)simplesolution.pro> wrote: > > > > Here are all the active ports on mon1 (with the exception of sshd and > ntpd): > > > > # netstat -npl > > Proto Recv-Q Send-Q Local Address Foreign Address State > > PID/Program name > > tcp 0 0 <mon1_ip>:3300 0.0.0.0:* LISTEN > > 1582/ceph-mon > > tcp 0 0 <mon1_ip>:6789 0.0.0.0:* > LISTEN > > 1582/ceph-mon > > tcp6 0 0 :::9093 :::* > LISTEN > > 908/alertmanager > > tcp6 0 0 :::9094 :::* > LISTEN > > 908/alertmanager > > tcp6 0 0 :::9095 :::* > LISTEN > > 896/prometheus > > tcp6 0 0 :::9100 :::* > LISTEN > > 906/node_exporter > > tcp6 0 0 :::3000 :::* > LISTEN > > 882/grafana-server > > udp6 0 0 :::9094 :::* > > 908/alertmanager > > > > I've tried telnet from mon1 host, can connect to 3300 and 6789: > > > > # telnet <mon1_ip> 3300 > > > Trying <mon1_ip>... > > Connected to <mon1_ip>. > > Escape character is '^]'. > > ceph v2 > > > > # telnet <mon1_ip> 6789 > > Trying <mon1_ip>... > > Connected to <mon1_ip>. > > Escape character is '^]'. > > ceph v027QQ > > > > 6800 and 6801 refuse connection: > > > > # telnet <mon1_ip> 6800 > > Trying <mon1_ip>... > > telnet: Unable to connect to remote host: Connection refused > > > > I don't see any errors in the log related to failures to bind... and all > > CEPH systemd services are running as far as I can tell: > > > > # systemctl list-units -a | grep ceph > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5(a)alertmanager.mon1.service > > loaded active running Ceph > > alertmanager.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5(a)crash.mon1.service > > loaded active running Ceph crash.mon1 > > for e30397f0-cc32-11ea-8c8e-000c29469cd5 > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5(a)grafana.mon1.service > > loaded active running Ceph > grafana.mon1 > > for e30397f0-cc32-11ea-8c8e-000c29469cd5 > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5(a)mgr.mon1.peevkl.service > > loaded active running Ceph > > mgr.mon1.peevkl for e30397f0-cc32-11ea-8c8e-000c29469cd5 > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5(a)mon.mon1.service > > loaded active running Ceph mon.mon1 > for > > e30397f0-cc32-11ea-8c8e-000c29469cd5 > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5(a)node-exporter.mon1.service > > loaded active running Ceph > > node-exporter.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5(a)prometheus.mon1.service > > loaded active running Ceph > > prometheus.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 > > system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice > > loaded active active > > system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5.target > > loaded active active Ceph cluster > > e30397f0-cc32-11ea-8c8e-000c29469cd5 > > ceph.target > > loaded active active All Ceph clusters > > and services > > > > Here are currently active docker images: > > > > # docker ps > > CONTAINER ID IMAGE COMMAND > > CREATED STATUS PORTS NAMES > > dfd8dbeccf1e ceph/ceph:v15 "/usr/bin/ceph-mgr -…" > > 41 minutes ago Up 41 minutes > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mgr.mon1.peevkl > > 9452d1db7ffb ceph/ceph:v15 "/usr/bin/ceph-mon -…" > 3 > > hours ago Up 3 hours > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mon.mon1 > > 703ec4a43824 prom/prometheus:v2.18.1 "/bin/prometheus --c…" > 3 > > hours ago Up 3 hours > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-prometheus.mon1 > > d816ec5e645f ceph/ceph:v15 "/usr/bin/ceph-crash…" > 3 > > hours ago Up 3 hours > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-crash.mon1 > > 38d283ba6424 ceph/ceph-grafana:latest "/bin/sh -c 'grafana…" > 3 > > hours ago Up 3 hours > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-grafana.mon1 > > cc119ec8f09a prom/node-exporter:v0.18.1 "/bin/node_exporter …" > 3 > > hours ago Up 3 hours > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-node-exporter.mon1 > > aa1d339c4100 prom/alertmanager:v0.20.0 "/bin/alertmanager -…" > 3 > > hours ago Up 3 hours > > ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-alertmanager.mon1 > > > > iptables are active, I tried setting all chain policies to ACCEPT (didn't > > help), the rules are as such: > > > > 0 0 CEPH tcp -- * * 0.0.0.0/0 > > 0.0.0.0/0 tcp dpt:6789 > > 5060 303K CEPH tcp -- * * 0.0.0.0/0 > > 0.0.0.0/0 multiport dports 6800:7300 > > > > Chain CEPH includes addresses for monitors and OSDs. > > > > пн, 27 июл. 2020 г. в 17:07, Dino Godor <dg(a)terralink.de>: > > > >> Hi, > >> > >> have you tried to locally connect to the ports with netcat (or telnet)? > >> > >> Is the process listening ? (something like netstat -4ln or the current > >> equivalent thereof) > >> > >> Is the old (new) Firewall maybe still running ? > >> > >> > >> On 27.07.20 16:00, Илья Борисович Волошин wrote: > >>> Hello, > >>> > >>> I've created an Octopus 15.2.4 cluster with 3 monitors and 3 OSDs (6 > >> hosts > >>> in total, all ESXi VMs). It lived through a couple of reboots without > >>> problem, then I've reconfigured the main host a bit: > >>> set iptables-legacy as current option in update-alternatives (this is a > >>> Debian10 system), applied a basic ruleset of iptables and restarted > >> docker. > >>> > >>> After that the cluster became unresponsive (any ceph command hangs > >>> indefinitely). I can use admin socket to manipulate config though. > >> Setting > >>> debug_ms to 5 I see this in the logs (timestamps cut for readability): > >>> > >>> 7f4096f41700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> [v2:<mon2_ip>:3300/0,v1:<mon2_ip>:6789/0] conn(0x55c21b975800 > >>> 0x55c21ab45180 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rx=0 tx= > >>> 0).send_message enqueueing message m=0x55c21bd84a00 type=67 > >> mon_probe(probe > >>> e30397f0-cc32-11ea-8c8e-000c29469cd5 name mon1 mon_release octopus) v7 > >>> 7f4098744700 1 -- >> > >>> [v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008] > >>> conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 > >> s=STATE_CONNECTING_RE > >>> l=0).process reconnect failed to v2:81.200.2 > >>> .152:6800/561959008 > >>> 7f4098744700 2 -- >> > >>> [v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008] > >>> conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 > >> s=STATE_CONNECTING_RE > >>> l=0).process connection refused! > >>> > >>> and this: > >>> > >>> 7f4098744700 2 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 > >> cs=0 > >>> l=1 rx=0 tx=0)._fault on lossy channel, failing > >>> 7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 > >> cs=0 > >>> l=1 rx=0 tx=0).stop > >>> 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 > >> cs=0 > >>> l=1 rx=0 tx=0).reset_recv_state > >>> 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 > >> cs=0 > >>> l=1 rx=0 tx=0).reset_security > >>> 7f409373a700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=NONE pgs=0 cs=0 l=0 > >> rx=0 > >>> tx=0).accept > >>> 7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=BANNER_ACCEPTING > pgs=0 > >>> cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0 > >>> 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 > >>> cs=0 l=0 rx=0 tx=0).handle_hello received hello: peer_type=8 > >>> peer_addr_for_me=v2:<mon1_ip>:3300/0 > >>> 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> > >>> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 > >>> cs=0 l=0 rx=0 tx=0).handle_hello getsockname says I am <mon1_ip>:3300 > >> when > >>> talking to v2:<mon1_ip>:49012/0 > >>> 7f4098744700 1 mon.mon1@0(probing) e5 handle_auth_request failed to > >> assign > >>> global_id > >>> > >>> Config (the result of ceph --admin-daemon > >>> /run/ceph/e30397f0-cc32-11ea-8c8e-000c29469cd5/ceph-mon.mon1.asok > config > >>> show): > >>> https://pastebin.com/kifMXs9H > >>> > >>> I can connect to ports 3300 and 6789 with telnet; 6800 and 6801 return > >>> 'process connection refused' > >>> > >>> Setting all iptables policies to ACCEPT didn't change anything. > >>> > >>> Where should I start digging to fix this problem? I'd like to at least > >>> understand why this happened before putting the cluster into > production. > >>> Any help is appreciated. > >>> _______________________________________________ > >>> ceph-users mailing list -- ceph-users(a)ceph.io > >>> To unsubscribe send an email to ceph-users-leave(a)ceph.io > >> > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > >

3 years, 9 months

1
0
0 0

Server error when trying to view this list in browser

by biohazd＠yahoo.com

Hi I often get "Server error - An error occurred while processing your request." when tryignt o view this lis in Fireofx, is it a known issues? https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/new

3 years, 9 months

1
0
0 0

6 hosts fail cephadm check (15.2.4)

by Ml Ml

Hello, i get: [WRN] CEPHADM_HOST_CHECK_FAILED: 6 hosts fail cephadm check host ceph01 failed check: Failed to connect to ceph01 (ceph01). Check that the host is reachable and accepts connections using the cephadm SSH key you may want to run: > ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph01 host ceph02 failed check: Failed to connect to ceph02 (10.10.1.2). Check that the host is reachable and accepts connections using the cephadm SSH key you may want to run: > ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph02 host ceph03 failed check: Failed to connect to ceph03 (10.10.1.3). Check that the host is reachable and accepts connections using the cephadm SSH key you may want to run: > ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph03 host ceph04 failed check: Failed to connect to ceph04 (10.10.1.4). Check that the host is reachable and accepts connections using the cephadm SSH key you may want to run: > ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph04 host ceph05 failed check: Failed to connect to ceph05 (10.10.1.5). Check that the host is reachable and accepts connections using the cephadm SSH key you may want to run: > ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph05 host ceph06 failed check: Failed to connect to ceph06 (10.10.1.6). Check that the host is reachable and accepts connections using the cephadm SSH key on ceph01 i run: ceph cephadm get-ssh-config > /tmp/ceph.conf ceph config-key get mgr/cephadm/ssh_identity_key > /tmp/ceph.key chmod 600 /tmp/ceph.key ssh -F /tmp/ceph.conf -i /tmp/ceph.key root@ceph01 (which works) So i can not understand the errors above. root@ceph01:~# ceph versions { "mon": { "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3 }, "mgr": { "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3 }, "osd": { "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 56 }, "mds": { "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 1 }, "overall": { "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 63 } } root@ceph01:~# dpkg -l |grep ceph ii ceph-base 15.2.4-1~bpo10+1 amd64 common ceph daemon libraries and management tools ii ceph-common 15.2.4-1~bpo10+1 amd64 common utilities to mount and interact with a ceph storage cluster ii ceph-deploy 2.0.1 all Ceph-deploy is an easy to use configuration tool ii ceph-fuse 15.2.4-1~bpo10+1 amd64 FUSE-based client for the Ceph distributed file system ii ceph-grafana-dashboards 15.2.4-1~bpo10+1 all grafana dashboards for the ceph dashboard ii ceph-mds 15.2.4-1~bpo10+1 amd64 metadata server for the ceph distributed file system ii ceph-mgr 15.2.4-1~bpo10+1 amd64 manager for the ceph distributed storage system ii ceph-mgr-cephadm 15.2.4-1~bpo10+1 all cephadm orchestrator module for ceph-mgr ii ceph-mgr-dashboard 15.2.4-1~bpo10+1 all dashboard module for ceph-mgr ii ceph-mgr-diskprediction-cloud 15.2.4-1~bpo10+1 all diskprediction-cloud module for ceph-mgr ii ceph-mgr-diskprediction-local 15.2.4-1~bpo10+1 all diskprediction-local module for ceph-mgr ii ceph-mgr-k8sevents 15.2.4-1~bpo10+1 all kubernetes events module for ceph-mgr ii ceph-mgr-modules-core 15.2.4-1~bpo10+1 all ceph manager modules which are always enabled ii ceph-mon 15.2.4-1~bpo10+1 amd64 monitor server for the ceph storage system ii ceph-osd 15.2.4-1~bpo10+1 amd64 OSD server for the ceph storage system ii cephadm 15.2.4-1~bpo10+1 amd64 cephadm utility to bootstrap ceph daemons with systemd and containers ii libcephfs1 10.2.11-2 amd64 Ceph distributed file system client library ii libcephfs2 15.2.4-1~bpo10+1 amd64 Ceph distributed file system client library ii python-ceph-argparse 14.2.8-1 all Python 2 utility libraries for Ceph CLI ii python3-ceph-argparse 15.2.4-1~bpo10+1 all Python 3 utility libraries for Ceph CLI ii python3-ceph-common 15.2.4-1~bpo10+1 all Python 3 utility libraries for Ceph ii python3-cephfs 15.2.4-1~bpo10+1 amd64 Python 3 libraries for the Ceph libcephfs library root@ceph01:~# ceph -s cluster: id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df health: HEALTH_WARN 6 hosts fail cephadm check failed to probe daemons or devices 7 nearfull osd(s) Reduced data availability: 1 pg inactive Low space hindering backfill (add storage if this doesn't resolve itself): 26 pgs backfill_toofull Degraded data redundancy: 202495/33226941 objects degraded (0.609%), 26 pgs degraded, 26 pgs undersized 3 pool(s) nearfull services: mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 39m) mgr: ceph02(active, since 77m), standbys: ceph03, ceph01 mds: 2 up:standby osd: 61 osds: 56 up (since 41m), 55 in (since 41m); 27 remapped pgs data: pools: 3 pools, 2049 pgs objects: 11.08M objects, 37 TiB usage: 113 TiB used, 28 TiB / 141 TiB avail pgs: 0.049% pgs not active 202495/33226941 objects degraded (0.609%) 9238/33226941 objects misplaced (0.028%) 1025 active+clean 887 active+clean+snaptrim_wait 110 active+clean+snaptrim 25 active+undersized+degraded+remapped+backfill_toofull 1 undersized+degraded+remapped+backfill_toofull+peered 1 active+remapped+backfilling io: client: 1.0 KiB/s rd, 140 KiB/s wr, 0 op/s rd, 1 op/s wr recovery: 30 MiB/s, 8 objects/s I already restarted the mgr on ceph02. Thanks, Michael

3 years, 9 months

2
1
0 0

Ceph pool at 90% capacity - rbd rm is timing out - any way to rescue?

by Victor Hooi

Hi, We've let our Ceph pool (Octopus) get into a bad state, with around 90% full: # ceph health > HEALTH_ERR 1/4 mons down, quorum > angussyd-kvm01,angussyd-kvm02,angussyd-kvm03; 3 backfillfull osd(s); 1 full > osd(s); 14 nearfull osd(s); Low space hindering backfill (add storage if > this doesn't resolve itself): 580 pgs backfill_toofull; Degraded data > redundancy: 1860769/9916650 objects degraded (18.764%), 597 pgs degraded, > 580 pgs undersized; 323 pgs not deep-scrubbed in time; 189 pgs not scrubbed > in time; Full OSDs blocking recovery: 17 pgs recovery_toofull; 4 pool(s) > full; 1 pools have too many placement groups At this point, even trying to run 'rbd rm" or "rbd du" seems to time out. (I am however, able to run "rbd ls -l" which shows me rbd image size - I assume that's before taking into account thin-provisioning). Is there any way to rescue this pool? Or at least some way to force delete some of the large images? Regards, Victor

3 years, 9 months

1
0
0 0

Cluster became unresponsive: e5 handle_auth_request failed to assign global_id

by Илья Борисович Волошин

Hello, I've created an Octopus 15.2.4 cluster with 3 monitors and 3 OSDs (6 hosts in total, all ESXi VMs). It lived through a couple of reboots without problem, then I've reconfigured the main host a bit: set iptables-legacy as current option in update-alternatives (this is a Debian10 system), applied a basic ruleset of iptables and restarted docker. After that the cluster became unresponsive (any ceph command hangs indefinitely). I can use admin socket to manipulate config though. Setting debug_ms to 5 I see this in the logs (timestamps cut for readability): 7f4096f41700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> [v2:<mon2_ip>:3300/0,v1:<mon2_ip>:6789/0] conn(0x55c21b975800 0x55c21ab45180 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rx=0 tx= 0).send_message enqueueing message m=0x55c21bd84a00 type=67 mon_probe(probe e30397f0-cc32-11ea-8c8e-000c29469cd5 name mon1 mon_release octopus) v7 7f4098744700 1 -- >> [v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:81.200.2 .152:6800/561959008 7f4098744700 2 -- >> [v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process connection refused! and this: 7f4098744700 2 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0)._fault on lossy channel, failing 7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).stop 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_recv_state 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_security 7f409373a700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept 7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello received hello: peer_type=8 peer_addr_for_me=v2:<mon1_ip>:3300/0 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello getsockname says I am <mon1_ip>:3300 when talking to v2:<mon1_ip>:49012/0 7f4098744700 1 mon.mon1@0(probing) e5 handle_auth_request failed to assign global_id Config (the result of ceph --admin-daemon /run/ceph/e30397f0-cc32-11ea-8c8e-000c29469cd5/ceph-mon.mon1.asok config show): https://pastebin.com/kifMXs9H I can connect to ports 3300 and 6789 with telnet; 6800 and 6801 return 'process connection refused' Setting all iptables policies to ACCEPT didn't change anything. Where should I start digging to fix this problem? I'd like to at least understand why this happened before putting the cluster into production. Any help is appreciated.

3 years, 9 months

2
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users July 2020