ceph orch device ls output shows "insufficient space (10 extents) on vgs, LVM detected, locked" on Quincy. Is this just a warning, or should any action be taken?
Hi,
I am using 17.2.6 on Rocky Linux 8
The ceph mgr dashboard, in my situation, (bare metal install, upgraded from 15->16-> 17.2.6), can no longer hit the ObjectStore->(Daemons,Users,Buckets) pages.
When I try to hit those pages, it gives an error:
RGW REST API failed request with status code 403 {"Code": "AccessDenied", RequestId: "xxxxxxx", HostId: "yyyy-<my zone>"}
The log of the rgw server it hit has:
"GET /admin/metadata/user?myself HTTP/1.1" 403 125
It appears that the mgr dashboard setting RGW_API_HOST is no longer an option that can be set, nor does that name exist anywhere under /usr/share/ceph/mgr/dashboard, and:
# ceph dashboard set-rgw-api-host <host>
no longer exists in 17.2.6.
However, since my situation is an upgrade, the config value still exists in my config, and I can retrieve it with:
# ceph dashboard get-rgw-api-host
To get this to work in my situation, I modified /usr/share/ceph/mgr/dashboard/settings.py and re-added RGW_API_HOST to the Options class using
RGW_API_HOST = Settings('', [dict,str])
I then modified /usr/share/ceph/mgr/dashboard/services/rgw_request.py such that each rgw daemon retrieved has its 'host' member set to Settings.RGW_API_HOST.
Then after restarting the mgr, I was able to access the Objectstore->(Daemons,Users,Buckets) pages in the dashboard.
HOWEVER, I know this is NOT the right way to fix this; it is a hack. It seems like the dashboard is trying to contact each rgw server individually. For us, RGW_API_HOST is a name in DNS, s3.my.dom, with multiple A records, one for each of our rgw servers. Each server presents the *same* SSL cert, whose CN and SubjectAltNames let it identify itself as both s3.my.dom and the individual host name (the SubjectAltName lists ALL the rgw servers). This has worked well for us since 15.x.y. The endpoint for the zone is set to s3.my.dom, so my users only have a single endpoint to care about, unless there is a failure on an rgw server. (We have other ways of handling that.)
Any thoughts on the CORRECT way to handle this so I can have the ceph dashboard work with the ObjectStore->(Daemons,Users,Buckets) pages? Thanks.
-Chris
Hello,
We have an RGW setup with a bunch of Nginx instances in front of the RGWs acting as a load balancer (LB). I'm currently working on some metrics and log analysis based on the LB logs.
At the moment I'm looking at ways to recognise the type of S3 request on the LB. I know that matching the request format shouldn't be extremely hard, but I was looking into the possibility of extracting the information from RGW, since that's the part that is actually aware of it.
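Just to illustrate what I mean by matching the format on the LB side, here is a very crude sketch (it assumes path-style requests only and ignores query sub-resources like ?uploads or ?acl; the op names are my own labels, not RGW's internal RGWOp values):

classify_s3_op() {
    # $1 = HTTP method, $2 = URI path as logged by the LB
    local method="$1" path="$2" is_object=0
    case "${path#/}" in
        "")  echo list_buckets; return ;;   # "GET /" style service request
        */*) is_object=1 ;;                 # /bucket/key... -> object request
    esac
    case "$method" in
        PUT)    [ "$is_object" -eq 1 ] && echo put_obj    || echo create_bucket ;;
        DELETE) [ "$is_object" -eq 1 ] && echo delete_obj || echo delete_bucket ;;
        HEAD)   [ "$is_object" -eq 1 ] && echo stat_obj   || echo stat_bucket ;;
        GET)    [ "$is_object" -eq 1 ] && echo get_obj    || echo list_bucket ;;
        POST)   echo post_op ;;             # multipart, multi-object delete, form POST
        *)      echo unknown ;;
    esac
}
# e.g. classify_s3_op GET /mybucket/some/key  ->  get_obj

It works, but it duplicates knowledge that RGW already has, which is why I'd prefer to get the information from RGW itself.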
I was working with the Lua part of RGW before, so I know that the Request.RGWOp field is a great fit.
I would like to add this as some kind of response header, but unfortunately, if I'm not mistaken, that's not possible at the moment.
Has anyone looked into this (wink wink Yuval :))? Or do you have a recommendation how to do it?
Thanks a lot.
Regards,
Ondrej
Hello Users,
We have the environment described below. Both environments are zones of one RGW multisite zonegroup; the DC zone is the primary and the DR zone is the secondary at this point.
DC
Ceph Version: 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Number of rgw daemons : 25
DR
Ceph Version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Number of rgw daemons : 25
Environment description:
Both zones are in production, and the RGW multisite traffic runs over an MPLS link of around 3 Gbps.
Issue description :
We enabled multisite between DC and DR about a month ago. The total data in the DC zone is around 159 TiB and the sync had been progressing as expected. But once the sync reached around 120 TiB, the speed dropped drastically: it used to be around 2 Gbps and it fell to well below 10 Mbps, even though the link is not saturated. "# radosgw-admin sync status" reports "metadata is caught up with master" and "data is caught up with source", yet the DR zone is still almost 25 TB behind the DC. "# radosgw-admin bucket sync status --bucket=<bucket-name>" also shows that the bucket is behind on shards. The outputs are attached below.
Re-syncing the data from the beginning is not feasible in our case. The "# radosgw-admin sync error list" output is also attached, with some information redacted, and we do see errors.
radosgw-admin sync status
realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
radosgw-admin bucket sync status --bucket=<bucket-name>
realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
bucket :tc******rc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])
source zone d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
source bucket :tc*******arc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])
full sync: 14/9221 shards
full sync: 49448693 objects completed
incremental sync: 9207/9221 shards
bucket is behind on 25 shards
behind shards: [9,111,590,826,1774,2968,3132,3382,3386,3409,3685,3820,4174,4544,4708,4811,5733,6285,6558,7288,7417,7443,7876,8151,8878]
Error: radosgw-admin sync error list
"id": "1_1690799008.725414_3926410.1",
"section": "data",
"name": "bucket0:d09d3d16-8601-448b-bf3d-609b8a29647d.89871.1:1949",
"timestamp": "2023-07-31T10:23:28.725414Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 125,
"message": "failed to sync bucket instance: (125) Operation canceled"
"id": "1_1690804503.144829_3759212.1",
"section": "data",
"name": "bucket1:d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1:1232/S01/1/120/2b7ea802-efad-41d3-9d90-9**************523.txt",
"timestamp": "2023-07-31T11:54:53.233451Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 5,
"message": "failed to sync object(5) Input/output error"
Thanks
Ankit
Hi,
I have a cluster on the latest Octopus release, with mgr/mon/rgw/osds on CentOS 8.
Is it safe to add an Ubuntu OSD host running the same Octopus version?
Thank you
TL;DR
Is there a way to run trim / discard on an OSD?
Long story:
I have a Proxmox-Ceph cluster with some OSDs as storage for VMs. Discard works perfectly in this cluster. For lab and testing purposes I deploy Proxmox-Ceph clusters as Proxmox VMs on this cluster using nested virtualization, each with a few disks acting as OSDs. Then a few VMs are configured in the nested Proxmox cluster using the nested Ceph as storage. I hope that makes sense.
The problem I'm facing is that the nested Ceph OSDs are not sending trim/discard commands down to the upstream Ceph OSDs, so thin provisioning is not preserved; somewhere along the chain the discard gets lost.
I've tried setting this on the nested Ceph cluster:
ceph config set global bdev_enable_discard true
ceph config set global bdev_async_discard true
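In case it matters, this is how the settings can be checked on a running OSD (osd.0 is just an example ID; as far as I know the OSDs also need a restart before these options take effect):

ceph config show osd.0 bdev_enable_discard
ceph config show osd.0 bdev_async_discard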
Those settings help somewhat, but they do not discard all of the used space when data is deleted from a VM or a VM is removed entirely; they recover maybe 20% of it. This is why I wonder whether there is a way to run trim/discard on an OSD.
Thanks in advance
--
Hi all,
I have had this warning all day already (latest Octopus cluster):
HEALTH_WARN 4 clients failing to respond to capability release; 1 pgs not deep-scrubbed in time
[WRN] MDS_CLIENT_LATE_RELEASE: 4 clients failing to respond to capability release
mds.ceph-24(mds.1): Client sn352.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 145698301
mds.ceph-24(mds.1): Client sn463.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 189511877
mds.ceph-24(mds.1): Client sn350.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 189511887
mds.ceph-24(mds.1): Client sn403.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 231250695
If I look at the session info from mds.1 for these clients I see this:
# ceph tell mds.1 session ls | jq -c '[.[] | {id: .id, h: .client_metadata.hostname, addr: .inst, fs: .client_metadata.root, caps: .num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | grep -e 145698301 -e 189511877 -e 189511887 -e 231250695
{"id":189511887,"h":"sn350.hpc.ait.dtu.dk","addr":"client.189511887 v1:192.168.57.221:0/4262844211","fs":"/hpc/groups","caps":2,"req":0}
{"id":231250695,"h":"sn403.hpc.ait.dtu.dk","addr":"client.231250695 v1:192.168.58.18:0/1334540218","fs":"/hpc/groups","caps":3,"req":0}
{"id":189511877,"h":"sn463.hpc.ait.dtu.dk","addr":"client.189511877 v1:192.168.58.78:0/3535879569","fs":"/hpc/groups","caps":4,"req":0}
{"id":145698301,"h":"sn352.hpc.ait.dtu.dk","addr":"client.145698301 v1:192.168.57.223:0/2146607320","fs":"/hpc/groups","caps":7,"req":0}
We have mds_min_caps_per_client=4096, so these clients are well below the limit. Also, the file system is pretty idle at the moment.
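For reference, I read the limit from the config database (assuming nothing overrides it in a local ceph.conf):

# ceph config get mds mds_min_caps_per_client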
Why and what exactly is the MDS complaining about here?
Thanks and best regards.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hey ceph-users,
1) When configuring Gnocchi to use Ceph storage (see
https://gnocchi.osci.io/install.html#ceph-requirements)
I was wondering if one could use any of the auth profiles like
* simple-rados-client
* simple-rados-client-with-blocklist ?
Or are those for different use cases?
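For context, what I had in mind is something along these lines (just a sketch; the client name and the pool are placeholders, and I'm not sure the profile is intended for this use):

ceph auth get-or-create client.gnocchi \
    mon 'profile simple-rados-client' \
    osd 'allow rwx pool=gnocchi-metrics'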
2) I was also wondering why the documentation says "(Monitor only)" but then describes the profile as "Gives a user read-only permissions for monitor, OSD, and PG data".
3) And are those profiles really for "read-only" users? Why don't they have "read-only" in their name, like the rbd profile and its corresponding "rbd-read-only" variant?
Regards
Christian
Hello,
This message does not concern Ceph itself but a hardware vulnerability that can lead to permanent data loss on a Ceph cluster if the affected hardware is present in several separate fault domains.
The DELL / Toshiba PX02SMF020, PX02SMF040, PX02SMF080 and PX02SMB160 SSD drives of the 13G generation of DELL servers are subject to a vulnerability which renders them unusable after 70,000 hours of operation, i.e. approximately 7 years and 11 months of activity.
This topic has been discussed here: https://www.dell.com/community/PowerVault/TOSHIBA-PX02SMF080-has-lost-commu…
The risk is all the greater since these disks may die at the same time in the same server, leading to the loss of all data on that server.
To date, DELL has not provided any firmware fixing this vulnerability, the latest firmware version being "A3B3", released on Sept. 12, 2016: https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=hhd9k
If you have servers running these drives, check their uptime. If they are close to the 70,000 hour limit, replace them immediately.
The smartctl tool does not report the uptime for these SSDs, but if you have HDDs in the server, you can query their SMART status and get their uptime, which should be about the same as the SSDs.
The smartctl command is: smartctl -a -d megaraid,XX /dev/sdc (where XX is the drive's device ID on the MegaRAID controller).
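For example, to dump the power-on time of every drive behind a controller (a rough sketch; adjust the device ID range and the /dev/sdX path to your controller layout):

for id in $(seq 0 15); do
    echo "=== megaraid,$id ==="
    smartctl -a -d megaraid,$id /dev/sdc | grep -i 'power.*on'
done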
We have informed DELL about this but have no information yet on the arrival of a fix.
We have lost 6 disks, in 3 different servers, in the last few weeks. Our observation shows that the drives don't survive full shutdown and restart of the machine (power off then power on in iDrac), but they may also die during a single reboot (init 6) or even while the machine is running.
Fujitsu released a corrective firmware in June 2021 but this firmware is most certainly not applicable to DELL drives: https://www.fujitsu.com/us/imagesgig5/PY-CIB070-00.pdf
Regards,
Frederic
Sous-direction Infrastructure and Services
Direction du Numérique
Université de Lorraine