ceph orch device ls output shows "insufficient space (10 extents) on vgs, LVM detected, locked" on Quincy. Is this just a warning, or should any action be taken?
Hi,
I am using 17.2.6 on Rocky Linux 8
The ceph mgr dashboard, in my situation, (bare metal install, upgraded from 15->16-> 17.2.6), can no longer hit the ObjectStore->(Daemons,Users,Buckets) pages.
When I try to hit those pages, it gives an error:
RGW REST API failed request with status code 403 {"Code": "AccessDenied", RequestId: "xxxxxxx", HostId: "yyyy-<my zone>"}
The log of the rgw server it hit has:
"GET /admin/metadata/user?myself HTTP/1.1" 403 125
It appears that the mgr dashboard setting RGW_API_HOST is no longer an option that can be set, nor does that name exist anywhere under /usr/share/ceph/mgr/dashboard, and:
# ceph dashboard set-rgw-api-host <host>
no longer exists in 17.2.6.
However, since my situation is an upgrade, the config value still exists in my config, and I can retrieve it with:
# ceph dashboard get-rgw-api-host
To get this to work in my situation, I modified /usr/share/ceph/mgr/dashboard/settings.py and re-added RGW_API_HOST to the Options class using
RGW_API_HOST = Settings('', [dict,str])
I then modified /usr/share/ceph/mgr/dashboard/services/rgw_request.py such that each rgw daemon retrieved has its 'host' member set to Settings.RGW_API_HOST.
Then after restarting the mgr, I was able to access the Objectstore->(Daemons,Users,Buckets) pages in the dashboard.
HOWEVER, I know this is NOT the right way to fix this; it is a hack. It seems like the dashboard is trying to contact each rgw server individually. For us, RGW_API_HOST is a name in DNS, s3.my.dom, with multiple A records, one for each of our rgw servers. Each server presents the *same* SSL cert, whose CN and SubjectAltNames let it identify itself as both s3.my.dom and the individual host name (the SubjectAltName lists ALL the rgw servers). This has worked well for us since 15.x.y. The endpoint for the zone is set to s3.my.dom, so my users only have a single endpoint to care about, unless there is a failure on an rgw server. (We have other ways of handling that.)
Any thoughts on the CORRECT way to handle this so I can have the ceph dashboard work with the ObjectStore->(Daemons,Users,Buckets) pages? Thanks.
-Chris
Hello,
We have an RGW setup with a bunch of Nginx instances in front of the RGWs acting as a load balancer (LB). I'm currently working on some metrics and log analysis based on the LB logs.
At the moment I'm looking at ways to recognise the type of S3 request on the LB. I know that matching the request format shouldn't be extremely hard, but I was looking into the possibility of extracting the information from RGW, since that's the part that is actually aware of it.
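Just to illustrate what I mean by matching the format on the LB side, here is a very crude sketch (it assumes path-style requests only and ignores query sub-resources like ?uploads or ?acl; the op names are my own labels, not RGW's internal RGWOp values):

classify_s3_op() {
    # $1 = HTTP method, $2 = URI path as logged by the LB
    local method="$1" path="$2" is_object=0
    case "${path#/}" in
        "")  echo list_buckets; return ;;   # "GET /" style service request
        */*) is_object=1 ;;                 # /bucket/key... -> object request
    esac
    case "$method" in
        PUT)    [ "$is_object" -eq 1 ] && echo put_obj    || echo create_bucket ;;
        DELETE) [ "$is_object" -eq 1 ] && echo delete_obj || echo delete_bucket ;;
        HEAD)   [ "$is_object" -eq 1 ] && echo stat_obj   || echo stat_bucket ;;
        GET)    [ "$is_object" -eq 1 ] && echo get_obj    || echo list_bucket ;;
        POST)   echo post_op ;;             # multipart, multi-object delete, form POST
        *)      echo unknown ;;
    esac
}
# e.g. classify_s3_op GET /mybucket/some/key  ->  get_obj

It works, but it duplicates knowledge that RGW already has, which is why I'd prefer to get the information from RGW itself.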
I was working with the Lua part of RGW before, so I know that the Request.RGWOp field is a great fit.
I would like to add this as some kind of response header, but unfortunately, if I'm not mistaken, that's not possible at the moment.
Has anyone looked into this (wink wink Yuval :))? Or do you have a recommendation how to do it?
Thanks a lot.
Regards,
Ondrej
Hello Users,
We have the environment described below. Both environments are zones of one RGW multisite zonegroup; the DC zone is the primary and the DR zone is the secondary at this point.
DC
Ceph Version: 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Number of rgw daemons : 25
DR
Ceph Version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Number of rgw daemons : 25
Environment description:
Both zones are in production, and the RGW multisite traffic runs over an MPLS link of around 3 Gbps.
Issue description :
We enabled multisite between DC and DR about a month ago. The total data in the DC zone is around 159 TiB and the sync had been progressing as expected. But once the sync reached around 120 TiB, the speed dropped drastically: it used to be around 2 Gbps and it fell to well below 10 Mbps, even though the link is not saturated. "# radosgw-admin sync status" reports "metadata is caught up with master" and "data is caught up with source", yet the DR zone is still almost 25 TB behind the DC. "# radosgw-admin bucket sync status --bucket=<bucket-name>" also shows that the bucket is behind on shards. The outputs are attached below.
Re-syncing the data from the beginning is not feasible in our case. The "# radosgw-admin sync error list" output is also attached, with some information redacted, and we do see errors.
radosgw-admin sync status
realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
radosgw-admin bucket sync status --bucket=<bucket-name>
realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
bucket :tc******rc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])
source zone d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
source bucket :tc*******arc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])
full sync: 14/9221 shards
full sync: 49448693 objects completed
incremental sync: 9207/9221 shards
bucket is behind on 25 shards
behind shards: [9,111,590,826,1774,2968,3132,3382,3386,3409,3685,3820,4174,4544,4708,4811,5733,6285,6558,7288,7417,7443,7876,8151,8878]
Error: radosgw-admin sync error list
"id": "1_1690799008.725414_3926410.1",
"section": "data",
"name": "bucket0:d09d3d16-8601-448b-bf3d-609b8a29647d.89871.1:1949",
"timestamp": "2023-07-31T10:23:28.725414Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 125,
"message": "failed to sync bucket instance: (125) Operation canceled"
"id": "1_1690804503.144829_3759212.1",
"section": "data",
"name": "bucket1:d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1:1232/S01/1/120/2b7ea802-efad-41d3-9d90-9**************523.txt",
"timestamp": "2023-07-31T11:54:53.233451Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 5,
"message": "failed to sync object(5) Input/output error"
Thanks
Ankit
Hi,
I have a cluster on the latest Octopus release, with mgr/mon/rgw/osds on CentOS 8.
Is it safe to add an Ubuntu OSD host running the same Octopus version?
Thank you
TL;DR
Is there a way to run trim / discard on an OSD?
Long story:
I have a Proxmox-Ceph cluster with some OSDs as storage for VMs. Discard works perfectly in this cluster. For lab and testing purposes I deploy Proxmox-Ceph clusters as Proxmox VMs on this cluster using nested virtualization, each with a few disks acting as OSDs. Then a few VMs are configured in the nested Proxmox cluster using the nested Ceph as storage. I hope that makes sense.
The problem I'm facing is that the nested Ceph OSDs are not sending trim/discard commands down to the upstream Ceph OSDs, so thin provisioning is not preserved; somewhere along the chain the discard gets lost.
I've tried setting this on the nested Ceph cluster:
ceph config set global bdev_enable_discard true
ceph config set global bdev_async_discard true
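In case it matters, this is how the settings can be checked on a running OSD (osd.0 is just an example ID; as far as I know the OSDs also need a restart before these options take effect):

ceph config show osd.0 bdev_enable_discard
ceph config show osd.0 bdev_async_discard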
Those settings help somewhat, but they do not discard all of the used space when data is deleted from a VM or a VM is removed entirely; they recover maybe 20% of it. This is why I wonder whether there is a way to run trim/discard on an OSD.
Thanks in advance
--
Hi all,
I have had this warning all day already (latest Octopus cluster):
HEALTH_WARN 4 clients failing to respond to capability release; 1 pgs not deep-scrubbed in time
[WRN] MDS_CLIENT_LATE_RELEASE: 4 clients failing to respond to capability release
mds.ceph-24(mds.1): Client sn352.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 145698301
mds.ceph-24(mds.1): Client sn463.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 189511877
mds.ceph-24(mds.1): Client sn350.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 189511887
mds.ceph-24(mds.1): Client sn403.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 231250695
If I look at the session info from mds.1 for these clients I see this:
# ceph tell mds.1 session ls | jq -c '[.[] | {id: .id, h: .client_metadata.hostname, addr: .inst, fs: .client_metadata.root, caps: .num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | grep -e 145698301 -e 189511877 -e 189511887 -e 231250695
{"id":189511887,"h":"sn350.hpc.ait.dtu.dk","addr":"client.189511887 v1:192.168.57.221:0/4262844211","fs":"/hpc/groups","caps":2,"req":0}
{"id":231250695,"h":"sn403.hpc.ait.dtu.dk","addr":"client.231250695 v1:192.168.58.18:0/1334540218","fs":"/hpc/groups","caps":3,"req":0}
{"id":189511877,"h":"sn463.hpc.ait.dtu.dk","addr":"client.189511877 v1:192.168.58.78:0/3535879569","fs":"/hpc/groups","caps":4,"req":0}
{"id":145698301,"h":"sn352.hpc.ait.dtu.dk","addr":"client.145698301 v1:192.168.57.223:0/2146607320","fs":"/hpc/groups","caps":7,"req":0}
We have mds_min_caps_per_client=4096, so these clients are well below the limit. Also, the file system is pretty idle at the moment.
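For reference, I read the limit from the config database (assuming nothing overrides it in a local ceph.conf):

# ceph config get mds mds_min_caps_per_client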
Why and what exactly is the MDS complaining about here?
Thanks and best regards.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hey ceph-users,
1) When configuring Gnocchi to use Ceph storage (see
https://gnocchi.osci.io/install.html#ceph-requirements)
I was wondering if one could use any of the auth profiles like
* simple-rados-client
* simple-rados-client-with-blocklist ?
Or are those for different use cases?
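For context, what I had in mind is something along these lines (just a sketch; the client name and the pool are placeholders, and I'm not sure the profile is intended for this use):

ceph auth get-or-create client.gnocchi \
    mon 'profile simple-rados-client' \
    osd 'allow rwx pool=gnocchi-metrics'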
2) I was also wondering why the documentation says "(Monitor only)" but then describes the profile as "Gives a user read-only permissions for monitor, OSD, and PG data".
3) And are those profiles really for "read-only" users? Why don't they have "read-only" in their name, like the rbd profile and its corresponding "rbd-read-only" variant?
Regards
Christian
Hello,
This message does not concern Ceph itself but a hardware vulnerability that can lead to permanent data loss on a Ceph cluster if the affected hardware is present in several separate fault domains.
The DELL / Toshiba PX02SMF020, PX02SMF040, PX02SMF080 and PX02SMB160 SSD drives of the 13G generation of DELL servers are subject to a vulnerability which renders them unusable after 70,000 hours of operation, i.e. approximately 7 years and 11 months of activity.
This topic has been discussed here: https://www.dell.com/community/PowerVault/TOSHIBA-PX02SMF080-has-lost-commu…
The risk is all the greater since these disks may die at the same time in the same server, leading to the loss of all data on that server.
To date, DELL has not provided any firmware fixing this vulnerability, the latest firmware version being "A3B3", released on Sept. 12, 2016: https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=hhd9k
If you have servers running these drives, check their uptime. If they are close to the 70,000 hour limit, replace them immediately.
The smartctl tool does not report the uptime for these SSDs, but if you have HDDs in the server, you can query their SMART status and get their uptime, which should be about the same as the SSDs.
The smartctl command is: smartctl -a -d megaraid,XX /dev/sdc (where XX is the drive's device ID on the MegaRAID controller).
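For example, to dump the power-on time of every drive behind a controller (a rough sketch; adjust the device ID range and the /dev/sdX path to your controller layout):

for id in $(seq 0 15); do
    echo "=== megaraid,$id ==="
    smartctl -a -d megaraid,$id /dev/sdc | grep -i 'power.*on'
done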
We have informed DELL about this but have no information yet on the arrival of a fix.
We have lost 6 disks, in 3 different servers, in the last few weeks. Our observation shows that the drives don't survive full shutdown and restart of the machine (power off then power on in iDrac), but they may also die during a single reboot (init 6) or even while the machine is running.
Fujitsu released a corrective firmware in June 2021 but this firmware is most certainly not applicable to DELL drives: https://www.fujitsu.com/us/imagesgig5/PY-CIB070-00.pdf
Regards,
Frederic
Sous-direction Infrastructure and Services
Direction du Numérique
Université de Lorraine