Hello guys,
could someone help me with this? We've been long-time CEPH users... running several Mimic + Pacific CEPH clusters, typically with dozens of disks per cluster.
BUT... now I have this brand new Quincy cluster and I'm not able to give a CLIENT (Quincy on Rocky 8) rw access to ONE IMAGE on the Quincy cluster (cephadm / Rocky 9).
I'm using something that worked for us for ages:
ceph auth ls:
client.xxx
key: ...
caps: [mon] profile rbd
caps: [osd] allow rwx pool prod object_prefix rbd_data.600d1c6723ae; allow rwx pool prod object_prefix rbd_header.600d1c6723ae; allow rx pool prod object_prefix rbd_id.xxx-data
rbd info:
rbd image 'xxx-data':
size 2 TiB in 524288 objects
order 22 (4 MiB objects)
snapshot_count: 2
id: 600d1c6723ae
block_name_prefix: rbd_data.600d1c6723ae
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
rados ls:
rbd_data.600d1c6723ae.000000000003958d
rbd_header.600d1c6723ae
rbd_id.xxx-data
BUT... it DOES NOT WORK. When I try to map it on the client, it says:
2023-02-11T20:49:18.665+0100 7f3a337fe700 -1 librbd::image::GetMetadataRequest: 0x7f3a1c001f40 handle_metadata_list: failed to retrieve image metadata: (1) Operation not permitted
2023-02-11T20:49:18.665+0100 7f3a337fe700 -1 librbd::image::RefreshRequest: failed to retrieve pool metadata: (1) Operation not permitted
2023-02-11T20:49:18.665+0100 7f3a337fe700 -1 librbd::image::OpenRequest: failed to refresh image: (1) Operation not permitted
2023-02-11T20:49:18.665+0100 7f3a337fe700 -1 librbd::ImageState: 0x555eff78cfc0 failed to open image: (1) Operation not permitted
rbd: error opening image xxx-data: (1) Operation not permitted
The mapping and access DO work when I put "osd allow *" into ceph auth.
What is the recommended syntax for Quincy?
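One approach we've been considering (an untested sketch on our side; "ns1" is a placeholder namespace name) is to give the image its own RADOS namespace and scope the cap to it, instead of using object_prefix:

rbd namespace create prod/ns1
rbd create --size 2T prod/ns1/xxx-data
ceph auth caps client.xxx mon 'profile rbd' osd 'profile rbd pool=prod namespace=ns1'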
btw: this use case should be mentioned in the manual I think...
Thanks!
Hi All,
Sorry if this was mentioned previously (I obviously missed it if it was),
but can we upgrade a Ceph Quincy host/cluster from Rocky Linux (RHEL)
v8.6/8.7 to v9.1 (yet)? If so, what is / where can I find the
procedure to do this - i.e., is there anything "special" that needs to be
done because of Ceph, or can we just do a "simple" v8.x -> v9.1 upgrade?
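For context, the pattern I'd naively expect (my assumption, not an official procedure; "ceph01" is a placeholder hostname) is to drain each host in turn, reinstall the OS, and let cephadm re-adopt it:

ceph osd set noout                       # keep data in place while the host is down
ceph orch host maintenance enter ceph01
# ...reinstall the OS, reinstall podman + cephadm, restore the cluster's SSH key...
ceph orch host maintenance exit ceph01
ceph cephadm osd activate ceph01         # re-activate the existing OSDs on that host
ceph osd unset noout

Is that close, or is there a documented path?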
Thanks in advance
Cheers
Dulux-Oz
Hi! 😊
It would be very kind of you to help us with this!
We have pools in our Ceph cluster that are set to replicated size 2 / min_size 1.
Obviously we want to go to size 3 / min_size 2, but we are experiencing problems with that:
USED goes to 100% instantly and MAX AVAIL goes to 0, and write operations seem to stop.
POOLS:
NAME    ID  USED    %USED   MAX AVAIL  OBJECTS
Pool1   24  35791G   35.04     66339G  8927762
Pool2   25  11610G   14.89     66339G  3004740
Pool3   26  17557G  100.00          0  2666972
Before the change it was like this:
NAME    ID  USED    %USED   MAX AVAIL  OBJECTS
Pool1   24  35791G   35.04     66339G  8927762
Pool2   25  11610G   14.89     66339G  3004740
Pool3   26  17558G   20.93     66339G  2667013
This was quite surprising to us, as we'd expect %USED to go to something like 30%.
Going back to 2/1 also instantly gave us back the 20.93% usage.
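For completeness, the size change itself was done with the plain pool commands (a sketch; pool name as above):

ceph osd pool set Pool3 size 3
ceph osd pool set Pool3 min_size 2

Since each object then holds three replicas instead of two, we'd expect raw usage to grow by roughly 3/2, i.e. 20.93% x 1.5 ≈ 31%, not jump to 100%.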
What’s the matter here?
Thank you and best regards
Stefan
I deployed kolla-ansible & cephadm on virtual machines (KVM).
My Ceph cluster is on 3 VMs with 12 vCPUs and 24 GB of RAM each; I used cephadm to deploy Ceph.
ceph -s :
--------------------------
  cluster:
    id:     a0e5ad36-a54c-11ed-9aea-5254008c2a3e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph0,ceph1,ceph2 (age 6h)
    mgr: ceph0.dzutak(active, since 24h), standbys: ceph1.aizuyc
    mds: 3/3 daemons up, 6 standby
    osd: 9 osds: 9 up (since 24h), 9 in (since 24h)

  data:
    volumes: 3/3 healthy
    pools:   9 pools, 257 pgs
    objects: 70 objects, 7.3 KiB
    usage:   76 MiB used, 780 GiB / 780 GiB avail
    pgs:     257 active+clean
--------------------------
My OpenStack deployment is AIO on a single node. Now I want to link them together, so I started with Manila & native CephFS, thinking it's the easiest, following this doc:
https://docs.openstack.org/manila/latest/admin/cephfs_driver.html#authorizi…
I created the user:
--------------------------
client.manila
key: AQC7ot9jfiDsIxAA57fb7S6bVMnr5IadsnukHQ==
caps: [mgr] allow rw
caps: [mon] allow r
caps: [osd] allow rw pool=ganesha_rados_store
and created a file system called manila
--------------------------
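For reference, the file system and user were created roughly per the linked doc (a sketch, not my exact commands):
--------------------------
ceph fs volume create manila
ceph auth get-or-create client.manila mon 'allow r' mgr 'allow rw'
--------------------------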
my ceph.conf
--------------------------
[global]
fsid = a0e5ad36-a54c-11ed-9aea-5254008c2a3e
mon_host = [v2:192.168.122.25:3300/0,v1:192.168.122.25:6789/0] [v2:192.168.122.115:3300/0,v1:192.168.122.115:6789/0] [v2:192.168.122.14:3300/0,v1:192.168.122.14:6789/0]
--------------------------
I moved the files over to the OpenStack node and tried to connect them together, but it didn't go well. Viewing the logs shows:
--------------------------
<AIO@cephfsnative1: manila.exception.ShareBackendException: json_command failed - prefix=fs volume ls, argdict={'format': 'json'} - exception message: Bad target type 'mon-mgr'.
--------------------------
Where should I start to fix this issue?
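In case it matters, the backend section in manila.conf is set up roughly like this (a sketch based on the linked doc; the section name matches "cephfsnative1" from the log above, and the option values are my assumptions):
--------------------------
[cephfsnative1]
driver_handles_share_servers = False
share_backend_name = CEPHFSNATIVE1
share_driver = manila.share.drivers.cephfs.driver.CephFSDriver
cephfs_conf_path = /etc/ceph/ceph.conf
cephfs_auth_id = manila
cephfs_cluster_name = ceph
--------------------------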
Good morning everyone, I've been running a small Ceph cluster with Proxmox for a while now and I've finally run across an issue I can't find any information on. I have a 3-node cluster with 9 Samsung PM983 960 GB NVMe drives running on a dedicated 10 Gb network. RBD and CephFS performance have been great; most of the time I see over 500 MB/s writes, and a rados benchmark shows 951 MB/s write and 1140 MB/s read bandwidth.
The problem I'm seeing is that after setting up RadosGW I can only upload to "S3" at around 25 MB/s with the official AWS CLI. Using s3cmd is slightly better at around 45 MB/s. I'm going directly to the RadosGW instance with no load balancers in between and no SSL enabled. Just trying to figure out if this is normal. I'm not expecting it to be as fast as writing directly to an RBD, but I was kind of hoping for more than this.
So what should I expect in performance from the RadosGW?
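One knob I can still experiment with (my assumption about where the bottleneck might be, not a diagnosis) is the AWS CLI's transfer tuning, since concurrency and chunk size gate per-object throughput:

aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB
aws --endpoint-url http://<rgw-host>:<port> s3 cp bigfile s3://test-bucket/

("<rgw-host>", "test-bucket", and "bigfile" are placeholders.)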
Here are some rados bench results and my ceph report
https://gist.github.com/shawnweeks/f6ef028284b5cdb10d80b8dc0654eec5
https://gist.github.com/shawnweeks/7cfe94c08adbc24f2a3d8077688df438
Thanks
Shawn
Hello Friends,
I have strange output when issuing the following command:
root@node35:~# rbd du -p cephhdd-001-mypool
NAME                              PROVISIONED  USED
...
vm-99936587-disk-0@H202302091535      400 GiB  5.2 GiB
vm-99936587-disk-0@H202302091635      400 GiB  1.2 GiB
vm-99936587-disk-0                    400 GiB  732 MiB
vm-9999104-cloudinit                    4 MiB  4 MiB
vm-9999104-disk-0                     600 GiB  586 GiB
<TOTAL>                                49 TiB  44 TiB
rbd: du failed: (2) No such file or directory
root@node35:~#
I do not know why I receive "rbd: du failed: (2) No such file or
directory".
How can I find the origin of this?
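One idea I had (an untested sketch) is to walk the images one at a time so the failing entry gets isolated, in case an image or snapshot disappeared while the pool-wide du was running:

for img in $(rbd ls -p cephhdd-001-mypool); do
  rbd du "cephhdd-001-mypool/$img" >/dev/null 2>&1 || echo "du fails for: $img"
done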
My Ceph version is 17.2.3, installed with "cephadm".
The cluster is "HEALTH_OK" with 108 OSDs distributed across 3 nodes, where
the mgr/mon daemons also reside.
Hope you can help
Mehmet
Hi,
Yet another question about OSD memory usage ...
I have a test cluster running. When I do a ceph orch ps I see for my osd.11:
ceph orch ps --refresh
NAME    HOST    PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
osd.11  ceph01         running (2h)  97s ago    2h   23.0G    13.1G    17.2.5   cc65afd6173a  5d1062e8d392
When I check via top on the machine I see:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
39807 ceph 20 0 6254956 3.7g 9228 S 31.2 3.0 846:21.63 /usr/bin/ceph-osd -n osd.11 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-lo+
Now, where does ceph orch ps get those 23.0G from, when top just shows 3.7G resident and 6.2G virtual for osd.11?
(I do understand that the MEM LIM in the ceph orch ps list is not really the limit.)
Anyone know where that discrepancy comes from?
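My guess (an assumption on my part, not verified) is that the orchestrator reports the container's cgroup memory accounting, which includes page cache, rather than the process RSS. A cross-check along those lines, using the container ID and PID from above:

podman stats --no-stream 5d1062e8d392   # cgroup-level usage for the OSD container
ps -o rss= -p 39807                     # RSS of the ceph-osd process, in KiB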
Ciao, Uli