Hi,
I'm trying to evaluate SSE-C (i.e. customer-provided keys) for our object storage.
We do not provide a KMS server.
I've added "Access-Control-Allow-Headers" to the haproxy frontend.
rspadd Access-Control-Allow-Headers...
x-amz-server-side-encryption-customer-algorithm,\
x-amz-server-side-encryption-customer-key,\
x-amz-server-side-encryption-customer-key-MD5
I've also enabled "rgw_trust_forwarded_https = true" in the client section
in the ceph.conf and restarted the RGW daemons.
I'm now trying to get it working, but I am not sure whether I am doing it correctly.
$ encKey=$(openssl rand -base64 32)
$ md5Key=$(echo $encKey | md5sum | awk '{print $1}' | base64)
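For reference, my understanding of the usual SSE-C convention (an assumption on my part, not verified against RGW) is that the MD5 value should be the base64 of the *binary* MD5 digest of the raw key bytes, rather than the md5sum of the base64 string, i.e. roughly:
$ # base64 of the binary MD5 of the decoded (raw) key bytes
$ md5Key=$(echo -n "$encKey" | base64 -d | openssl dgst -md5 -binary | base64)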
$ aws s3api --endpoint=https://radosgw put-object \
--body ~/Downloads/TESTFILE \
--bucket test-bb-encryption \
--key TESTFILE \
--sse-customer-algorithm AES256 \
--sse-customer-key $encKey \
--sse-customer-key-md5 $md5Key
This is what the RGW log gives me:
2023-03-17T10:55:55.465+0000 7f42bbe5f700 1 ====== starting new request
req=0x7f448c185700 =====
2023-03-17T10:55:55.469+0000 7f434df83700 1 ====== req done
req=0x7f448c185700 op status=-2021 http_status=400 latency=3999985ns ======
2023-03-17T10:55:55.469+0000 7f434df83700 1 beast: 0x7f448c185700: IPV6 -
- [2023-03-17T10:55:55.469539+0000] "PUT /test-bb-encryption/TESTFILE
HTTP/1.1" 400 221 - "aws-cli/2.4.18 Python/3.9.10 Darwin/22.3.0
source/x86_64 prompt/off command/s3api.put-object" -
Maybe someone has a working example and is willing to share it with me, or
has also encountered this problem and knows what to do?
It's an Octopus cluster.
Cheers
Boris
--
The "UTF-8 problems" self-help group will, as an exception, meet in the
large hall this time.
Hi,
tracker.ceph.com has seemed quite slow recently. My colleagues have
noticed it as well,
so the problem doesn't appear to be specific to me.
Could you tell me whether there is a plan to fix this in the near future?
Thanks,
Satoru
Hi,
I suspect a bug in how cephadm configures the ingress service for RGW. Our
production cluster has been upgraded continuously from Luminous to
Pacific. When configuring the ingress service for RGW, the generated
haproxy.cfg is incomplete. The same YAML spec applied on our test cluster
works fine.
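For reference, the spec follows the documented ingress format; a minimal sketch (service id, hostnames and the virtual IP below are placeholders, not our production values):
service_type: ingress
service_id: rgw.default
placement:
  hosts:
    - host1
    - host2
spec:
  backend_service: rgw.default
  virtual_ip: 192.0.2.10/24
  frontend_port: 8080
  monitor_port: 1967
Applied with "ceph orch apply -i ingress.yaml".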
Regards,
Patrick
Hello,
We have a 6-node Ceph cluster; all of the nodes run OSDs, and 3 of them (ceph-1 to ceph-3) also run ceph-mgr and ceph-mon. Here is the memory configuration of each node (swap on ceph-1 to ceph-3 was disabled after the alarm):
# ceph-1 free -h
total used free shared buff/cache available
Mem: 187Gi 38Gi 5.4Gi 4.1Gi 143Gi 142Gi
Swap: 0B 0B 0B
# ceph-2 free -h
total used free shared buff/cache available
Mem: 187Gi 49Gi 2.6Gi 4.0Gi 135Gi 132Gi
Swap: 0B 0B 0B
# ceph-3 free -h
total used free shared buff/cache available
Mem: 187Gi 37Gi 4.6Gi 4.0Gi 145Gi 144Gi
Swap: 0B 0B 0B
# ceph-4 free -h
total used free shared buff/cache available
Mem: 251Gi 31Gi 8.3Gi 231Mi 211Gi 217Gi
Swap: 124Gi 3.8Gi 121Gi
# ceph-5 free -h
total used free shared buff/cache available
Mem: 251Gi 32Gi 14Gi 135Mi 204Gi 216Gi
Swap: 124Gi 4.0Gi 121Gi
# ceph-6 free -h
total used free shared buff/cache available
Mem: 251Gi 30Gi 16Gi 145Mi 204Gi 218Gi
Swap: 124Gi 4.0Gi 121Gi
We had configured swap space on all of them: 8G of swap for the ceph-mgr nodes and 128G for the OSD nodes. Our Zabbix monitoring reported swap usage above 50% on ceph-1 to ceph-3, yet the available memory on those nodes is still around 140G out of 187G in total. I'm just wondering whether swap space is necessary at all when we have this much memory available?
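For what it's worth, the two settings that seem most relevant to this question, and how to read their current values (just a sketch, nothing cluster-specific):
# per-OSD memory target the OSDs try to stay under (default 4 GiB)
ceph config get osd osd_memory_target
# how eagerly the kernel swaps even while memory is still free (default 60)
sysctl vm.swappiness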
Thanks very much for your answering.
ceph pacific 16.2.11 (cephadm managed)
I have configured some NFS exports from CephFS via the Ceph GUI. We can mount the filesystems and view file/directory listings, but cannot read any file data.
The permissions on the shares are RW. We mount from the client using "vers=4.1".
Looking at debug logs from the container running nfs-ganesha, I see the following errors when trying to read a file's content:
15/03/2023 15:27:13 : epoch 6411e209 : gw01 : ganesha.nfsd-7[svc_8] complete_op :NFS4 :DEBUG :Status of OP_READ in position 2 = NFS4ERR_PERM, op response size is 7480 total response size is 7568
15/03/2023 15:27:13 : epoch 6411e209 : gw01 : ganesha.nfsd-7[svc_8] complete_nfs4_compound :NFS4 :DEBUG :End status = NFS4ERR_PERM lastindex = 3
Also, watching the TCP traffic, I see errors in the NFS protocol corresponding to these messages:
11:44:43.745570 IP xxx.747 > gw01.nfs: Flags [P.], seq 24184536:24184748, ack 11409577, win 602, options [nop,nop,TS val 342245425 ecr 2683489461], length 212: NFS request xid 156024373 208 getattr fh 0,1/53
11:44:43.745683 IP gw01.nfs > xxx.747: Flags [P.], seq 11409577:11409677, ack 24184748, win 3081, options [nop,nop,TS val 2683489461 ecr 342245425], length 100: NFS reply xid 156024373 reply ok 96 getattr ERROR: Operation not permitted
So there appears to be a permissions problem where nfs-ganesha is not able to "getattr" on cephfs data.
The export looks like this (read from rados):
EXPORT {
    FSAL {
        name = "CEPH";
        user_id = "nfs.cephfs.7";
        filesystem = "cephfs";
        secret_access_key = "xxx";
    }
    export_id = 7;
    path = "/exports/nfs/foobar";
    pseudo = "/foobar";
    access_type = "RW";
    squash = "no_root_squash";
    attr_expiration_time = 0;
    security_label = false;
    protocols = 4;
    transports = "TCP";
}
ceph auth permissions for the nfs.cephfs.7 client:
[client.nfs.cephfs.7]
    key = xxx
    caps mds = "allow rw path=/exports/nfs/foobar"
    caps mon = "allow r"
    caps osd = "allow rw pool=.nfs namespace=cephfs, allow rw tag cephfs data=cephfs"
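One check that might narrow this down is mounting the export path directly with the same credentials, to see whether the restriction is on the CephFS side or in ganesha itself; roughly (monitor address and mount point are placeholders, the secret is the one from the FSAL block above):
mount -t ceph mon1:6789:/exports/nfs/foobar /mnt/test \
    -o name=nfs.cephfs.7,secret=xxx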
Any suggestions?
Hi,
I tried to respond directly in the web ui of the mailing list but my
message is queued for moderation. I just wanted to share a solution
that worked for me when a service spec is stuck in a pending state;
maybe this will help others in the same situation.
While playing around with a test cluster I ended up with a "deleting"
osd service spec. The SUSE team has an article [1] for this case; the
following is what helped me resolve the issue. I had three different osd
specs in place for the same three nodes:
---snip---
osd 3 <deleting> 3w nautilus2;nautilus3
osd.osd-hdd-ssd 3 2m ago 2w
nautilus;nautilus2;nautilus3
osd.osd-hdd-ssd-mix 3 2m ago - <unmanaged>
---snip---
I replaced the "service_name" with the more suitable value
("osd.osd-hdd-ssd") in the unit.meta file of each OSD containing the
invalid spec, then restarted each affected OSD. It probably wouldn't
have been necessary, but I wanted to see the effect immediately, so I
failed over the mgr (ceph mgr fail); now I only have one valid osd spec.
---snip---
# before
nautilus3:~ # grep service_name
/var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/osd.3/unit.meta
"service_name": "osd",
# after
nautilus3:~ # grep service_name
/var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/osd.3/unit.meta
"service_name": "osd.osd-hdd-ssd",
nautilus3:~ # ceph orch ls osd
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
osd.osd-hdd-ssd 9 10m ago 2w nautilus;nautilus2;nautilus3
---snip---
Regards,
Eugen
[1] https://www.suse.com/support/kb/doc/?id=000020667
Hello ceph-users,
Unhappy with the capabilities regarding bucket access policies when
using the Keystone authentication module,
I posted to this ML a while back -
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/S2TV7GVFJT…
In general I'd still like to hear how others are making use of external
authentication and STS, and what your
experiences are in replacing e.g. Keystone authentication.
In the meantime we looked into OIDC authentication (via Keycloak) and
its potential.
While this works in general (AssumeRoleWithWebIdentity comes back with
an STS token that can then be used to access S3 buckets),
I am wondering about a few things:
1) How to enable STS for everyone (without user-individual policy to
AssumeRole)
In the documentation on STS
(https://docs.ceph.com/en/quincy/radosgw/STS/#sts-in-ceph) and also
STS-Lite (https://docs.ceph.com/en/quincy/radosgw/STSLite/#sts-lite)
it's implied that one has to attach a dedicated policy allowing STS
to each user individually. This does not scale well with thousands of
users. Also, when using federated / external authentication, there is no
explicit user creation: "A shadow user is created corresponding to every
federated user. The user id is derived from the ‘sub’ field of the
incoming web token."
Is there a way to automatically have a role corresponding to each user
that can be assumed via an OIDC token?
So an implicit role that would allow an externally authenticated
user to have full access to S3 and all buckets they own?
Looking at STS Lite documentation, it seems all the more natural to be
able to allow keystone users to make use of STS.
Is there any way to apply such an AssumeRole policy "globally" or for a
whole set of users at the same time?
I just found PR https://github.com/ceph/ceph/pull/44434 aiming to add
policy variables such as ${aws:username} to allow for generic policies.
But this is more about restricting bucket names or granting access to
certain pattern of names.
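For context, the per-user flow that does work for us today looks roughly like this (a sketch with the aws CLI; the endpoint, role name and policy files are placeholders, and it assumes the calling user already has roles=* caps, which is exactly the per-user step I'd like to avoid):
aws --endpoint-url https://rgw.example.com iam create-role \
    --role-name S3Access \
    --assume-role-policy-document file://trust-policy.json
aws --endpoint-url https://rgw.example.com iam put-role-policy \
    --role-name S3Access --policy-name S3AllAccess \
    --policy-document file://permission-policy.json
aws --endpoint-url https://rgw.example.com sts assume-role-with-web-identity \
    --role-arn "arn:aws:iam:::role/S3Access" \
    --role-session-name test-session \
    --web-identity-token "$KEYCLOAK_ACCESS_TOKEN"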
2) Isolation in S3 Multi-Tenancy with external IdP
(AssumeRoleWithWebIdentity), how does bucket ownership come into play?
Following the question about generic policies for STS I am wondering
about the role (no pun intended) that the bucket ownership or tenant
play here?
If one creates a role policy of e.g.
{"Version":"2012-10-17","Statement":{"Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::*"}}
Would this allow someone assuming this role access to all, "*", buckets,
or just those owned by the user that created this role policy?
In case of Keystone auth the owner of a bucket is the project, not the
individual (human) user. So this creates somewhat of a tenant which I'd
want to isolate.
3) Allowing users to create their own roles and policies by default
Is there a way to allow users to create their own roles and policies to
use them by default?
All the examples talk about the requirement for admin caps and
individually setting --caps="user-policy=*".
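i.e. something along these lines for every single user, which is the part that doesn't scale:
radosgw-admin caps add --uid="someuser" --caps="user-policy=*;roles=*"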
If there was a default role + policy (question #1) that could be applied
to externally authenticated users, I'd like for them to be able to
create new roles and policies to grant access to their buckets to other
users.
Regards
Christian
hi, everyone,
I have a question about repairing a broken WAL/DB device.
I have a cluster with 8 OSDs and 4 WAL/DB devices (1 OSD per WAL/DB
device). How can I repair the OSDs quickly if
one WAL/DB device breaks down, without rebuilding them? Thanks.
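PS: to see which OSDs are backed by a particular WAL/DB device, something like this should list them (device path is a placeholder):
ceph-volume lvm list /dev/nvme0n1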
Dear Ceph Team,
I hope this email finds you well. I am writing to express my keen interest
in participating in the Google Summer of Code (GSoC) program 2023 with your
team.
I am a 3rd year B.tech student in Computer Science Engineering, with a
strong passion for [specific area of interest related to the team's
project(s)]. I have experience working in C++, and I believe that I can
contribute significantly to your project by bringing my expertise,
enthusiasm, and commitment.
I have been following the GSoC program, and I understand the dedication and
hard work required to complete a project successfully. Therefore, I am
willing to commit my time and effort to meet the expectations and
requirements of the program. I am open to learning new technologies and
programming languages, and I believe that this opportunity will help me
grow both personally and professionally.
I have reviewed the list of your team's project ideas, and I am
particularly interested in Disk Fragmentation Simulator. I would appreciate
it if you could provide me with any additional information or resources
that may be helpful to better understand the project requirements and goals.
Thank you for taking the time to read my email, and I look forward to
hearing back from you soon.
Best regards,
Arush Sharma
Dear List,
Today I successfully upgraded with cephadm from 16.2.8 -> 16.2.9 -> 16.2.10 -> 16.2.11.
Now I wanted to upgrade to 17.2.0, but after starting the upgrade with
```
# ceph orch upgrade start --ceph-version 17.2.0
```
The orch manager module seems to be gone now and the upgrade doesn't seem to run.
```
# ceph orch upgrade status
Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
# ceph orch set backend cephadm
Error ENOENT: Module not found
```
During the failed upgrade all nodes had the 16.2.11 cephadm installed.
Fortunately the cluster is still running... somehow. I installed the latest 17.2.x cephadm on all
nodes and rebooted them, but this didn't help.
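A few things I can still check in this state (guesses on my part; I'm not sure any of them will point at the cause):
```
# is the cephadm mgr module still present and enabled?
ceph mgr module ls | grep -i cephadm
# which daemon versions are actually running right now?
ceph versions
# recent cephadm-related cluster log entries, if any were produced
ceph log last cephadm
```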
Does someone have a hint?
Yours,
bbk