Hi,
I'm trying to evaluate SSE-C (i.e. customer-provided keys) for our object storage.
We do not provide a KMS server.
I've added "Access-Control-Allow-Headers" to the haproxy frontend.
rspadd Access-Control-Allow-Headers...
x-amz-server-side-encryption-customer-algorithm,\
x-amz-server-side-encryption-customer-key,\
x-amz-server-side-encryption-customer-key-MD5
I've also enabled "rgw_trust_forwarded_https = true" in the client section
in the ceph.conf and restarted the RGW daemons.
I'm now trying to get it working, but I am not sure whether I am doing it correctly.
$ encKey=$(openssl rand -base64 32)
$ md5Key=$(echo $encKey | md5sum | awk '{print $1}' | base64)
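For reference, my understanding of the usual SSE-C convention (an assumption on my part, not verified against RGW) is that the MD5 value should be the base64 of the *binary* MD5 digest of the raw key bytes, rather than the md5sum of the base64 string, i.e. roughly:
$ # base64 of the binary MD5 of the decoded (raw) key bytes
$ md5Key=$(echo -n "$encKey" | base64 -d | openssl dgst -md5 -binary | base64)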
$ aws s3api --endpoint=https://radosgw put-object \
--body ~/Downloads/TESTFILE \
--bucket test-bb-encryption \
--key TESTFILE \
--sse-customer-algorithm AES256 \
--sse-customer-key $encKey \
--sse-customer-key-md5 $md5Key
This is what the RGW log gives me:
2023-03-17T10:55:55.465+0000 7f42bbe5f700 1 ====== starting new request
req=0x7f448c185700 =====
2023-03-17T10:55:55.469+0000 7f434df83700 1 ====== req done
req=0x7f448c185700 op status=-2021 http_status=400 latency=3999985ns ======
2023-03-17T10:55:55.469+0000 7f434df83700 1 beast: 0x7f448c185700: IPV6 -
- [2023-03-17T10:55:55.469539+0000] "PUT /test-bb-encryption/TESTFILE
HTTP/1.1" 400 221 - "aws-cli/2.4.18 Python/3.9.10 Darwin/22.3.0
source/x86_64 prompt/off command/s3api.put-object" -
Maybe someone has a working example and is willing to share it with me, or
has also encountered this problem and knows what to do?
It's an Octopus cluster.
Cheers
Boris
--
The "UTF-8 problems" self-help group will, as an exception, meet in the
large hall this time.
Hi,
tracker.ceph.com has seemed quite slow recently. My colleagues have
noticed it as well,
so the problem doesn't appear to be specific to me.
Could you tell me whether there is a plan to fix this in the near future?
Thanks,
Satoru
Hi,
I suspect a bug in how cephadm configures the ingress service for RGW. Our
production cluster has been upgraded continuously from Luminous to
Pacific. When configuring the ingress service for RGW, the generated
haproxy.cfg is incomplete. The same YAML spec applied on our test cluster
works fine.
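For reference, the spec follows the documented ingress format; a minimal sketch (service id, hostnames and the virtual IP below are placeholders, not our production values):
service_type: ingress
service_id: rgw.default
placement:
  hosts:
    - host1
    - host2
spec:
  backend_service: rgw.default
  virtual_ip: 192.0.2.10/24
  frontend_port: 8080
  monitor_port: 1967
Applied with "ceph orch apply -i ingress.yaml".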
Regards,
Patrick
Hello,
We have a 6-node Ceph cluster; all of the nodes run OSDs, and 3 of them (ceph-1 to ceph-3) also run ceph-mgr and ceph-mon. Here is the memory configuration of each node (swap on ceph-1 to ceph-3 was disabled after the alarm):
# ceph-1 free -h
total used free shared buff/cache available
Mem: 187Gi 38Gi 5.4Gi 4.1Gi 143Gi 142Gi
Swap: 0B 0B 0B
# ceph-2 free -h
total used free shared buff/cache available
Mem: 187Gi 49Gi 2.6Gi 4.0Gi 135Gi 132Gi
Swap: 0B 0B 0B
# ceph-3 free -h
total used free shared buff/cache available
Mem: 187Gi 37Gi 4.6Gi 4.0Gi 145Gi 144Gi
Swap: 0B 0B 0B
# ceph-4 free -h
total used free shared buff/cache available
Mem: 251Gi 31Gi 8.3Gi 231Mi 211Gi 217Gi
Swap: 124Gi 3.8Gi 121Gi
# ceph-5 free -h
total used free shared buff/cache available
Mem: 251Gi 32Gi 14Gi 135Mi 204Gi 216Gi
Swap: 124Gi 4.0Gi 121Gi
# ceph-6 free -h
total used free shared buff/cache available
Mem: 251Gi 30Gi 16Gi 145Mi 204Gi 218Gi
Swap: 124Gi 4.0Gi 121Gi
We had configured swap space on all of them: 8G of swap for the ceph-mgr nodes and 128G for the OSD nodes. Our Zabbix monitoring reported swap usage above 50% on ceph-1 to ceph-3, yet the available memory on those nodes is still around 140G out of 187G in total. I'm just wondering whether swap space is necessary at all when we have this much memory available?
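For what it's worth, the two settings that seem most relevant to this question, and how to read their current values (just a sketch, nothing cluster-specific):
# per-OSD memory target the OSDs try to stay under (default 4 GiB)
ceph config get osd osd_memory_target
# how eagerly the kernel swaps even while memory is still free (default 60)
sysctl vm.swappiness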
Thanks very much for your answering.
ceph pacific 16.2.11 (cephadm managed)
I have configured some NFS exports from CephFS via the Ceph GUI. We can mount the filesystems and view file/directory listings, but cannot read any file data.
The permissions on the shares are RW. We mount from the client using "vers=4.1".
Looking at debug logs from the container running nfs-ganesha, I see the following errors when trying to read a file's content:
15/03/2023 15:27:13 : epoch 6411e209 : gw01 : ganesha.nfsd-7[svc_8] complete_op :NFS4 :DEBUG :Status of OP_READ in position 2 = NFS4ERR_PERM, op response size is 7480 total response size is 7568
15/03/2023 15:27:13 : epoch 6411e209 : gw01 : ganesha.nfsd-7[svc_8] complete_nfs4_compound :NFS4 :DEBUG :End status = NFS4ERR_PERM lastindex = 3
Also, watching the TCP traffic, I see errors in the NFS protocol corresponding to these messages:
11:44:43.745570 IP xxx.747 > gw01.nfs: Flags [P.], seq 24184536:24184748, ack 11409577, win 602, options [nop,nop,TS val 342245425 ecr 2683489461], length 212: NFS request xid 156024373 208 getattr fh 0,1/53
11:44:43.745683 IP gw01.nfs > xxx.747: Flags [P.], seq 11409577:11409677, ack 24184748, win 3081, options [nop,nop,TS val 2683489461 ecr 342245425], length 100: NFS reply xid 156024373 reply ok 96 getattr ERROR: Operation not permitted
So there appears to be a permissions problem where nfs-ganesha is not able to "getattr" on cephfs data.
The export looks like this (read from rados):
EXPORT {
    FSAL {
        name = "CEPH";
        user_id = "nfs.cephfs.7";
        filesystem = "cephfs";
        secret_access_key = "xxx";
    }
    export_id = 7;
    path = "/exports/nfs/foobar";
    pseudo = "/foobar";
    access_type = "RW";
    squash = "no_root_squash";
    attr_expiration_time = 0;
    security_label = false;
    protocols = 4;
    transports = "TCP";
}
ceph auth permissions for the nfs.cephfs.7 client:
[client.nfs.cephfs.7]
    key = xxx
    caps mds = "allow rw path=/exports/nfs/foobar"
    caps mon = "allow r"
    caps osd = "allow rw pool=.nfs namespace=cephfs, allow rw tag cephfs data=cephfs"
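One check that might narrow this down is mounting the export path directly with the same credentials, to see whether the restriction is on the CephFS side or in ganesha itself; roughly (monitor address and mount point are placeholders, the secret is the one from the FSAL block above):
mount -t ceph mon1:6789:/exports/nfs/foobar /mnt/test \
    -o name=nfs.cephfs.7,secret=xxx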
Any suggestions?
Hi,
I tried to respond directly in the web ui of the mailing list but my
message is queued for moderation. I just wanted to share a solution
that worked for me when a service spec is stuck in a pending state;
maybe this will help others in the same situation.
While playing around with a test cluster I ended up with a "deleting"
osd service spec. The SUSE team has an article [1] for this case; the
following is what helped me resolve the issue. I had three different osd
specs in place for the same three nodes:
---snip---
osd 3 <deleting> 3w nautilus2;nautilus3
osd.osd-hdd-ssd 3 2m ago 2w
nautilus;nautilus2;nautilus3
osd.osd-hdd-ssd-mix 3 2m ago - <unmanaged>
---snip---
I replaced the "service_name" with the more suitable value
("osd.osd-hdd-ssd") in the unit.meta file of each OSD containing the
invalid spec, then restarted each affected OSD. It probably wouldn't
have been necessary, but I wanted to see the effect immediately, so I
failed over the mgr (ceph mgr fail); now I only have one valid osd spec.
---snip---
# before
nautilus3:~ # grep service_name
/var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/osd.3/unit.meta
"service_name": "osd",
# after
nautilus3:~ # grep service_name
/var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/osd.3/unit.meta
"service_name": "osd.osd-hdd-ssd",
nautilus3:~ # ceph orch ls osd
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
osd.osd-hdd-ssd 9 10m ago 2w nautilus;nautilus2;nautilus3
---snip---
Regards,
Eugen
[1] https://www.suse.com/support/kb/doc/?id=000020667
Hello ceph-users,
Unhappy with the capabilities regarding bucket access policies when
using the Keystone authentication module,
I posted to this ML a while back -
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/S2TV7GVFJT…
In general I'd still like to hear how others are making use of external
authentication and STS, and what your
experiences are in replacing e.g. Keystone authentication.
In the meantime we looked into OIDC authentication (via Keycloak) and
its potential.
While this works in general (AssumeRoleWithWebIdentity comes back with
an STS token that can then be used to access S3 buckets),
I am wondering about a few things:
1) How to enable STS for everyone (without user-individual policy to
AssumeRole)
In the documentation on STS
(https://docs.ceph.com/en/quincy/radosgw/STS/#sts-in-ceph) and also
STS-Lite (https://docs.ceph.com/en/quincy/radosgw/STSLite/#sts-lite)
it's implied that one has to attach a dedicated policy allowing STS
to each user individually. This does not scale well with thousands of
users. Also, when using federated / external authentication, there is no
explicit user creation: "A shadow user is created corresponding to every
federated user. The user id is derived from the ‘sub’ field of the
incoming web token."
Is there a way to automatically have a role corresponding to each user
that can be assumed via an OIDC token?
So an implicit role that would allow an externally authenticated
user to have full access to S3 and all buckets they own?
Looking at STS Lite documentation, it seems all the more natural to be
able to allow keystone users to make use of STS.
Is there any way to apply such an AssumeRole policy "globally" or for a
whole set of users at the same time?
I just found PR https://github.com/ceph/ceph/pull/44434 aiming to add
policy variables such as ${aws:username} to allow for generic policies.
But this is more about restricting bucket names or granting access to
certain pattern of names.
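For context, the per-user flow that does work for us today looks roughly like this (a sketch with the aws CLI; the endpoint, role name and policy files are placeholders, and it assumes the calling user already has roles=* caps, which is exactly the per-user step I'd like to avoid):
aws --endpoint-url https://rgw.example.com iam create-role \
    --role-name S3Access \
    --assume-role-policy-document file://trust-policy.json
aws --endpoint-url https://rgw.example.com iam put-role-policy \
    --role-name S3Access --policy-name S3AllAccess \
    --policy-document file://permission-policy.json
aws --endpoint-url https://rgw.example.com sts assume-role-with-web-identity \
    --role-arn "arn:aws:iam:::role/S3Access" \
    --role-session-name test-session \
    --web-identity-token "$KEYCLOAK_ACCESS_TOKEN"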
2) Isolation in S3 Multi-Tenancy with external IdP
(AssumeRoleWithWebIdentity), how does bucket ownership come into play?
Following the question about generic policies for STS I am wondering
about the role (no pun intended) that the bucket ownership or tenant
play here?
If one creates a role policy of e.g.
{"Version":"2012-10-17","Statement":{"Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::*"}}
Would this allow someone assuming this role access to all, "*", buckets,
or just those owned by the user that created this role policy?
In case of Keystone auth the owner of a bucket is the project, not the
individual (human) user. So this creates somewhat of a tenant which I'd
want to isolate.
3) Allowing users to create their own roles and policies by default
Is there a way to allow users to create their own roles and policies to
use them by default?
All the examples talk about the requirement for admin caps and
individually setting --caps="user-policy=*".
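i.e. something along these lines for every single user, which is the part that doesn't scale:
radosgw-admin caps add --uid="someuser" --caps="user-policy=*;roles=*"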
If there was a default role + policy (question #1) that could be applied
to externally authenticated users, I'd like for them to be able to
create new roles and policies to grant access to their buckets to other
users.
Regards
Christian
hi, everyone,
I have a question about repairing a broken WAL/DB device.
I have a cluster with 8 OSDs and 4 WAL/DB devices (1 OSD per WAL/DB
device). How can I repair the OSDs quickly if
one WAL/DB device breaks down, without rebuilding them? Thanks.
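PS: to see which OSDs are backed by a particular WAL/DB device, something like this should list them (device path is a placeholder):
ceph-volume lvm list /dev/nvme0n1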
Dear Ceph Team,
I hope this email finds you well. I am writing to express my keen interest
in participating in the Google Summer of Code (GSoC) program 2023 with your
team.
I am a 3rd year B.tech student in Computer Science Engineering, with a
strong passion for [specific area of interest related to the team's
project(s)]. I have experience working in C++, and I believe that I can
contribute significantly to your project by bringing my expertise,
enthusiasm, and commitment.
I have been following the GSoC program, and I understand the dedication and
hard work required to complete a project successfully. Therefore, I am
willing to commit my time and effort to meet the expectations and
requirements of the program. I am open to learning new technologies and
programming languages, and I believe that this opportunity will help me
grow both personally and professionally.
I have reviewed the list of your team's project ideas, and I am
particularly interested in Disk Fragmentation Simulator. I would appreciate
it if you could provide me with any additional information or resources
that may be helpful to better understand the project requirements and goals.
Thank you for taking the time to read my email, and I look forward to
hearing back from you soon.
Best regards,
Arush Sharma
Dear List,
Today I successfully upgraded with cephadm from 16.2.8 -> 16.2.9 -> 16.2.10 -> 16.2.11.
Now I wanted to upgrade to 17.2.0, but after starting the upgrade with
```
# ceph orch upgrade start --ceph-version 17.2.0
```
The orch manager module seems to be gone now and the upgrade doesn't seem to run.
```
# ceph orch upgrade status
Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
# ceph orch set backend cephadm
Error ENOENT: Module not found
```
During the failed upgrade all nodes had the 16.2.11 cephadm installed.
Fortunately the cluster is still running... somehow. I installed the latest 17.2.x cephadm on all
nodes and rebooted them, but this didn't help.
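A few things I can still check in this state (guesses on my part; I'm not sure any of them will point at the cause):
```
# is the cephadm mgr module still present and enabled?
ceph mgr module ls | grep -i cephadm
# which daemon versions are actually running right now?
ceph versions
# recent cephadm-related cluster log entries, if any were produced
ceph log last cephadm
```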
Does someone have a hint?
Yours,
bbk