Dear Casey
I hope you had a good Easter and that this mail finds you in good health.
I was wondering if you had some time to answer the question below regarding the backward
compatibility of the RGW.
Many thanks!
Sincerely
Francois
________________________________________
From: Scheurer François
Sent: Saturday, April 4, 2020 3:52 PM
To: Casey Bodley; ceph-users(a)ceph.io
Cc: Engelmann Florian; Rafael Weingärtner
Subject: Re: Fw: Incompatibilities (implicit_tenants & barbican) with Openstack after
migrating from Ceph Luminous to Nautilus.
Dear Casey
We cherry-picked your backports of the multi-tenant and barbican patches (and also
one for keystone caching) onto rgw 14.2.8:
Merge pull request #26095 from bbc/s3secretcache
rgw: Added caching for S3 credentials retrieved from keystone
(cherry picked from commit affb7d396f76273e885cfdbcd363c1882496726c)
get barbican secret key request return error code
Signed-off-by: Richard Bai(白学余) <baixueyu(a)inspur.com>
(cherry picked from commit fbe2be57474df43996dd45bf04d1a1137a02c729)
rgw: making implicit_tenants backwards compatible.
Signed-off-by: Marcus Watts <mwatts(a)redhat.com>
(cherry picked from commit 3ba7be8d1ac7ee43e69eebb58263cd080cca1d38)
After building this new rgw 14.2.8, we tested it successfully on two stage ceph clusters:
- one with all ceph daemons 14.2.5
- one with all ceph daemons 12.2.12
We tested barbican and keystone integration, put & get & list, bucket moving
between tenants and flat namespace without any issue.
Again a big thanks for your help and PR!
Our understanding was that rgw is a client of the librados/RADOS layers (managed by the
OSDs and MONs, with a clean separation of layers)
and that a newer rgw daemon will work with older OSDs and MONs, with maybe some
newer rgw features not being available.
But I was told on the mailing list that
"The reason is that many parts of RGW are implemented in the OSD themselves, so you
can't run a new RGW against an old OSD."
cf.
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/45VKDHFLUKG…
Does this mean that doing so could lead to data corruption or loss?
Or is it just about newer features being unavailable?
(again during our tests we encountered no issues)
Doesn't rgw query 'ceph features' to adapt itself to the available feature set?
Is RGW really partly implemented in the OSD code? Or is it just that some RGW features
depend on OSD features?
Thank you for your insights!
Cheers
Francois
________________________________________
From: Casey Bodley <cbodley(a)redhat.com>
Sent: Thursday, March 5, 2020 3:57 PM
To: Scheurer François; ceph-users(a)ceph.io
Cc: Engelmann Florian; Rafael Weingärtner
Subject: Re: Fw: Incompatibilities (implicit_tenants & barbican) with Openstack after
migrating from Ceph Luminous to Nautilus.
On 3/3/20 2:33 PM, Scheurer François wrote:
/(resending to the new mailing list)/
Dear Casey, Dear All,
We tested the migration from Luminous to Nautilus and noticed two
regressions breaking the RGW integration in Openstack:
1) the following config parameter is not working on Nautilus but is
valid on Luminous and on Master:
rgw_keystone_implicit_tenants = swift
In the log:
parse error setting 'rgw_keystone_implicit_tenants' to 'swift' (Expected option value to be integer, got 'swift')
This parameter is needed to make RGW work for both S3 and Swift.
Setting it to false breaks swift/openstack, and setting it to true
makes S3 incompatible with dns-style bucketnames (with shared or
public access).
Please note that path-style bucketnames are deprecated by AWS and
most clients only support dns-style...
Ref.:
https://tracker.ceph.com/issues/24348
https://github.com/ceph/ceph/commit/3ba7be8d1ac7ee43e69eebb58263cd080cca1d38
Ok, wow. It looks like this commit was backported to luminous in
https://github.com/ceph/ceph/pull/22363 over a year before it actually
merged to master as part of
https://github.com/ceph/ceph/pull/28813, so it
missed the mimic and nautilus releases. I prepared those backports in
https://tracker.ceph.com/issues/44445 and
https://tracker.ceph.com/issues/44444.
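To make the expected behaviour of the option in 1) concrete, here is a minimal C++ sketch of a parser that accepts both the boolean values and a per-protocol value. It is not the RGW implementation: the names ImplicitTenantsSketch and parse_implicit_tenants_sketch are hypothetical, and only the values true, false and swift come from this thread. The values s3, both, none, 0 and 1 are assumptions added for symmetry.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type: for which protocols implicit tenants are enabled.
struct ImplicitTenantsSketch {
  bool s3 = false;
  bool swift = false;
};

// Hypothetical parser: returns false for an unknown value instead of
// insisting on an integer, which is what the Nautilus log above complains about.
bool parse_implicit_tenants_sketch(std::string value, ImplicitTenantsSketch& out) {
  // Compare case-insensitively.
  std::transform(value.begin(), value.end(), value.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  if (value == "true" || value == "1" || value == "both") {   // "1", "both": assumed
    out.s3 = true;  out.swift = true;  return true;
  }
  if (value == "false" || value == "0" || value == "none") {  // "0", "none": assumed
    out.s3 = false; out.swift = false; return true;
  }
  if (value == "swift") {  // the setting used in this thread
    out.s3 = false; out.swift = true;  return true;
  }
  if (value == "s3") {     // assumed counterpart of "swift"
    out.s3 = true;  out.swift = false; return true;
  }
  return false;  // unknown value: report a parse error to the caller
}

// Small self-check showing the intended use.
void check_implicit_tenants_sketch() {
  ImplicitTenantsSketch t;
  assert(parse_implicit_tenants_sketch("swift", t));
  assert(t.swift && !t.s3);  // implicit tenants for Swift only, S3 stays flat
  assert(!parse_implicit_tenants_sketch("maybe", t));
}
```

With a parser of this shape, "swift" enables implicit tenants for Swift/OpenStack users only and leaves the S3 namespace flat, which is the combination described in 1) as necessary for dns-style bucketnames.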
2) the server-side encryption (SSE-KMS) is broken on Nautilus:
to reproduce the issue:
s3cmd --access_key $ACCESSKEY --secret_key $SECRETKEY \
  --host-bucket "%(bucket)s.$ENDPOINT" --host "$ENDPOINT" \
  --region="$REGION" --signature-v2 --no-preserve --no-ssl \
  --server-side-encryption --server-side-encryption-kms-id ${SECRET##*/} \
  put helloenc.txt s3://testenc/
output:
upload: 'helloenc.txt' -> 's3://testenc/helloenc.txt' [1 of 1]
9 of 9 100% in 0s 37.50 B/s done
ERROR: S3 error: 403 (AccessDenied): Failed to retrieve the actual key, kms-keyid: cd0903db-c613-49be-96d9-165c02544bc7
rgw log: see below
TL;DR: after investigating, I found that radosgw was actually
retrieving the barbican secret correctly, but the validation of the
HTTP code (200) was failing because of a bug in Nautilus.
My understanding is as follows (please correct me):
The bug is in src/rgw/rgw_http_client.cc.
Since Nautilus, HTTP codes are converted into error codes (200
becomes 0) during request processing.
This happens in RGWHTTPManager::reqs_thread_entry(), which
centralizes the processing of (curl) HTTP requests with multi-threading.
This is fine, but the member variable http_status of the class
RGWHTTPClient is not updated with the resulting HTTP code, so the
variable keeps its initial value of 0.
Then in src/rgw/rgw_crypt.cc the logic still verifies that
http_status is in the range [200,299], and this check fails...
I wrote the following oneliner bugfix for
src/rgw/rgw_http_client.cc:
diff --git a/src/rgw/rgw_http_client.cc b/src/rgw/rgw_http_client.cc
index d0f0baead6..7c115293ad 100644
--- a/src/rgw/rgw_http_client.cc
+++ b/src/rgw/rgw_http_client.cc
@@ -1146,6 +1146,7 @@ void *RGWHTTPManager::reqs_thread_entry()
           status = -EAGAIN;
         }
         int id = req_data->id;
+        req_data->client->http_status = http_status;
         finish_request(req_data, status);
         switch (result) {
           case CURLE_OK:
With this patch, the s3cmd upload works fine with KMS server-side encryption.
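The failure mode described above can be modelled in isolation. The following is a minimal C++ sketch, not the RGW code: SketchClient, sketch_http_error_to_errno, sketch_complete and sketch_status_in_2xx are hypothetical stand-ins for RGWHTTPClient, rgw_http_error_to_errno(), the completion step in RGWHTTPManager::reqs_thread_entry() and the range check in rgw_crypt.cc. The errno mapping for non-2xx codes is an assumption made only to keep the sketch complete.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for RGWHTTPClient: only the status member matters here.
struct SketchClient {
  int http_status = 0;  // initial value, never updated in the unpatched path
  int get_http_status() const { return http_status; }
};

// Hypothetical stand-in for rgw_http_error_to_errno(): any 2xx becomes 0.
// The non-2xx mapping below is an assumption for illustration only.
int sketch_http_error_to_errno(int http_status) {
  if (http_status >= 200 && http_status <= 299) return 0;
  switch (http_status) {
    case 403: return -EACCES;
    case 404: return -ENOENT;
    default:  return -EIO;
  }
}

// Models the completion step in the manager thread. With apply_fix == true it
// also copies the HTTP code into the client, like the one-line patch above.
int sketch_complete(SketchClient& client, int curl_http_status, bool apply_fix) {
  const int status = sketch_http_error_to_errno(curl_http_status);
  if (apply_fix) client.http_status = curl_http_status;
  return status;
}

// Models the caller-side validation that expects a 2xx code on the client.
bool sketch_status_in_2xx(const SketchClient& client) {
  const int s = client.get_http_status();
  return s >= 200 && s <= 299;
}

// Shows both paths for a successful (200) response.
void sketch_demo() {
  SketchClient unpatched;
  assert(sketch_complete(unpatched, 200, false) == 0);  // request succeeded
  assert(!sketch_status_in_2xx(unpatched));             // but the check fails

  SketchClient patched;
  assert(sketch_complete(patched, 200, true) == 0);
  assert(sketch_status_in_2xx(patched));                // check now passes
}
```

In the unpatched path the request itself reports success (status 0), yet the caller still rejects it because the client-side status is 0, which matches the 403 seen by s3cmd.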
Thanks. This one was also fixed on master in
https://github.com/ceph/ceph/pull/29639 but didn't get backports. I
opened
https://tracker.ceph.com/issues/44443 to track those for mimic
and nautilus.
Questions:
* Could someone please write a fix for the regression of 1) and
make a PR ?
* Could somebody also make a PR for 2?
Thank you for your help. :-)
Cheers
Francois Scheurer
rgw log:
export CLUSTER=ceph; /home/local/ceph/build/bin/radosgw -f \
  --cluster ${CLUSTER} --name client.rgw.$(hostname) --setuser ceph \
  --setgroup ceph &
tail -fn0 /var/log/ceph/ceph-client.rgw.ewos1-osd1-stage.log | less -IS
2020-02-26 16:32:59.208 7fc1f1c54700 20 Getting KMS encryption key for key=cd0903db-c613-49be-96d9-165c02544bc7
2020-02-26 16:32:59.208 7fc1f1c54700 20 Requesting secret from barbican url=http://keystone.service.stage.i.ewcs.ch:5000/v3/auth/tokens
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTPClient::process: http_status: 0
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTP::process
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTP::send
2020-02-26 16:32:59.208 7fc1f1c54700 20 sending request to http://keystone.service.stage.i.ewcs.ch:5000/v3/auth/tokens
2020-02-26 16:32:59.208 7fc1f1c54700 20 ssl verification is set to off
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTPManager::add_request: client->init_request(req_data): 0
2020-02-26 16:32:59.208 7fc1f1c54700 20 register_request mgr=0x56374b865540 req_data->id=4, curl_handle=0x56374c77c4a0
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTPManager::signal_thread(): write(thread_pipe[1], (void *)&buf, sizeof(buf)): 4
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTPManager::add_request: signal_thread(): 0
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTP::send: rgw_http_manager->add_request(req): 0
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTP::process: send(req): 0
2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: struct rgw_http_req_data : public RefCountedObject : int wait() : ret: 0
2020-02-26 16:32:59.208 7fc2184a1700 20 link_request req_data=0x56374c96a240 req_data->id=4, curl_handle=0x56374c77c4a0
2020-02-26 16:32:59.608 7fc2184a1700 20 ewdebug: RGWHTTPManager::reqs_thread_entry: http_status: 201
2020-02-26 16:32:59.608 7fc2184a1700 20 ewdebug: RGWHTTPManager::reqs_thread_entry: rgw_http_error_to_errno(http_status): 0
2020-02-26 16:32:59.608 7fc2184a1700 20 ewdebug: RGWHTTPManager::reqs_thread_entry: finish_request(req_data, status): status: 0
2020-02-26 16:32:59.608 7fc2184a1700 20 ewdebug: struct rgw_http_req_data : public RefCountedObject : void finish(int r) : ret: 0
2020-02-26 16:32:59.652 7fc1f1c54700 5 ewdebug: request_key_from_barbican: Accept application/octet-stream X-Auth-Token gAAAAABeVo-xxx
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTPClient::process: http_status: 0
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTP::process
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTP::send
2020-02-26 16:32:59.652 7fc1f1c54700 20 sending request to http://barbican.service.stage.i.ewcs.ch:9311/v1/secrets/cd0903db-c613-49be-…
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTPManager::add_request: client->init_request(req_data): 0
2020-02-26 16:32:59.652 7fc1f1c54700 20 register_request mgr=0x56374b865540 req_data->id=5, curl_handle=0x56374c77c4a0
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTPManager::signal_thread(): write(thread_pipe[1], (void *)&buf, sizeof(buf)): 4
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTPManager::add_request: signal_thread(): 0
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTP::send: rgw_http_manager->add_request(req): 0
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: RGWHTTP::process: send(req): 0
2020-02-26 16:32:59.652 7fc1f1c54700 20 ewdebug: struct rgw_http_req_data : public RefCountedObject : int wait() : ret: 0
2020-02-26 16:32:59.652 7fc2184a1700 20 link_request req_data=0x56374c96a240 req_data->id=5, curl_handle=0x56374c77c4a0
=> 2020-02-26 16:32:59.752 7fc2184a1700 20 ewdebug: RGWHTTPManager::reqs_thread_entry: http_status: 200
2020-02-26 16:32:59.752 7fc2184a1700 20 ewdebug: RGWHTTPManager::reqs_thread_entry: rgw_http_error_to_errno(http_status): 0
2020-02-26 16:32:59.752 7fc2184a1700 20 ewdebug: RGWHTTPManager::reqs_thread_entry: finish_request(req_data, status): status: 0
2020-02-26 16:32:59.752 7fc2184a1700 20 ewdebug: struct rgw_http_req_data : public RefCountedObject : void finish(int r) : ret: 0
2020-02-26 16:32:59.752 7fc1f1c54700 5 ewdebug: request_key_from_barbican: secret_req.process: 0
=> 2020-02-26 16:32:59.752 7fc1f1c54700 5 ewdebug: request_key_from_barbican: secret_req.get_http_status: 0
2020-02-26 16:32:59.752 7fc1f1c54700 5 ewdebug: request_key_from_barbican: secret_req.get_http_status not in [200,299] range!
2020-02-26 16:32:59.752 7fc1f1c54700 5 Failed to retrieve secret from barbican:cd0903db-c613-49be-96d9-165c02544bc7
2020-02-26 16:32:59.752 7fc1f1c54700 5 ERROR: failed to retrieve actual key from key_id: cd0903db-c613-49be-96d9-165c02544bc7
2020-02-26 16:32:59.752 7fc1f1c54700 2 req 1 1.092s s3:put_obj completing
2020-02-26 16:32:59.752 7fc1f1c54700 2 req 1 1.092s s3:put_obj op status=-13
2020-02-26 16:32:59.752 7fc1f1c54700 2 req 1 1.092s s3:put_obj http status=403
2020-02-26 16:32:59.752 7fc1f1c54700 1 ====== req done req=0x56374c9808d0 op status=-13 http_status=403 latency=1.092s ======
=> We see that http_status in reqs_thread_entry is correct (200), but the value
returned by secret_req.get_http_status (member of class RGWHTTPClient) is
incorrect (0 instead of 200).