A bug was reported recently where, if an object PUT occurs while bucket resharding is
finishing up, the object is written to the old bucket shard rather than the new one.
Your logs show evidence that resharding was underway alongside the PUT.
A fix for that bug has been merged to main and pacific; the quincy backport is not yet merged. See:
Octopus was EOLed back in August, so it won't receive the fix. But it seems the next
point releases of pacific and quincy will have the fix, as will reef.
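For what it's worth, the kind of log evidence I mean can be picked out mechanically. A minimal sketch (the marker string is taken from the log in this thread; the helper name is my own, not part of any Ceph tooling):

```python
# Flag RGW log lines suggesting bucket resharding was in progress while
# a request was being handled. ret=-16 is EBUSY: another worker holds
# the reshard lock, i.e. a reshard of that bucket is underway.
RESHARD_MARKER = "RGWReshardLock::lock failed to acquire lock"

def resharding_evidence(log_lines):
    """Return the lines that indicate reshard-lock contention."""
    return [line for line in log_lines if RESHARD_MARKER in line]

lines = [
    "0 RGWReshardLock::lock failed to acquire lock on test7:... ret=-16",
    "2 req 44161 0s s3:put_obj executing",
]
print(resharding_evidence(lines))  # only the reshard-lock line
```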
Eric
(he/him)
On Feb 13, 2023, at 11:41 AM, mahnoosh shahidi
<mahnooosh.shd(a)gmail.com> wrote:
Hi all,
We have a cluster running 15.2.12 and are experiencing an unusual scenario in
S3. A user sends a PUT request to upload an object and RGW returns a 200
response status code. The object has been uploaded and can be downloaded,
but it does not appear in the bucket listing. We also tried to fetch the
bucket index entry for that object, but it does not exist. Below is the RGW
log for the request.
1 ====== starting new request req=0x7f246c4426b0 =====
2 req 44161 0s initializing for trans_id =
tx00000000000000000ac81-0063e36653-17e18f0-default
10 rgw api priority: s3=3 s3website=2
10 host=192.168.0.201
10 meta>> HTTP_X_AMZ_CONTENT_SHA256
10 meta>> HTTP_X_AMZ_DATE
10 x>>
x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
10 x>> x-amz-date:20230208T090731Z
10 handler=22RGWHandler_REST_Obj_S3
2 req 44161 0s getting op 1
10 req 44161 0s s3:put_obj scheduling with dmclock client=2 cost=1
10 op=21RGWPutObj_ObjStore_S3
2 req 44161 0s s3:put_obj verifying requester
10 v4 signature format =
7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
10 v4 credential format =
85ZYESW8HS34DC95MZBT/20230208/us-east-1/s3/aws4_request
10 access key id = 85ZYESW8HS34DC95MZBT
10 credential scope = 20230208/us-east-1/s3/aws4_request
10 req 44161 0s canonical headers format =
content-md5:ttgbNgpWctgMJ0MPORU+LA==
host:192.168.0.201
x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
x-amz-date:20230208T090731Z
10 payload request hash =
30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
10 canonical request = PUT
/test7/file508294
content-md5:ttgbNgpWctgMJ0MPORU+LA==
host:192.168.0.201
x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
x-amz-date:20230208T090731Z
content-md5;host;x-amz-content-sha256;x-amz-date
30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
10 canonical request hash =
2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
10 string to sign = AWS4-HMAC-SHA256
20230208T090731Z
20230208/us-east-1/s3/aws4_request
2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
10 req 44161 0s delaying v4 auth
10 date_k =
a9dc6afa32600995d313f1b6a4fa40be3a3cd574d25db8789ac966a8e7f43356
10 region_k =
b9193e8e261f702b88549da7e81e6a4a7672725996ea8a86269fed665b39670d
10 service_k =
34214c91aec1192bcc413e02044e346b31ed4f13df8c15830bdb1d7bd3565126
10 signing_k =
7656d62334d92c982f8c21e0200e760054b214eebab6dbeab577fb655c00a6f4
10 generated signature =
7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
2 req 44161 0s s3:put_obj normalizing buckets and tenants
10 s->object=file508294 s->bucket=test7
2 req 44161 0s s3:put_obj init permissions
10 cache get: name=default.rgw.meta+root+test7 : expiry miss
10 cache put: name=default.rgw.meta+root+test7 info.flags=0x16
10 adding default.rgw.meta+root+test7 to cache LRU end
10 updating xattr: name=ceph.objclass.version bl.length()=42
10 cache get: name=default.rgw.meta+root+test7 : type miss
(requested=0x11, cached=0x16)
10 cache put: name=default.rgw.meta+root+test7 info.flags=0x11
10 moving default.rgw.meta+root+test7 to cache LRU end
10 cache get: name=default.rgw.meta+users.uid+storage : hit
(requested=0x6, cached=0x17)
10 cache get: name=default.rgw.meta+users.uid+storage : hit
(requested=0x3, cached=0x17)
2 req 44161 0.003999945s s3:put_obj recalculating target
2 req 44161 0.003999945s s3:put_obj reading permissions
2 req 44161 0.003999945s s3:put_obj init op
2 req 44161 0.003999945s s3:put_obj verifying op mask
2 req 44161 0.003999945s s3:put_obj verifying op permissions
5 req 44161 0.003999945s s3:put_obj Searching permissions for
identity=rgw::auth::SysReqApplier ->
rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
perm_mask=15, is_admin=0) mask=50
5 Searching permissions for uid=storage
5 Found permission: 15
5 Searching permissions for group=1 mask=50
5 Permissions for group not found
5 Searching permissions for group=2 mask=50
5 Permissions for group not found
5 req 44161 0.003999945s s3:put_obj -- Getting permissions done for
identity=rgw::auth::SysReqApplier ->
rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
perm_mask=15, is_admin=0), owner=storage, perm=2
37:31.066+0330 7f2479c5e700 10 req 44161 0.003999945s s3:put_obj
identity=rgw::auth::SysReqApplier ->
rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2,
user_perm_mask=2, acl perm=2
2 req 44161 0.003999945s s3:put_obj verifying op params
2 req 44161 0.003999945s s3:put_obj pre-executing
2 req 44161 0.003999945s s3:put_obj executing
5 req 44161 0.023999668s s3:put_obj NOTICE: call to
do_aws4_auth_completion
10 req 44161 0.023999668s s3:put_obj v4 auth ok -- do_aws4_auth_completion
5 req 44161 0.023999668s s3:put_obj NOTICE: call to
do_aws4_auth_completion
--
0 RGWReshardLock::lock failed to acquire lock on
test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
try again
0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
try again
0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
try again
0 RGWReshardLock::lock failed to acquire lock on
test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x6,
cached=0x17)
10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x1,
cached=0x17)
-1 WARNING: The bucket info cache is inconsistent. This is a failure that
should be debugged. I am a nice machine, so I will try to recover.
10 cache get:
name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
: hit (requested=0x16, cached=0x17)
10 cache get:
name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
: hit (requested=0x13, cached=0x17)
10 cache put:
name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
info.flags=0x13
10 moving
default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
to cache LRU end
10 updating xattr: name=ceph.objclass.version bl.length()=42
10 updating xattr: name=user.rgw.acl bl.length()=147
10 chain_cache_entry:
cache_locator=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
-1 WARNING: The OSD has the same version I have. Something may have gone
squirrelly. An administrator may have forced a change; otherwise there is a
problem somewhere.
0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
try again
0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
try again
10 manifest: total_size = 1048576
10 setting object
write_tag=c3b354de-c79f-444b-a647-5b272f8148d7.25041136.44161
10 cache get: name=default.rgw.log++bucket.sync-source-hints.test7 : hit
(negative entry)
10 cache get: name=default.rgw.log++bucket.sync-target-hints.test7 : hit
(negative entry)
10 chain_cache_entry: cache_locator=
10 cache get:
name=default.rgw.log++pubsub.user.storage.bucket.test7/c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
: hit (negative entry)
2 req 44161 363.150981592s s3:put_obj completing
4 write_data failed: Connection reset by peer
0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Connection reset
by peer
2 req 44161 363.150981592s s3:put_obj op status=0
2 req 44161 363.150981592s s3:put_obj http status=200
1 ====== req done req=0x7f246c4426b0 op status=0 http_status=200
latency=363.150981592s ======
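To make the symptom concrete, here is a minimal sketch of the check we are doing. The boto3 calls are shown as comments since they need a live endpoint; the helper name, endpoint, and credentials are placeholders of ours:

```python
def missing_from_listing(acked_keys, listed_keys):
    """Keys that were acknowledged with HTTP 200 on PUT but are absent
    from the bucket listing -- the inconsistency described above."""
    return sorted(set(acked_keys) - set(listed_keys))

# Against a live cluster (placeholders for endpoint/credentials):
#   s3 = boto3.client("s3", endpoint_url="http://192.168.0.201",
#                     aws_access_key_id=KEY, aws_secret_access_key=SECRET)
#   s3.put_object(Bucket="test7", Key="file508294", Body=b"...")  # 200 OK
#   s3.get_object(Bucket="test7", Key="file508294")               # succeeds
#   listed = [o["Key"] for o in
#             s3.list_objects_v2(Bucket="test7").get("Contents", [])]
#   missing_from_listing(["file508294"], listed)

print(missing_from_listing(["file508294"], []))  # -> ['file508294']
```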
Does anybody have any idea about the reason for this behaviour?
Best Regards,
Mahnoosh
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io