Ah ok, glad to see it’s on the radar! Just to add onto this with our own findings:

1. RadosGW reports all of the leftover multipart pieces, however they are not visible with s3cmd or any other client-side application (a rough way to spot this is sketched below).
2. Somewhere along the line the pieces seem to get lost track of internally and are never marked for automatic removal by Ceph; however, setting the bucket shards to 0 and running a bucket check command fixes the issue.

# radosgw-admin bucket check --check-objects --fix --bucket sgbackup1

But this only works when bucket shards are set to 0. If the bucket check command is run on a bucket with > 0 shards, it fails to remove the data.
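
For anyone else trying to spot affected buckets, a rough check is to put the RGW-side accounting next to what a client sees (the bucket name here is just a placeholder, and this assumes s3cmd is pointed at the same RGW endpoint):

# radosgw-admin bucket stats --bucket sgbackup1
# s3cmd du s3://sgbackup1

If the size and object counts under "usage" in bucket stats come out noticeably larger than what s3cmd du reports, that difference is roughly the orphaned multipart data.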


The ticket points out that the remaining orphans have a different ID than the pieces that were recombined. The fact that these pieces aren't visible to clients accessing the bucket over S3 suggests it's an issue with the way Ceph tracks the pieces internally. The method for cataloging multipart objects seems either not to take shards into account, or to have forgotten the system "shadow" tags added onto the upload ID to differentiate them.
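
To actually see the leftover pieces at the RADOS level, this is roughly what we've been doing (the pool name is our default data pool and the marker placeholder comes out of bucket stats, so adjust for your setup; the __multipart_/__shadow_ prefixes are simply what the leftovers look like on our clusters):

# radosgw-admin bucket stats --bucket sgbackup1 | grep marker
# rados -p default.rgw.buckets.data ls | grep '<bucket marker>' | grep -E '__(multipart|shadow)_'

The upload IDs embedded in those names should also make the differing IDs mentioned in the ticket visible.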

To recap, we are able to successfully remove the orphaned objects with a bucket check command only on buckets with 0 shards: 

# radosgw-admin bucket check --check-objects --fix --bucket sgbackup1

Setting bucket shards to a lower (non-zero) count doesn't change anything. After resharding and then running a bucket check, the orphaned data remains.

[root@os1-sin1 ~]# radosgw-admin reshard add --bucket vgood-test --num-shards 7 --yes-i-really-mean-it
[root@os1-sin1 ~]# radosgw-admin reshard list
[
    {
        "time": "2020-09-24T17:14:42.189517Z",
        "tenant": "",
        "bucket_name": "vgood-test",
        "bucket_id": "d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.12045805.1",
        "new_instance_id": "",
        "old_num_shards": 11,
        "new_num_shards": 7
    }
]
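
(For reference, the shard count a bucket ends up with can be confirmed afterwards; on our Octopus build bucket stats prints a num_shards field, and radosgw-admin bucket limit check shows the per-bucket shard counts as well:)

# radosgw-admin bucket stats --bucket vgood-test | grep num_shards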

However, setting bucket shards to 0 and then running the bucket check command removed the orphaned data.

[root@os1-sin1 ~]# radosgw-admin reshard add --bucket vgood-test --num-shards 0 --yes-i-really-mean-it
[root@os1-sin1 ~]# radosgw-admin reshard list
[
    {
        "time": "2020-09-24T17:23:34.843021Z",
        "tenant": "",
        "bucket_name": "vgood-test",
        "bucket_id": "d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.14335315.1",
        "new_instance_id": "",
        "old_num_shards": 7,
        "new_num_shards": 0
    }
]
[root@os1-sin1 ~]# radosgw-admin reshard process
2020-09-24T13:23:50.895-0400 7f24a0e47200  1 execute INFO: reshard of bucket "vgood-test" from "vgood-test:d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.14335315.1" to "vgood-test:d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.14335720.1" completed successfully
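
So, pulling the workaround together, the only sequence that reliably cleans things up for us is (the same commands as above, just in order):

# radosgw-admin reshard add --bucket vgood-test --num-shards 0 --yes-i-really-mean-it
# radosgw-admin reshard process
# radosgw-admin bucket check --check-objects --fix --bucket vgood-test

followed by re-running the bucket stats / rados ls comparison from earlier to confirm the leftover pieces are actually gone. (Presumably the bucket then has to be resharded back up afterwards, which makes this even less practical at scale.)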

Is there any word on where this behavior might be originating from? 

I updated the ticket with this additional info, and we'd be glad to contribute any resources we can to help get a patch in. We're facing the same issues described in the ticket: running these cleanup commands might be feasible for smaller buckets, but it's very unwieldy at the cluster sizes we're running, and we're losing a few terabytes of capacity in the meantime.


- Gavin 


On Mar 26, 2021, at 10:26 AM, Casey Bodley <cbodley@redhat.com> wrote:

On Thu, Mar 25, 2021 at 10:21 AM Gavin Chen <gchen@linode.com> wrote:

Hi all,

We’re running into what seems to be a recurring bug in RGW when handling multipart uploads. The RGWs seem to be orphaning upload parts, which then take up space and can be difficult to find since they do not show up in client-side S3 tools. RGW shows that the pieces from the multipart upload are essentially orphaned and still stored in the cluster even after the upload has finished and the piecemeal object has been recombined. We’re currently running Octopus (15.2.8) and are able to reliably reproduce the bug.

Interestingly, when looking at the bucket through s3cmd or boto, it shows the correct bucket usage with just the completed multipart object present, and the smaller pieces from the upload don't appear. The bug seems related to bucket index sharding, as it can be fixed by setting the shards to 0 and running a bucket check command; running the same command on buckets with sharding enabled doesn't do anything and the orphans remain in the cluster.

Looking at the issues backlog it seems like this was a problem even in much earlier releases dating all the way back to Hammer. https://tracker.ceph.com/issues/16767

We can confirm that this bug still persists in Octopus and Nautilus. A current manual workaround is to reset bucket sharding to 0 and run a bucket check command. However, this is impractical, since one would need to know which bucket is affected (which can only be determined through RGW, as the S3 tools don't show the orphaned pieces), and bucket sharding would need to be set to 0 for the fix to work.

Has anyone else come across this bug? The comments on the issue ticket show it's been a consistent problem over the years, but unfortunately with no movement. The bug was assigned 3 years ago, but it looks like a fix was never implemented.


- Gavin
_______________________________________________
Dev mailing list -- dev@ceph.io
To unsubscribe send an email to dev-leave@ceph.io

Thanks for the link. We've been tracking this one in
https://tracker.ceph.com/issues/44660 and are still working on it.