Thank you for the information, Christian. When you reshard, the bucket id is updated (with
the most recent versions of Ceph, a generation number is incremented). The first bucket id
matches the bucket marker, but after the first reshard they diverge.
The bucket id appears in the names of the currently used bucket index shards. You’re
grepping for the marker, which means you’re finding the older bucket index shards.
Change your commands to these:
# rados -p raum.rgw.buckets.index ls \
    | grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
    | sort -V
# rados -p raum.rgw.buckets.index ls \
    | grep 3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1 \
    | sort -V \
    | xargs -IOMAP sh -c \
        'rados -p raum.rgw.buckets.index listomapkeys OMAP | wc -l'
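To see why grepping for the bucket id (rather than the marker) finds the live shards: bucket index shard objects are named `.dir.<bucket_id>.<shard_number>`. A minimal illustration, with made-up object names standing in for `rados ls` output:

```shell
# Simulated `rados ls` output: two old shards (named after the marker,
# ...10610190.9) and two current shards (named after the bucket id,
# ...10648356.1). Grepping for the bucket id keeps only the current ones.
printf '%s\n' \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.0 \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9.1 \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1.0 \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1.1 \
    | grep 10648356.1 \
    | sort -V
```

Only the two `.10648356.1.*` objects survive the filter; the marker-named shards are the leftovers from before the reshard.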
When you refer to the “second zone”, what do you mean? Is this cluster using multisite? If
and only if your answer is “no” is it safe to remove the old bucket index shards.
Depending on the version of Ceph that was running when the reshard was performed, they were
either intentionally left behind (earlier behavior) or removed automatically (later behavior).
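If the cluster turns out to be single-site, one cautious cleanup approach is to list the shards named after the old marker, review them, and only then remove them. A sketch, with illustrative object names standing in for real cluster output (on the cluster you would replace the `printf` with `rados -p raum.rgw.buckets.index ls`; the removal step is deliberately left as a comment):

```shell
OLD_MARKER=3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9
# Stand-in for `rados -p raum.rgw.buckets.index ls` on the real cluster:
printf '%s\n' \
    ".dir.${OLD_MARKER}.0" \
    ".dir.${OLD_MARKER}.1" \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1.0 \
    | grep "$OLD_MARKER" \
    | sort -V
# After reviewing the list, each stale shard object could be removed with:
#   rados -p raum.rgw.buckets.index rm <object-name>
```

The filter keeps only the marker-named (stale) shards, so the live `.10648356.1.*` objects never reach the removal step.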
Eric
(he/him)
On Jul 25, 2023, at 6:32 AM, Christian Kugler
<syphdias+ceph(a)gmail.com> wrote:
Hi Eric,
1. I recommend that you *not* issue another
bucket reshard until you figure out what’s going on.
Thanks, noted!
2. Which version of Ceph are you using?
17.2.5
I wanted to get the cluster to HEALTH_OK before upgrading. I didn't
see anything that led me to believe an upgrade could fix the
reshard issue.
3. Can you issue a `radosgw-admin metadata get
bucket:<bucket-name>` so we can verify what the current marker is?
# radosgw-admin metadata get bucket:sql20
{
    "key": "bucket:sql20",
    "ver": {
        "tag": "_hGhtgzjcWY9rO9JP7YlWzt8",
        "ver": 3
    },
    "mtime": "2023-07-12T15:56:55.226784Z",
    "data": {
        "bucket": {
            "name": "sql20",
            "marker": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10610190.9",
            "bucket_id": "3caabb9a-4e3b-4b8a-8222-34c33dd63210.10648356.1",
            "tenant": "",
            "explicit_placement": {
                "data_pool": "",
                "data_extra_pool": "",
                "index_pool": ""
            }
        },
        "owner": "S3user",
        "creation_time": "2023-04-26T09:22:01.681646Z",
        "linked": "true",
        "has_bucket_info": "false"
    }
}
4. After you resharded previously, did you get
command-line output along the lines of:
2023-07-24T13:33:50.867-0400 7f10359f2a80 1 execute INFO: reshard of bucket
"<bucket-name>" completed successfully
I think so, at least for the second reshard, but I wouldn't bet my
life on it. I fear I might have missed an error on the first one, since
I have run radosgw-admin bucket reshard so often and have never seen it
fail.
Christian