The latest version of Quincy seems to have trouble cleaning up multipart fragments
left behind by canceled uploads.
The bucket is empty:
% s3cmd -c .s3cfg ls s3://warp-benchmark
%
However, bucket stats still report about 11 TiB of data and 736k objects:
# radosgw-admin bucket stats --bucket=warp-benchmark
{
"bucket": "warp-benchmark",
"num_shards": 10,
"tenant": "",
"zonegroup": "6be863e8-a9f2-42c9-b114-c8651b1f1afa",
"placement_rule": "ssd.ec63",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "aa099b5e-01d5-4394-b287-df99a4d63298.18924.1",
"marker": "aa099b5e-01d5-4394-b287-df99a4d63298.37403.1",
"index_type": "Normal",
"owner": "warp_benchmark",
"ver": "0#5580404,1#5593184,2#5586262,3#5591427,4#5591937,5#5588120,6#5589760,7#5582923,8#5579062,9#5578699",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0",
"mtime": "0.000000",
"creation_time": "2023-02-10T21:45:12.721604Z",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#",
"usage": {
"rgw.main": {
"size": 12047620866048,
"size_actual": 12047620866048,
"size_utilized": 12047620866048,
"size_kb": 11765254752,
"size_kb_actual": 11765254752,
"size_kb_utilized": 11765254752,
"num_objects": 736113
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
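As a quick sanity check on those numbers (treating every leftover object as a multipart part is my assumption; the 16 MiB part size is the one visible in the bucket list further down), the reported usage is consistent with the bucket holding nothing but orphaned parts:

```python
# Sanity check: is the reported bucket usage consistent with the bucket
# holding only leftover multipart parts of at most 16 MiB each?
# Numbers are copied from the `radosgw-admin bucket stats` output above.
size_bytes = 12047620866048
num_objects = 736113
part_size = 16 * 1024 * 1024  # 16777216, the part size seen in the bucket list

avg = size_bytes / num_objects
print(f"average object size: {avg / 2**20:.1f} MiB")  # ~15.6 MiB

# A bit under 16 MiB on average, which fits full 16 MiB parts plus
# smaller tail parts -- consistent with orphaned multipart fragments.
assert avg <= part_size
```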
A bucket list shows that they are all multipart fragments:
# radosgw-admin bucket list --bucket=warp-benchmark
[
... (LOTS OF THESE)
{
"name": "_multipart_(2F3(gCS/1.GagoUCrCRqawswb6.rnd.tg1efLm7-es41Xg3i-Nm6bYjS-c-No79.12",
"instance": "",
"ver": {
"pool": 20,
"epoch": 30984
},
"locator": "",
"exists": "true",
"meta": {
"category": 1,
"size": 16777216,
"mtime": "2023-02-16T00:03:01.586472Z",
"etag": "e7475bca6a58de35648ca5f25d6653bf",
"storage_class": "",
"owner": "warp_benchmark",
"owner_display_name": "Warp Benchmark",
"content_type": "",
"accounted_size": 16777216,
"user_data": "",
"appendable": "false"
},
"tag": "_YdopX7yxnVrvg2h35MIQGN3vsPyZx5W",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
]
Note that the timestamp is from two weeks ago, so a lifecycle policy of "abort incomplete
multipart uploads after 1 day" should delete them.
% cat cleanup-multipart.xml
<LifecycleConfiguration>
<Rule>
<ID>abort-multipart-rule</ID>
<Filter>
<Prefix></Prefix>
</Filter>
<Status>Enabled</Status>
<AbortIncompleteMultipartUpload>
<DaysAfterInitiation>1</DaysAfterInitiation>
</AbortIncompleteMultipartUpload>
</Rule>
</LifecycleConfiguration>
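Before applying it, I like to sanity-check locally that the XML parses and says what I intend; a minimal stdlib-only parse (this ignores the S3 XML namespace that the real API request carries, which s3cmd adds for you):

```python
# Quick local check that the lifecycle XML is well-formed and the abort
# rule is enabled with a 1-day threshold. Stdlib only.
import xml.etree.ElementTree as ET

policy = """\
<LifecycleConfiguration>
  <Rule>
    <ID>abort-multipart-rule</ID>
    <Filter><Prefix></Prefix></Filter>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>1</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>"""

root = ET.fromstring(policy)
rule = root.find("Rule")
assert rule.find("Status").text == "Enabled"
days = int(rule.find("AbortIncompleteMultipartUpload/DaysAfterInitiation").text)
print(f"abort incomplete uploads after {days} day(s)")
assert days == 1
```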
% s3cmd dellifecycle s3://warp-benchmark
s3://warp-benchmark/: Lifecycle Policy deleted
% s3cmd setlifecycle cleanup-multipart.xml s3://warp-benchmark
s3://warp-benchmark/: Lifecycle Policy updated
A secondary problem is that the lifecycle policy never runs automatically and is stuck in
the UNINITIAL state. That is a problem for another day of debugging.
# radosgw-admin lc list
[
{
"bucket": ":warp-benchmark:aa099b5e-01d5-4394-b287-df99a4d63298.37403.1",
"started": "Thu, 01 Jan 1970 00:00:00 GMT",
"status": "UNINITIAL"
}
]
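The epoch-zero "started" timestamp is the giveaway that the entry has never been processed. A small check over the lc list JSON (reproduced from above) that flags such stuck entries:

```python
# Flag lifecycle entries that have never run: status UNINITIAL and a
# "started" timestamp at the Unix epoch. JSON copied from the
# `radosgw-admin lc list` output above.
import json
from email.utils import parsedate_to_datetime

lc_list = json.loads("""
[
  {
    "bucket": ":warp-benchmark:aa099b5e-01d5-4394-b287-df99a4d63298.37403.1",
    "started": "Thu, 01 Jan 1970 00:00:00 GMT",
    "status": "UNINITIAL"
  }
]
""")

stuck = [e for e in lc_list
         if e["status"] == "UNINITIAL"
         and parsedate_to_datetime(e["started"]).timestamp() == 0]
for e in stuck:
    print("never run:", e["bucket"])
```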
However, it can be started manually:
# radosgw-admin lc process
# radosgw-admin lc list
[
{
"bucket": ":warp-benchmark:aa099b5e-01d5-4394-b287-df99a4d63298.37403.1",
"started": "Wed, 01 Mar 2023 17:35:27 GMT",
"status": "COMPLETE"
}
]
This has no effect on the bucket: the bucket stats afterwards show exactly the same size
and object count (output omitted for brevity).
Running a GC pass also has no effect:
# radosgw-admin gc list
[]
# radosgw-admin gc process
# radosgw-admin gc list
[]
Any ideas?