Sorry, I misunderstood the comment. Are there any known workarounds? This
seems like a serious index-corruption bug, and it prevents us from using Ceph
with Singlestore, so any suggestions would be appreciated!
Best,
-Joseph Victor
On Wed, Jun 9, 2021 at 3:38 PM Yehuda Sadeh-Weinraub <yehuda(a)redhat.com>
wrote:
I'm not sure configuring the grace period would help, as there's a bug there.
On Wed, Jun 9, 2021 at 2:57 PM Joseph Victor <joseph(a)singlestore.com>
wrote:
> Thanks for the response! This looks like precisely the issue we
> saw...
>
> Can the grace period be configured? Our logging suggests the PUT and
> list happen within the same millisecond.
>
> Best,
> -Joseph Victor
>
> On Wed, Jun 9, 2021 at 2:50 PM Yehuda Sadeh-Weinraub <yehuda(a)redhat.com>
> wrote:
>
>>
>>
>> On Tue, Jun 8, 2021 at 7:13 PM Joseph Victor <joseph(a)singlestore.com>
>> wrote:
>>
>>> Hey all, we have been testing Ceph against our product and found some
>>> behavior we would like to run by you.
>>>
>>> We are using Ceph's S3 interface.
>>> Attached is a Python script using boto3 that, when run against two
>>> different Ceph deployments (Ceph Nano running Octopus, and our production
>>> Nautilus 14.2.11 deployment), appears to reproduce a strange issue.
>>> After running for a while, a recently uploaded object permanently
>>> disappears from list_objects results. The object is still retrievable
>>> with get_object if you know its exact key, but it never shows up in
>>> list_objects.
>>> More details about the experiment are in the attached Python file.
>>>
>>> We ran this experiment with debug logging enabled and saw the trace
>>> message
>>>
>>> RGWRados::cls_bucket_list_ordered: skipping <filename>
>>>
>>> in the same millisecond that the file was PUT.
>>>
>>> Reading the code, this message is logged when a call to check_disk_state
>>> returns -ENOENT, which happens in this branch:
>>>
>>>     if (!list_state.is_delete_marker() && !astate->exists) {
>>>       /* object doesn't exist right now -- hopefully because it's
>>>        * marked as !exists and got deleted */
>>>       if (list_state.exists) {
>>>         /* FIXME: what should happen now? Work out if there are any
>>>          * non-bad ways this could happen (there probably are, but
>>>          * annoying to handle!) */
>>>       }
>>>       // encode a suggested removal of that key
>>>       list_state.ver.epoch = io_ctx.get_last_version();
>>>       list_state.ver.pool = io_ctx.get_id();
>>>       cls_rgw_encode_suggestion(CEPH_RGW_REMOVE, list_state,
>>>                                 suggested_updates);
>>>       return -ENOENT;
>>>     }
>>>
>>> It looks like this might be a race between PUT and list_objects in which
>>> some object metadata is apparently deleted... the FIXME is at least a
>>> little suspicious :).
>>>
>>> I would love to know what's going on here, and whether there is a fix or
>>> workaround we can apply to prevent this behavior. Let me know if there is
>>> any other information we can provide.
>>>
>>>
>>
>> This FIXME has probably been there since the dawn of time. The code here
>> identifies that a listed object doesn't exist and sends a suggestion to the
>> index objclass to remove it. However, there should be a long grace period
>> so that recently created objects aren't removed from the index (this should
>> be handled in src/cls/rgw/cls_rgw.cc, iirc). It does sound like a bug that
>> we have seen before; see:
>>
>> https://tracker.ceph.com/issues/24744
>> ... which I now see is still open. I'm not sure the fix there doesn't
>> cause other issues.
>>
>> Yehuda
>>
>>
>>> Thank you so much!
>>>
>>> Best,
>>> -Joseph Victor
>>> _______________________________________________
>>> Dev mailing list -- dev(a)ceph.io
>>>
>>