Hey all, we were testing Ceph against our product and found some behavior we want to run by you.

We are using Ceph's S3 interface.
Attached is a Python file using boto3 that, when run against two different Ceph deployments (Octopus Ceph Nano and our production Nautilus 14.2.11 deployment), appears to reproduce a strange issue.
After the script runs for a while, a recently uploaded file permanently disappears from list_objects responses.  The file still appears to be retrievable with get_object if you know its exact name, but it no longer shows up in list_objects.
There are more details about the experiment in the attached python file.
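
For readers without the attachment, here is a minimal sketch of the kind of check involved. It is not the attached script. It assumes a boto3-style S3 client and a list of keys the caller knows it has uploaded; the helper names listed_keys and missing_from_listing are made up for illustration.

```python
def listed_keys(client, bucket):
    """Collect every key that list_objects reports for the bucket.

    `client` is assumed to be a boto3 S3 client (or anything exposing the
    same get_paginator("list_objects") interface).
    """
    keys = set()
    paginator = client.get_paginator("list_objects")
    for page in paginator.paginate(Bucket=bucket):
        # A page for an empty bucket has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


def missing_from_listing(client, bucket, expected_keys):
    """Return the uploaded keys that the listing does not show, sorted."""
    return sorted(set(expected_keys) - listed_keys(client, bucket))
```

For each key this reports, calling client.get_object(Bucket=bucket, Key=key) then shows whether the object is still readable by name, which is the mismatch described above.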

We ran this experiment with debug logging enabled and saw the trace message

RGWRados::cls_bucket_list_ordered: skipping <filename>

in the same millisecond that the file was PUT.

Reading the code, this message is logged when a call to check_disk_state returns -ENOENT, and in check_disk_state we see

  if (!list_state.is_delete_marker() && !astate->exists) {
      /* object doesn't exist right now -- hopefully because it's
       * marked as !exists and got deleted */
    if (list_state.exists) {
      /* FIXME: what should happen now? Work out if there are any
       * non-bad ways this could happen (there probably are, but annoying
       * to handle!) */
    }
    // encode a suggested removal of that key
    list_state.ver.epoch = io_ctx.get_last_version();
    list_state.ver.pool = io_ctx.get_id();
    cls_rgw_encode_suggestion(CEPH_RGW_REMOVE, list_state, suggested_updates);
    return -ENOENT;
  }

It seems like this might be a race between PUT and list_objects in which some piece of object metadata is apparently treated as deleted... the FIXME is at least a little suspicious :).

I would love to know what's going on here, and whether there is a fix or workaround that would prevent this behavior.  Let me know if there is any other information we can provide.

Thank you so much!

Best,
-Joseph Victor