On 29-4-2020 09:59, Willem Jan Withagen wrote:

On 29-4-2020 03:46, kefu chai wrote:

On Fri, Apr 24, 2020 at 6:17 PM Willem Jan Withagen <wjw@digiware.nl> wrote:

On 20-4-2020 18:07, kefu chai wrote:

On Mon, Apr 20, 2020 at 19:53, Willem Jan Withagen <wjw@digiware.nl> wrote:

On 20-4-2020 13:26, kefu chai wrote:

On Sun, Apr 19, 2020 at 7:00 PM Willem Jan Withagen <wjw@digiware.nl> wrote:

Hi Kefu,

This looks like it could be a difference caused by something not being initialised correctly.
Am I correct in assuming that?

Or do you have suggestions on how to debug this?

I think you already found the PR addressing this issue and filed
https://tracker.ceph.com/issues/45130?

Anything I am missing?

That PR was about the check-generated script not being able to set its
return status on failure: Bash runs the while-loop in a subshell, so the
counting variables live in a different scope and their updates are lost.
You fixed that in this PR.
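The scoping problem can be reproduced in isolation. This is a generic Bash sketch, not the actual check-generated.sh code: with Bash's default settings, a while-loop on the receiving end of a pipe runs in a subshell, so the counter it updates is discarded when the loop ends. Feeding the loop through a redirection instead keeps it in the current shell.

```shell
#!/usr/bin/env bash
# Pitfall: the while-loop is the last stage of a pipeline, so Bash runs it
# in a subshell and the increments only affect the subshell's copy.
piped_count=0
printf 'a\nb\nc\n' | while read -r item; do
    piped_count=$((piped_count + 1))
done
echo "piped:      $piped_count"        # still 0 in the parent shell

# Fix: feed the loop through a redirection (here a here-document holding
# the command's output), so the loop runs in the current shell.
redirected_count=0
while read -r item; do
    redirected_count=$((redirected_count + 1))
done <<EOF
$(printf 'a\nb\nc\n')
EOF
echo "redirected: $redirected_count"   # 3
```

In Bash-only scripts, `done < <(command)` (process substitution) or `shopt -s lastpipe` in a non-interactive script achieve the same; the here-document form is used here because it also works in POSIX sh.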

With that fixed, the script reports errors when testing
RGWObjManifest
and, later on,
bluestore_bdev_label_t.

So for these two types, `dump_json` and `encode decode dump_json` give
different results.
I strongly suspect there is a difference in how some fields are handled
when an object is initialised versus when it is decoded.

But I haven't found that (yet).
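The comparison being described can be sketched as a small shell function. This assumes the ceph-dencoder invocation pattern `type <T> select_test <n> dump_json`, optionally with `encode decode` before `dump_json`; the function name `roundtrip_check` is hypothetical, and the dencoder command is passed in as an argument so the check can be exercised without a Ceph build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the JSON dump of generated test instance N
# of TYPE with the dump of the same instance after an encode/decode cycle.
# Assumes the ceph-dencoder argument order "type T select_test N ... dump_json".
#
# Usage: roundtrip_check DENCODER TYPE N   (returns 0 if the dumps match)
roundtrip_check() {
    local dencoder=$1 obj_type=$2 n=$3
    local direct roundtrip

    # Dump the freshly generated (initialised) instance.
    direct=$("$dencoder" type "$obj_type" select_test "$n" dump_json) || return 1
    # Dump the same instance after encoding and decoding it again.
    roundtrip=$("$dencoder" type "$obj_type" select_test "$n" encode decode dump_json) || return 1

    if [ "$direct" != "$roundtrip" ]; then
        # A mismatch suggests decode() does not restore some field the
        # same way the test instance initialised it.
        echo "**** $obj_type test $n: dump_json differs after encode decode ****" >&2
        return 1
    fi
    return 0
}
```

Against a real build this would be called as, for example, `roundtrip_check ceph-dencoder RGWObjManifest 1`; saving both dumps to files and running `diff` on them shows which fields differ.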

I see. Willem, can you see the same issue on master or octopus?

Hi Kefu,

Fixing the bluestore_bdev_label_t error only requires backporting #29968.
The RGWObjManifest error is fixed in #29862, but actually getting it fixed requires
quite a few more backports covering all the fields of RGWObjManifest and its children.

So I submitted a tracker issue to backport #29968.
Getting #29862 to apply to Nautilus will need a fair amount of rework, and thus
a Nautilus-specific patch. Even then it will still require several more backports.

So, to "fix" RGWObjManifest, I'm currently running my FreeBSD tests with a patch
that fixes the test loop as in #29862, but excludes this test on Nautilus.

If that is acceptable as a patch for Nautilus, I'll submit it.

Hi Willem, thanks for the investigation. Nautilus is not EOL, so a
patch is always acceptable, I think. But "quite some fixing" and "quite
some backports" worry me a bit. What do you mean by "quite some"? Do
they involve a lot of work just to prepare a fix for the test failure,
or are they real bug fixes that address issues we could be facing in
production?

To start with the last point: no, I do not expect any impact on production.
So we could also try ignoring only these tests.

#29968 is a no-brainer to backport.
#29862 requires extra fixes ("quite some" might be overstated).
Since a patch might be acceptable, I'll put some effort into it.

Can I create ONE PR that holds several cherry-picks plus some custom commits?
Otherwise I'll just use #29862 as the basis for a new PR.

I think I'm (way) in over my head here, since PR #29862 depends on files created during
a large (166-file) rebase of the manifest files.
PR #29118 creates the file src/rgw/rgw_obj_manifest.h, which is then modified a few more
times before #29862 touches it.

And it is very hard to estimate how much impact this will have on other parts of the code.

So I'm calling it quits on this problem and will live with the fact that check-generated.sh
is flawed and does not report any errors.

A quick and dirty fix could be:
- fix the script to report errors
- skip the tests that are currently failing, so we still get a warning if any of the other
object types starts failing.
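The two steps above could look roughly like this. It is a sketch under stated assumptions, not a patch against the real script: `KNOWN_BAD_TYPES`, `is_known_bad` and `run_checks` are hypothetical names, and `check_one` stands for whatever per-type check the script performs (for example comparing `dump_json` with `encode decode dump_json`), which the caller has to define. The type list is passed as arguments rather than piped in, so the failure counter stays in the current shell.

```shell
#!/usr/bin/env bash
# Hypothetical skip list: object types known to fail on this branch.
KNOWN_BAD_TYPES="RGWObjManifest"

# Return 0 if $1 appears in the space-separated KNOWN_BAD_TYPES list.
is_known_bad() {
    case " $KNOWN_BAD_TYPES " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run check_one (hypothetical, defined by the caller) for every type given
# as an argument, skipping known failures. Sets the global "failed" counter
# and returns non-zero if any non-skipped type failed.
run_checks() {
    local obj_type
    failed=0
    for obj_type in "$@"; do
        if is_known_bad "$obj_type"; then
            echo "skipping $obj_type (known failure)" >&2
            continue
        fi
        if ! check_one "$obj_type"; then
            echo "**** $obj_type failed ****" >&2
            failed=$((failed + 1))
        fi
    done
    echo "$failed type(s) failed" >&2
    [ "$failed" -eq 0 ]
}
```

Assuming ceph-dencoder's `list_types` command, a call such as `run_checks $(ceph-dencoder list_types) || exit 1` would make the script exit non-zero on new failures while still skipping RGWObjManifest.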

--WjW