Hi Frank
Thanks for the reply.
> I think this happens when a PG has 3 different copies and cannot decide which
> one is correct. You might have hit a very rare case. You should start with the
> scrub errors, check which PGs and which copies (OSDs) are affected. It sounds
> almost like all 3 scrub errors are on the same PG.
Yes, all 3 errors are for the same PG and on the same OSD:
2020-11-01 18:25:09.333339 osd.0 [ERR] 3.b shard 2 soid 3:d577e975:::1000023675e.00000000:head : candidate had a missing snapset key, candidate had a missing info key
2020-11-01 18:25:09.333342 osd.0 [ERR] 3.b soid 3:d577e975:::1000023675e.00000000:head : failed to pick suitable object info
2020-11-01 18:26:33.496255 osd.0 [ERR] 3.b repair 3 errors, 0 fixed
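In case it helps with checking which PGs and copies are affected, here is a minimal sketch that groups the [ERR] lines above by PG. It assumes the line layout is "<date> <time> osd.<id> [ERR] <pgid> <message>", which is only inferred from these three lines. The names summarize, LINE_RE and SHARD_RE are made up for this example and are not part of any Ceph tool.

```python
import re
from collections import defaultdict

# The three scrub error lines quoted in this mail, one entry per line.
LOG = """\
2020-11-01 18:25:09.333339 osd.0 [ERR] 3.b shard 2 soid 3:d577e975:::1000023675e.00000000:head : candidate had a missing snapset key, candidate had a missing info key
2020-11-01 18:25:09.333342 osd.0 [ERR] 3.b soid 3:d577e975:::1000023675e.00000000:head : failed to pick suitable object info
2020-11-01 18:26:33.496255 osd.0 [ERR] 3.b repair 3 errors, 0 fixed
"""

# Assumed layout (inferred from the lines above, not from Ceph documentation):
#   <date> <time> osd.<id> [ERR] <pgid> <message>
LINE_RE = re.compile(r"^\S+ \S+ (osd\.\d+) \[ERR\] (\d+\.[0-9a-f]+) (.*)$")
# Some messages start with "shard <n> ", naming the copy that was flagged.
SHARD_RE = re.compile(r"^shard (\d+) ")


def summarize(log_text):
    """Group [ERR] messages by PG id.

    For each PG, record which OSDs logged the errors, which shards the
    messages name, and the raw message texts.
    """
    summary = defaultdict(
        lambda: {"reporters": set(), "shards": set(), "messages": []}
    )
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like an [ERR] line
        reporter, pgid, message = match.groups()
        entry = summary[pgid]
        entry["reporters"].add(reporter)
        entry["messages"].append(message)
        shard = SHARD_RE.match(message)
        if shard is not None:
            entry["shards"].add(shard.group(1))
    return dict(summary)


if __name__ == "__main__":
    for pgid, entry in summarize(LOG).items():
        print(pgid, sorted(entry["reporters"]), sorted(entry["shards"]))
        for message in entry["messages"]:
            print("   ", message)
```

The sketch keeps the OSD that logged each line (osd.0 here) separate from the shard number named inside the message (shard 2 here), since those may not refer to the same copy.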
> You might have had a combination of a crash and an OSD failure; your situation
> is probably not covered by "single point of failure".
Yes, it was a complex crash; everything went down.
> In case you have a PG with scrub errors on 2 copies, you should be able to
> reconstruct the PG from the third with PG export/PG import commands.
I have not done a PG export/import before. Could you send the instructions
or a link to them?
Thanks
Sagara