On Fri, Jun 19, 2020 at 12:19 AM Yuri Weinstein <yweinste(a)redhat.com> wrote:
> Details of this release summarized here:
> https://tracker.ceph.com/issues/46039#note-2
>
> rados - FAILED approved Neha?
> rgw - FAILED approved Casey?
> rbd - FAILED approved Jason?
> krbd - FAILED approved Jason, Ilya?
xfstests with msgr-failures/many.yaml timed out because some OSDs
crashed on out of order ops:
src/osd/PrimaryLogPG.cc: 4050: ceph_abort_msg("out of order op")
I looked at one of the OSDs and this appears to be a failure
injection corner case. The kernel attempted to resend the op in
question over a thousand times, with either the request message or
the reply message not making it due to session resets. Eventually
the PG log got trimmed in PGLog::IndexedLog::trim(), after which
the next retry was apparently no longer recognized as a dup:
osd.1 ... do_op osd_op(client.4551.0:679461 ... RETRY=1217
osd.1 ... do_op dup client.4551.0:679461
trim ... modify ... by client.4551.0:679461
osd.1 ... do_op osd_op(client.4551.0:679461 ... RETRY=1218
osd.1 ... bad op order, already applied 680251 > this 679461
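
To spell out the sequence I think the log shows, here is a toy Python
model. It is not Ceph code: ToyPG, max_log_entries and last_applied are
names made up for illustration, and the real dup detection and trimming
logic in the OSD is considerably more involved. The sketch only assumes
that a retry is answered as a dup while its entry is still in a bounded
log, and that an op older than the newest applied op from the same
client is treated as out of order once that entry is gone.

```python
class ToyPG:
    """Toy model of dup detection backed by a bounded log (hypothetical)."""

    def __init__(self, max_log_entries):
        self.max_log_entries = max_log_entries
        self.log = []           # (client, tid) request ids, oldest first
        self.last_applied = {}  # client -> highest tid applied so far

    def trim(self):
        # Drop the oldest entries once the log exceeds its bound.
        excess = len(self.log) - self.max_log_entries
        if excess > 0:
            del self.log[:excess]

    def do_op(self, client, tid):
        # A retry whose entry is still in the log is answered as a dup.
        if (client, tid) in self.log:
            return "dup"
        # Otherwise the op looks new; an old tid is then out of order.
        if tid <= self.last_applied.get(client, -1):
            raise RuntimeError("out of order op")
        self.log.append((client, tid))
        self.last_applied[client] = tid
        self.trim()
        return "applied"


if __name__ == "__main__":
    pg = ToyPG(max_log_entries=3)
    print(pg.do_op("client.4551", 679461))  # first attempt is applied
    print(pg.do_op("client.4551", 679461))  # retry is seen as a dup
    for tid in (679462, 679463, 679464):    # newer ops push the entry out
        pg.do_op("client.4551", tid)
    try:
        pg.do_op("client.4551", 679461)     # retry after the trim
    except RuntimeError as err:
        print(err)
```

In this model the last retry aborts only because the entry that would
have identified it as a dup has already been trimmed, which is how I
read the RETRY=1217 and RETRY=1218 lines above.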
Neha, Josh, let me know if I'm off the rails here ;)
Approved.
Thanks,
Ilya