We are running Ceph Luminous 12.2.13-0 in one of our clusters and are observing OSDs flapping.
The stack trace from the OSDs that went down is given below:
/var/log/ceph/ceph-osd.27.log:/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.12/rpm/el7/BUILD/ceph-12.2.12/src/osd/ECTransaction.h: 179: FAILED assert(plan.to_read.count(i.first) == 0 || (!plan.to_read.at(i.first).empty() && !i.second.has_source()))
We have seen a similar bug reported at https://tracker.ceph.com/issues/21756,
but the fix was made in the Nautilus release
(https://github.com/ceph/ceph/pull/18241/commits/fb50f43244f0a9bc59f9aa4e231…).
Please let us know whether the same fix can be backported to Luminous, or whether there is
something else we can do to resolve the issue.
Thanks!