Ditto, I had a bad optic on a 48x10 switch. The only way I detected it was my Prometheus
TCP retransmit count. Looking back over the previous 4 weeks, I could see it increment
in small bursts, but Ceph was able to handle it... and then it went crazy and a bunch of
OSDs just dropped out.
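
For anyone who wants to watch the same signal without a full Prometheus setup, here is a rough sketch of reading the kernel's TCP retransmit counter on a Linux host. It is an assumption on my part that this is the counter behind the metric above; the function names are made up for illustration. It parses the `Tcp:` lines of `/proc/net/snmp` and computes what fraction of sent segments were retransmits between two samples.

```python
def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp-style text as a dict."""
    # The file holds pairs of lines per protocol: a header line of names,
    # then a line of values, both prefixed with the protocol name.
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a Tcp: header line and a Tcp: value line")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_tcp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_tcp_counters(f.read())


def retrans_ratio(before, after):
    """Fraction of segments sent between two samples that were retransmits."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0
```

Call `read_tcp_counters()` twice a minute or so apart and pass both results to `retrans_ratio()`. What counts as "too high" depends on your network, so compare against your own baseline rather than a fixed number; the small bursts are the thing to look for.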