Oh no, it's not that bad. It's

$ ping -s 65000 dest.inati.on

on a VPN connection that has an MTU of 1300 via IPv6. So I suspect that I only get an answer when all 51 fragments are fully returned. Clearly, big packets with lots of fragments are more affected by packet loss than 64-byte pings.
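
A rough sketch of that effect in Python, assuming every fragment is lost independently with the same probability (an assumption, not something measured on this link) and counting only one direction, although a real ping needs both the request and the reply to arrive. The function names are made up for illustration:

```python
def packet_delivery_probability(fragment_loss: float, fragments: int) -> float:
    """Probability that every fragment of one packet arrives (independent loss assumed)."""
    return (1.0 - fragment_loss) ** fragments


def implied_fragment_loss(packet_loss: float, fragments: int) -> float:
    """Per-fragment loss rate that would produce the observed whole-packet loss."""
    return 1.0 - (1.0 - packet_loss) ** (1.0 / fragments)


if __name__ == "__main__":
    fragments = 51  # fragment count quoted above for the 65000-byte ping
    for loss in (0.001, 0.005, 0.01):
        lost = 1.0 - packet_delivery_probability(loss, fragments)
        print(f"fragment loss {loss:.1%} -> large-ping loss {lost:.1%}")
    implied = implied_fragment_loss(0.20, fragments)
    print(f"20% large-ping loss implies roughly {implied:.2%} fragment loss")
```

Under these assumptions, a per-fragment loss well below 1% is already enough to explain a 20% loss on the large pings.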

I just (at 9 o'clock in the morning) repeated this ping test and got hardly any drops (less than 1%), even at a size of 64k. So it really depends on the time of day. It seems like some ISPs are dropping some packets, especially in the evening...

A few minutes ago I restarted all OSDs that were marked down, but they are getting marked down again... It seems like Ceph is tolerant of packet loss (it surely affects performance, but this is irrelevant for me).


Could erasure coded pools pose some problems?


Thank you all for every hint!

Lorenz


On 15.08.19 at 08:51, Janne Johansson wrote:
On Wed, 14 Aug 2019 at 17:46, Lorenz Kiefner <root+cephusers@deinadmin.de> wrote:
Is ceph sensitive to packet loss? On some VPN links I have up to 20%
packet loss on 64k packets but less than 3% on 5k packets in the evenings.

20% seems crazy high, there must be something really wrong there.

At 20%, you would get tons of packet timeouts to wait for on all those lost frames,
then resends of (at least!) those 20% extra, which in turn would lead to 20% of those
resends getting lost, all while the main streams of data try to move forward whenever some
older packets do get through. This is a really bad situation to design for.
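
The resend arithmetic can be sketched in Python as follows, assuming each send is lost independently with the same probability (a simplification; real TCP retransmission also involves timeouts and back-off). The function names are hypothetical:

```python
def expected_transmissions(loss: float) -> float:
    """Mean number of sends until one copy gets through (geometric distribution)."""
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    return 1.0 / (1.0 - loss)


def probability_still_lost(loss: float, attempts: int) -> float:
    """Chance that a packet has not arrived after the given number of sends."""
    return loss ** attempts


if __name__ == "__main__":
    loss = 0.20  # the loss rate discussed above
    print(f"average sends per packet: {expected_transmissions(loss):.2f}")
    for attempts in (1, 2, 3):
        still_lost = probability_still_lost(loss, attempts)
        print(f"still lost after {attempts} send(s): {still_lost:.1%}")
```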

I think you should look for a link solution that doesn't drop that many packets, instead of changing
the software you try to run over that link. All other software will notice this too and act badly in one way or another.

Heck, 20% is like taking a math schoolbook, removing all instances of "3" and "8", and seeing if kids can learn to count from it. 8-/
 
--
May the most significant bit of your life be positive.