Yep, my fault, I meant replication = 3.
But aren't PGs checksummed, so that from the remaining copy of a PG
(assuming its checksum is correct) two new copies could be created?
Assuming again 3R on 5 nodes, failure domain of host, if 2 nodes go down, some PGs will
have only 1 of 3 copies available. Normally a 3R pool has min_size set to 2, so those PGs
go inactive.
You can set min_size to 1 temporarily, then those PGs will become active and copies will
be created to restore redundancy, but if that remaining OSD is damaged, if there’s a DIMM
flake, a cosmic ray, if the wrong OSD crashes or restarts at the wrong time, you can find
yourself without the most recent copy of data and be unable to recover. It’s Russian
Roulette.
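
To put a rough number on how many PGs end up in that single-copy state, here is a minimal sketch. It assumes every 3-host placement is equally likely (equal host weights, a simplification of what CRUSH actually does), and the host names and the helper `surviving_copies` are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def surviving_copies(hosts, size, failed):
    """For every possible placement of `size` replicas on distinct hosts,
    count how many replicas sit on hosts that are still up."""
    counts = Counter()
    for placement in combinations(hosts, size):
        alive = sum(1 for host in placement if host not in failed)
        counts[alive] += 1
    return counts


# Hypothetical 5-node cluster, 3R, failure domain of host, two hosts down.
hosts = ["n1", "n2", "n3", "n4", "n5"]
counts = surviving_copies(hosts, size=3, failed={"n1", "n2"})
total = sum(counts.values())

min_size = 2
below_min_size = sum(c for alive, c in counts.items() if alive < min_size)

print(dict(sorted(counts.items())))  # {1: 3, 2: 6, 3: 1}
print(f"{below_min_size}/{total} placements below min_size={min_size}")
```

Under that assumption about 3 in 10 PGs are left with a single copy and stay inactive at min_size 2, while the rest keep serving I/O in a degraded state.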
I see, but wouldn't Ceph try to recreate redundancy on its own
(unless I explicitly tell it not to do so)?
And if the I/O load on the cluster isn't too high, and disk speed and
network connectivity are good, wouldn't it recover fairly quickly to a
healthy, redundant state?
Anyhow, I'm not planning on crashing two nodes ;-) I just wanted to get
a feeling for how much more secure/robust
a setup with five nodes is compared to four nodes.
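
For a rough feel of the four versus five node comparison, the sketch below uses a simple hypergeometric model. It assumes equal host weights and uniformly spread PGs, which only approximates real CRUSH placement, and the function names are hypothetical. It reports the fraction of PGs down to one copy after two hosts fail, and whether enough hosts remain to rebuild all three copies with a host failure domain.

```python
from math import comb


def fraction_with_lost(n_hosts, size, failed, lost):
    """Probability that a PG has exactly `lost` of its `size` replicas
    on the `failed` hosts, assuming uniform placement on distinct hosts."""
    return (comb(failed, lost) * comb(n_hosts - failed, size - lost)
            / comb(n_hosts, size))


def can_restore_full_redundancy(n_hosts, size, failed):
    """With a host failure domain, each replica needs its own surviving host."""
    return n_hosts - failed >= size


results = {}
for n_hosts in (4, 5):
    single_copy = fraction_with_lost(n_hosts, size=3, failed=2, lost=2)
    restorable = can_restore_full_redundancy(n_hosts, size=3, failed=2)
    results[n_hosts] = (single_copy, restorable)
    print(f"{n_hosts} hosts, 2 down: {single_copy:.0%} of PGs on a single copy, "
          f"can rebuild 3 copies: {restorable}")
```

In this model, four nodes leave half of the PGs on a single copy and cannot get back to three copies until a host returns, while five nodes leave about 30% on a single copy and still have three hosts to rebuild onto, capacity permitting.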