On Thu, 4 Feb 2021 at 00:33, Simon Ironside <sironside(a)caffetine.org> wrote:
On 03/02/2021 19:48, Mario Giammarco wrote:
To labour Dan's point a bit further, maybe a RAID5/6 analogy is better
than RAID1. Yes, I know we're not talking erasure coding pools here but
this is similar to the reasons why people moved from RAID5 (size=2, kind
of) to RAID6 (size=3, kind of). I.e. the more disks you have in an array
(cluster, in our case) and the bigger those disks are, the greater the
chance you have of encountering a second problem during a recovery.
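To put rough numbers on that argument, here is a minimal sketch under a deliberately naive model: every remaining disk is assumed to fail independently with the same probability during the recovery window. The function name and the 1% figure are hypothetical, chosen only for illustration; they are not measurements from any real cluster. Bigger disks would show up as a larger per-disk probability, since recovery takes longer.

```python
def p_second_failure(n_disks: int, p_fail_during_recovery: float) -> float:
    """Chance that at least one of the remaining disks fails while
    the first failed disk is still being recovered.

    Assumes independent, identical per-disk failure probabilities,
    which is a simplification.
    """
    remaining = n_disks - 1
    # Probability that none of the remaining disks fails, inverted.
    return 1.0 - (1.0 - p_fail_during_recovery) ** remaining


if __name__ == "__main__":
    # Hypothetical 1% per-disk failure chance during one recovery window.
    for n in (2, 6, 24):
        print(n, "disks:", round(p_second_failure(n, 0.01), 4))
```

The only point of the model is the trend: the probability grows with the number of disks and with the per-disk failure chance, which is the same reasoning that pushed people from RAID5 to RAID6.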
Yes, I know the motivations for RAID6, but to simplify the use case I am
comparing Ceph size=2 to RAID1.
What I am asking is this: what happens with min_size=1 in case of split
brain, network down or similar things? Does Ceph block writes because it
has no quorum on the monitors? Are there some failure scenarios that I
have not considered?
It sounds like in your example you would have 3 physical servers in
total. So would you have both a monitor and OSD processes on each server?
Yes, sorry if it was not clear:
- three servers
- three monitors
- three managers
- 6 OSDs (two disks per server)
If so, it's not really related to min_size=1, but to answer your question:
you could lose one monitor and the cluster would continue. Losing a
second monitor will stop your cluster until this is resolved. In your
example setup (with colocated mons & OSDs) this would presumably also
mean you'd have lost two OSD servers too, so you'd have bigger problems.
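The monitor arithmetic above can be sketched as a strict-majority check. This is only an illustration of the rule, not Ceph code: the function name has_quorum and its parameters are hypothetical, and it assumes the monitors need more than half of the configured monitors to be able to reach each other.

```python
def has_quorum(reachable_mons: int, total_mons: int = 3) -> bool:
    """Return True if the reachable monitors form a strict majority
    of all configured monitors (hypothetical helper, not a Ceph API)."""
    return reachable_mons > total_mons // 2


if __name__ == "__main__":
    # Three monitors, as in the setup described in this thread.
    for reachable in (3, 2, 1):
        state = "quorum" if has_quorum(reachable) else "no quorum"
        print(reachable, "of 3 monitors reachable:", state)
```

With three monitors, two reachable monitors still form a majority, while a single monitor on its own does not.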
Losing the switch means the monitors are up but cannot communicate with
each other, so they should stop?