On 03/02/2021 19:48, Mario Giammarco wrote:
> It is obvious and a bit paranoid, because many servers at many customer
> sites run on RAID1, and so you are saying: yes, you have two copies of
> the data, but you can break both. Consider that in Ceph recovery is
> automatic, while with RAID1 someone must physically go to the customer
> and change disks. So Ceph is already an improvement in this case, even
> with size=2. With size=3 and min_size=2 it is a bigger improvement,
> I know.
To labour Dan's point a bit further, maybe a RAID5/6 analogy is better
than RAID1. Yes, I know we're not talking about erasure-coded pools here,
but this is similar to the reason people moved from RAID5 (size=2, kind
of) to RAID6 (size=3, kind of): the more disks you have in an array
(cluster, in our case) and the bigger those disks are, the greater the
chance of encountering a second problem during a recovery.
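To put a rough number on that intuition, here is a toy model, not real Ceph failure maths: the per-disk failure probability during a recovery window is an assumed figure, but it shows how quickly the chance of at least one additional failure grows with disk count:

```python
# Toy model: probability of at least one *additional* disk failure
# while a recovery is in flight. p is an assumed per-disk probability
# of failing during the recovery window; n_disks is the number of
# other disks still in service.
def prob_second_failure(n_disks: int, p: float = 0.01) -> float:
    # P(at least one of n disks fails) = 1 - P(none of them fail)
    return 1 - (1 - p) ** n_disks

for n in (3, 12, 48, 200):
    print(f"{n:>3} disks: {prob_second_failure(n):.1%}")
```

With the assumed 1% per-disk figure, three disks give roughly a 3% chance of a second hit, but a 200-disk cluster is more likely than not to see one, which is the whole argument for size=3 / min_size=2.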
> What I ask is this: what happens with min_size=1 and a split brain,
> a network outage, or similar events: does Ceph block writes because it
> has no quorum on the monitors? Are there failure scenarios that I have
> not considered?
It sounds like in your example you would have 3 physical servers in
total. So would you have both a monitor and OSD processes on each server?
If so, it's not really related to min_size=1, but to answer your question:
you could lose one monitor and the cluster would continue. Losing a
second monitor will stop your cluster until this is resolved. In your
example setup (with colocated mons & OSDs) this would presumably also
mean you'd have lost two OSD servers too, so you'd have bigger problems.
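For reference, monitor quorum is a strict majority of the monitors, so the "lose one mon, survive; lose two, stop" behaviour above falls straight out of the arithmetic. A sketch (not Ceph code):

```python
# Sketch of majority quorum: a Paxos-style monitor cluster can only
# make progress while a strict majority of its mons is reachable.
def has_quorum(total_mons: int, alive_mons: int) -> bool:
    return alive_mons > total_mons // 2

print(has_quorum(3, 2))  # True  -- one mon down: still quorate
print(has_quorum(3, 1))  # False -- two mons down: cluster blocks
```

This is also why an even number of mons buys you nothing: 4 mons still only tolerate one failure, same as 3.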
HTH,
Simon