On Thu, 4 Feb 2021 at 12:19, Eneko Lacunza <elacunza(a)binovo.es> wrote:
> Hi all,
>
> On 4/2/21 at 11:56, Frank Schilder wrote:
>>> - three servers
>>> - three monitors
>>> - 6 osd (two per server)
>>> - size=3 and min_size=2
>> This is a set-up that I would not run at all. The first reason is
>> that ceph lives on the law of large numbers and 6 is a small number.
>> Hence, your OSDs fill up unevenly.
>> What comes to my mind is a hyper-converged server with 6+ disks in a
>> RAID10 array, possibly with a good controller with battery-backed or
>> other non-volatile cache. Ceph will never beat that performance. Put
>> in some extra disks as hot-spares and you have close to self-healing
>> storage.
>> Such a small ceph cluster will inherit all the baddies of ceph
>> (performance, maintenance) without giving any of the goodies
>> (scale-out, self-healing, proper distributed raid protection). Ceph
>> needs size to perform well and to pay off the maintenance and
>> architectural effort.
> It's funny that we have multiple clusters similar to this, and we and
> our customers couldn't be happier. Just use an HCI solution (for
> example Proxmox VE, but there are others) to manage everything.
> Maybe the weakest point in that configuration is having 2 OSDs per
> node; the nearfull threshold must be tuned accordingly so that no OSD
> goes beyond about 0.45, so that if one disk fails, the other OSD in
> the node has enough space for the healing replication.
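For reference, the thresholds Eneko mentions can be inspected and tuned
with the standard ceph CLI (a minimal sketch; the 0.45 value is his
suggestion, not a Ceph default):

    ceph osd set-nearfull-ratio 0.45   # warn once any OSD passes 45% utilization
    ceph osd dump | grep ratio         # show current full/backfillfull/nearfull ratios
    ceph osd df                        # per-OSD utilization and variance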
I reply to both: in fact I am using Proxmox VE and I followed all the
guidelines for an HA hyper-converged setup:
- three servers, as recommended by Proxmox (with 10Gb ethernet and so on)
- size=3 and min_size=2, as recommended by Ceph (settings shown below)
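For reference, these are the per-pool replication settings under
discussion, applied with the standard ceph CLI (the pool name is a
placeholder):

    ceph osd pool set <pool> size 3      # keep 3 copies of each object
    ceph osd pool set <pool> min_size 2  # serve I/O only while >= 2 copies are available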
It is not that one morning I woke up and threw some random hardware
together; I followed the guidelines.
The result should be:
- if a disk (or more) breaks, work goes on
- if a server breaks, its VMs are started on another server and work
  goes on.
The actual result: one disk broke, ceph re-replicated onto the other
OSD in the same server, that OSD reached 90% and EVERYTHING stopped,
including all VMs; the customer lost unsaved data and could not run the
VMs needed to keep working.
Not very "HA" as hoped.
Size=3 already means 3x the hdd cost. Now I would have to double it
again, to 6x. The customer will not buy other disks.
So I ask (again): apart from the known fact that with size=2 I risk a
second disk breaking before ceph has rebuilt the second copy of the
data, are there other risks?
I repeat: I know perfectly well that size=3 is "better" and I followed
the guidelines, but what can happen with size=2 and min_size=1?
The only thing I can imagine is that if I power down the switches I get
a split brain, but in that case monitor quorum is not reached, so ceph
should stop writing and I do not risk inconsistent data.
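If it helps, the monitor quorum state can be checked directly with the
standard ceph CLI:

    ceph quorum_status   # which monitors currently form the quorum
    ceph mon stat        # one-line summary of monitors and quorum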
Are there other things to consider?
Thanks,
Mario