On 05/02/2021 20:10, Mario Giammarco wrote:
It is not that one morning I woke up and put in some random configuration;
I followed the guidelines.
The result should be:
- if a disk (or more) breaks, work goes on
- if a server breaks, the VMs on that server start on another server and
work goes on.
The actual result: one disk breaks, Ceph fills the other disk in the same
server, it reaches 90% and EVERYTHING stops, including all VMs; the
customer has lost unsaved data and cannot run the VMs it needs to keep
working. Not very "HA" as hoped.
With three OSD hosts, each with two disks, size=3 and the default CRUSH
rules (i.e. each replica goes to a different host), each OSD host would
expect to get roughly 1/3 of the total data. Under normal running this
means each disk sees 1/6 of the total data.
When a single disk failed in your scenario above, all three hosts were
still available and still received 1/3 of the total data. Because one
disk had failed, the surviving disk had to store the replicas that were
on the failed disk as well as its own (so 2/6 of the total data - double
what it held before). Reaching 90% full on the surviving disk suggests
it was at least 45% full under normal running.
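The arithmetic above can be checked with a short back-of-the-envelope
script. This is only a sketch: it assumes equal-sized disks, perfectly
uniform CRUSH distribution, and the 3-host/2-disk/size=3 layout from the
thread; the disk size and data amount below are hypothetical numbers,
not taken from the original cluster.

```python
def disk_utilization(user_data_gb, disk_gb, disks_per_host=2, failed=0):
    """Fraction of each surviving disk used on one host.

    With size=3 across 3 hosts, every host stores one full replica of
    the user data, so the host must fit user_data_gb no matter how many
    of its disks survive.
    """
    surviving = disks_per_host - failed
    per_host_gb = user_data_gb          # one full replica per host
    return per_host_gb / surviving / disk_gb

disk = 1000.0  # hypothetical 1 TB disks
data = 900.0   # hypothetical user data (one replica's worth)

print(disk_utilization(data, disk))             # normal: 0.45 (45% per disk)
print(disk_utilization(data, disk, failed=1))   # one disk dead: 0.90 (90%)
```

So a disk that sits at 45% in normal operation jumps to 90% as soon as
its neighbour in the same host dies, which matches the numbers reported.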
Ceph is doing what it's supposed to in this case; the issue is that the
disks weren't sized large enough to allow for this failure.
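For reference, recent Ceph releases ship with default capacity
thresholds of nearfull 0.85, backfillfull 0.90 and full 0.95, and at the
full ratio the cluster blocks writes, which is consistent with
everything stopping around 90-95%. The ratios can be inspected and, as a
stopgap only, raised slightly; the real fix is more capacity. A sketch
of the relevant commands (values here are illustrative):

```shell
# Show current utilization per pool and overall.
ceph df

# Temporarily raise the thresholds to regain write access while you
# add capacity or remove data. Do NOT leave these raised permanently.
ceph osd set-nearfull-ratio 0.90
ceph osd set-full-ratio 0.97
```

Raising the full ratio buys time but a disk that actually fills to 100%
can corrupt its OSD, so treat this purely as an emergency measure.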