Phil, this would be an excellent contribution to the blog or the introductory
documentation. I’ve been using Ceph for over a year, and this brought together a lot of
concepts that I hadn’t seen related so succinctly before.
One of the things that I hadn’t really conceptualized well was “why size of 3?” I knew
that PGs went to read-only without a quorum of OSDs to write to, but this is a much
simpler way to think about it.
Something I have been experimenting with that might also be interesting to the discussion
is “when to use redundancy at all”. Kafka is a good example of “eventually
consistent” software that is designed to tolerate complete node failure while sustaining
extremely high performance. If Kafka is backed by a replicated pool, I’ve come to believe
that is suboptimal compared to running three Kafka instances, each backed by unreplicated
storage in Ceph.
The logical question is “why use Ceph at all then?” To me, this is about a centralized
management process: if I am building with Ceph in most places, using it everywhere creates
operational consistency. (Modifying the CRUSH map is the path to unreplicated storage that
is pinned to the specific machine that also hosts the Kafka instance.)
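Roughly, the shape of that setup looks like the following (host, pool, and rule names here
are made up, and newer releases may also require mon_allow_pool_size_one before accepting a
size of 1):

  # CRUSH rule that places data only under the host bucket "kafka-host-a"
  ceph osd crush rule create-replicated pin-kafka-a kafka-host-a osd
  # Unreplicated pool for that Kafka instance, using the pinning rule
  ceph osd pool create kafka-a 32 32 replicated pin-kafka-a
  ceph osd pool set kafka-a size 1 --yes-i-really-mean-it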
At any rate, eventually consistent software can provide additional options for meeting
top-level failure-domain requirements.
Brian
On May 29, 2020, at 10:48 AM,
<DHilsbos(a)performair.com> wrote:
Phil;
I like to refer to basic principles and design assumptions/choices when considering
things like this. I also like to refer to more broadly understood technologies. Finally,
I'm still relatively new to Ceph, so here goes...
TLDR: Ceph is (likes to be) double-redundant (like RAID-6), while dual power (n+1) is
single-redundant.
Like RAID, Ceph (or more precisely a Ceph pool) can be in, and moves through, the
following states:
Normal --> Partially Failed (degraded) --> Recovering --> Normal.
When talking about these systems, we often gloss over Recovery, acting as if it takes no
time. Recovery does take time though, and if anything ELSE happens while recovery is
ongoing, what can the software do?
Think RAID-5: what happens if a drive fails in a RAID-5 array, and during the rebuild an
unreadable block is found on another drive? That's the limit of single redundancy. With
RAID-6, the array falls back to its second level of redundancy and the rebuild continues.
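As a rough back-of-the-envelope illustration (using the spec-sheet figure of one
unrecoverable read error per 10^14 bits that many large SATA drives quote): a rebuild that
has to read 10 TB of surviving data reads about 8 x 10^13 bits, so you'd expect on the
order of 0.8 unreadable sectors per rebuild - close to a coin flip that a single-redundant
rebuild hits one.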
Because of the long recovery times expected of modern large hard drives, Ceph pushes for
double redundancy (3x replication, or erasure coding such as 5+2). Further, it degrades
availability step by step as redundancy is lost: when the first layer of redundancy is
compromised, writes are still allowed; when the second is lost, writes are disallowed but
reads are still allowed; only when all three layers are compromised are reads disallowed
as well.
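In pool terms, the knobs behind this behavior are size and min_size (the pool name here is
just an example):

  ceph osd pool set mypool size 3      # keep three copies of every object
  ceph osd pool set mypool min_size 2  # copies that must be up before a PG serves I/O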
Dual power feeds (n+1) are only single-redundant, so the system as a whole can't achieve
better than single redundancy. Depending on the reliability of the power and your service
guarantees, this may be acceptable.
If you add ATSs (automatic transfer switches), then you need to look at the failure rate
(MTBF, or similar) to determine whether your service guarantees are impacted.
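For a sense of scale (illustrative numbers only): if a single feed is available 99.9% of
the time (roughly 8.8 hours of downtime per year), two independent feeds are down
simultaneously only about 0.001^2 = 10^-6 of the time, on the order of 30 seconds per
year; the ATS itself sits in series, though, so its own failure rate and switchover time
get added back in.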
Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
-----Original Message-----
From: Phil Regnauld [mailto:pr@x0.dk]
Sent: Friday, May 29, 2020 12:59 AM
To: Hans van den Bogert
Cc: ceph-users(a)ceph.io
Subject: [ceph-users] Re: CEPH failure domain - power considerations
Hans van den Bogert (hansbogert) writes:
I would second that, there's no winning in this case for your requirements and single
PSU nodes. If there were 3 feeds, then yes; you could make an extra layer in your
crushmap much like you would incorporate a rack topology in the crushmap.
I'm not fully up on coffee yet today, so I haven't yet worked out why
3 feeds would help? To have a 'tie breaker' of sorts, with hosts spread
across 3 rails?
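(For context, the “extra layer” suggested above could be expressed roughly like this -
bucket and rule names are invented, reusing the existing rack type to stand in for feeds:

  # One CRUSH bucket per power feed
  ceph osd crush add-bucket feed-a rack
  ceph osd crush move feed-a root=default
  ceph osd crush move node1 rack=feed-a
  # Rule that spreads replicas across feeds rather than just across hosts
  ceph osd crush rule create-replicated rep-per-feed default rack
)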
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io