My impression is that cost / TB for a drive may be approaching parity, but the TB / drive
is still well below HDD (or, at densities approaching parity, cost / TB is still quite
high). I can get a Micron 15TB SSD for $2600, but why would I when I can get an 18TB
Seagate IronWolf for <$600, an 18TB Seagate Exos for <$500, or an 18TB WD Gold for
<$600? Personally I wouldn't use drives that big in our tiny little clusters, but
it exemplifies the issues around discussing cost parity.
As such, a cluster needs more drives for the same total capacity (and thus more nodes), which
drives up the cost / TB for the cluster.
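To put rough numbers on it, a quick back-of-the-envelope in Python using the list prices quoted above (street prices, so plug in whatever quotes you actually have in front of you):

# Rough $/TB from the prices quoted above; prices are examples, not formal quotes.
drives = {
    "Micron 15TB SSD":       (2600, 15),
    "Seagate IronWolf 18TB": (600, 18),
    "Seagate Exos 18TB":     (500, 18),
    "WD Gold 18TB":          (600, 18),
}
for name, (price_usd, capacity_tb) in drives.items():
    print(f"{name}: ~${price_usd / capacity_tb:.0f}/TB")

That works out to roughly $173/TB for the SSD versus $28-$33/TB for the spinners, before you even count the extra nodes.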
My 2 cents.
Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
-----Original Message-----
From: Adam Boyhan [mailto:adamb@medent.com]
Sent: Thursday, February 4, 2021 10:58 AM
To: Anthony D'Atri
Cc: ceph-users
Subject: [ceph-users] Re: NVMe and 2x Replica
All great input and points guys.
Helps me lean towards 3 copies a bit more.
I mean, honestly, NVMe cost per TB isn't that much more than SATA SSD now. Somewhat
surprised the salesmen aren't pitching 3x replication, as it makes them more money.
From: "Anthony D'Atri" <anthony.datri(a)gmail.com>
To: "ceph-users" <ceph-users(a)ceph.io>
Sent: Thursday, February 4, 2021 12:47:27 PM
Subject: [ceph-users] Re: NVMe and 2x Replica
I searched each to find the section where 2x was
discussed. What I found was interesting. First, there are really only 2 positions here:
Micron's and Red Hat's. Supermicro copies Micron's position paragraph word for
word. Not surprising considering that they are advertising a Supermicro / Micron solution.
FWIW, at Cephalocon another vendor made a similar claim during a talk.
* Failure rates are averages, not minima. Some drives will always fail sooner
* Firmware and other design flaws can cause much higher failure rates, or insidious
UREs that result in partial data unavailability or loss
* Latent soft failures may not be detected until the next deep scrub runs, which could be
weeks later
* In a distributed system, there are up/down/failure scenarios where the location of even
one good / canonical / latest copy of data is unclear, especially when drive or HBA cache
is in play.
* One of these is a power failure. Sure, PDU / PSU redundancy helps, but stuff happens,
like a DC underprovisioning amps so that a spike in user traffic takes the whole row
down :-x Various unpleasant things can happen.
I was championing R3 even pre-Ceph when I was using ZFS or HBA RAID. As others have
written, as drives get larger the time to fill them with replica data increases, as does
the chance of overlapping failures. I’ve experienced R2 overlapping failures more than
once, with and before Ceph.
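A rough sketch of why that window matters, assuming a sustained recovery rate (the 200 MB/s figure is just an assumption; real backfill throughput depends on the cluster and its tuning):

# Hours to re-replicate one failed drive's worth of data at an assumed
# sustained recovery rate; purely illustrative.
def backfill_hours(drive_tb, recovery_mb_per_s):
    return drive_tb * 1_000_000 / recovery_mb_per_s / 3600

for size_tb in (4, 8, 18):
    print(f"{size_tb} TB drive @ 200 MB/s: {backfill_hours(size_tb, 200):.0f} h")

Roughly 6 hours for a 4 TB drive versus 25 hours for an 18 TB drive, so the window in which a second, overlapping failure can bite grows with drive size.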
My sense has been that not many people run R2 for data they care about, and as has been
written recently 2,2 EC is safer with the same raw:usable ratio. I’ve figured that vendors
make R2 statements like these as a selling point to assert lower TCO. My first response is
often “How much would it cost you directly, and indirectly in terms of user / customer
goodwill, to lose data?”.
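To make the raw:usable point concrete (simple arithmetic only, ignoring min_size and placement details):

# Raw-to-usable overhead and overlapping failures tolerated per profile.
profiles = {
    "replica 2":  ((2 / 1), 1),
    "replica 3":  ((3 / 1), 2),
    "EC k=2 m=2": (((2 + 2) / 2), 2),
}
for name, (raw_per_usable, failures) in profiles.items():
    print(f"{name}: {raw_per_usable:.1f}x raw per usable TB, "
          f"survives {failures} overlapping failure(s)")

Same 2x overhead for R2 and 2,2 EC, but the EC profile survives two overlapping failures where R2 survives only one.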
Personally, this looks like marketing BS to me. SSD
shops want to sell SSDs, but because of the cost difference they have to convince buyers
that their products are competitive.
^this. I’m watching the QLC arena with interest for the potential to narrow the CapEx gap.
Durability has been one concern, though I’m seeing newer products claiming that e.g. ZNS
improves that. There also seem to be something like, what, *4* separate EDSFF /
ruler form factors. I really want to embrace those, e.g. for object clusters, but I’m VERY
wary of the longevity of competing standards and of any single source for chassis or drives.
Our products cost twice as much, but LOOK you only
need 2/3 as many, and you get all these other benefits (performance). Plus, if you replace
everything in 2 or 3 years anyway, then you won't have to worry about them failing.
Refresh timelines. You’re funny ;) Every time, every single time, that I’ve worked in an
organization that claims a 3 (or 5, or whatever) year hardware refresh cycle, it hasn’t
happened. When you start getting close, the capex doesn’t materialize, or the opex cost of
DC hands and operational oversight gets in the way. “How do you know that the drives will
start failing or getting slower? Let’s revisit this in 6 months”. Etc.
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io