I assume the limits are those that Linux imposes. IOPS is the real limit: one 20TB drive
gives roughly 100 IOPS, while 4x5TB drives give roughly 400 IOPS in aggregate, and 400
IOPS serves more clients than 100 IOPS. You decide what you need/want to have.
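Back-of-the-envelope, in Python, just to make that trade-off concrete; the ~100 IOPS per
spindle figure and the per-client demand below are rule-of-thumb assumptions, not
measurements:

# Aggregate random IOPS: one big spindle vs. several small ones.
# ~100 IOPS per 7.2K RPM HDD is a rough rule of thumb; adjust to taste.
IOPS_PER_HDD = 100

def aggregate_iops(num_drives: int, per_drive: int = IOPS_PER_HDD) -> int:
    """Total random IOPS across independent spindles."""
    return num_drives * per_drive

def clients_served(total_iops: int, iops_per_client: int = 10) -> int:
    """Clients supported, assuming each needs ~iops_per_client random IOPS."""
    return total_iops // iops_per_client

one_20tb = aggregate_iops(1)   # 1 x 20 TB -> ~100 IOPS
four_5tb = aggregate_iops(4)   # 4 x 5 TB  -> ~400 IOPS
print(one_20tb, clients_served(one_20tb))   # 100 -> ~10 clients
print(four_5tb, clients_served(four_5tb))   # 400 -> ~40 clients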
Are there any other aspects to the limits of bigger-capacity hard disk drives?
Recovery will take longer, increasing the risk of another failure within the same window.
Indeed, this is often overlooked, and it DOES happen. It is especially perilous if one is
flirting with disaster by running 2R (two-replica) pools.
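To put rough numbers on that exposure window (Python, with an assumed sustained rebuild
rate; real backfill is usually slower than a drive's sequential maximum):

# How long is the window during which a second failure can hurt?
# 100 MB/s sustained rebuild is an assumption, not a spec-sheet number.
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float = 100.0) -> float:
    """Hours to re-replicate one drive's worth of data at a sustained rate."""
    return capacity_tb * 1_000_000 / rebuild_mb_per_s / 3600

for tb in (5, 20):
    print(f"{tb} TB -> ~{rebuild_hours(tb):.0f} hours of exposure")
# 5 TB  -> ~14 hours
# 20 TB -> ~56 hours; with a 2R pool, a second failure in that window can mean data loss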
HDD capacity has grown far more rapidly than HDD interfaces have become faster.
Having more rather than fewer spindles helps, but it's also true that every drive bay has
a certain cost: a fraction of the chassis, switch ports, power, and data-center RUs.
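A toy cost model for that per-bay overhead, with made-up dollar figures, just to show how
a fixed bay cost pushes back against many small drives:

# Effective $/TB once the fixed cost of a bay (chassis share, switch port,
# power, RU share) is included. All prices below are placeholders.
def cost_per_tb(drive_tb: float, drive_cost: float, bay_overhead: float = 150.0) -> float:
    return (drive_cost + bay_overhead) / drive_tb

print(cost_per_tb(20, 400))   # one 20 TB drive, one bay -> ~27.5 $/TB
print(cost_per_tb(5, 130))    # 5 TB drives: four bays for the same capacity -> ~56 $/TB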
There are ultradense chassis for sure, which make sense in certain situations but not
others. I've seen ones that require special power to be run to the racks, jut into the
aisles, and are really bottlenecked by their NICs and RAM capacity, etc. I've also seen
racks that can be at most half full because of weight or, more often, amps; those wasted
RUs have a cost. And if your cluster overall isn't huge, say six ultradense chassis in
total, then one node down is 1/6 of the cluster, which could be ugly if
mon_osd_down_out_subtree_limit doesn't forestall the resulting mass backfill: recovery of
a large fraction of the cluster requires a comparably large fraction of free space on the
surviving nodes. Recovery/backfill to/from an ultradense node may well overwhelm its
NIC/HBA as well.
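The free-space math for that six-chassis example looks something like this; the 70% fill
ratio is an assumption, and 0.85 is the usual nearfull default:

# If 1 of 6 equal nodes is marked out, its data must be re-created across the
# remaining 5, which must absorb it while staying under the nearfull ratio.
def utilization_after_loss(nodes: int, fill_ratio: float) -> float:
    """Fill ratio of survivors after redistributing one node's data evenly."""
    return fill_ratio * nodes / (nodes - 1)

before = 0.70                                        # assume the cluster is 70% full
print(f"{utilization_after_loss(6, before):.0%}")    # ~84%, uncomfortably close to 0.85
# mon_osd_down_out_subtree_limit can keep a whole-host failure from triggering
# this mass backfill in the first place.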
This is one reason that flash, e.g. QLC, will be increasingly viable as an HDD
replacement. One can pack a lot of TB into a single RU using ruler-form-factor drives
without sacrificing half the rack, and recovery (and thus exposure to data loss /
unavailability) is faster than with HDDs.
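Ballpark per-RU density and per-drive recovery comparison; the capacities and throughputs
below are assumptions for illustration, not vendor numbers:

# Density per rack unit and recovery window, HDD vs. QLC ruler flash.
def recovery_hours(capacity_tb: float, mb_per_s: float) -> float:
    return capacity_tb * 1_000_000 / mb_per_s / 3600

hdd_ru_tb = 12 * 20           # assume ~12 x 20 TB 3.5" HDDs per RU  -> 240 TB
qlc_ru_tb = 32 * 30           # assume ~32 x 30 TB E1.L ruler drives -> 960 TB
print(hdd_ru_tb, qlc_ru_tb)
print(recovery_hours(20, 100))    # HDD: ~56 h per failed drive
print(recovery_hours(30, 1000))   # QLC: ~8 h despite the larger drive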
— aad