This is in part a question of *how many* of those dense OSD nodes you have. If you have a
hundred of them, then most likely they’re spread across a decent number of racks and the
loss of one or two is a tolerable *fraction* of the whole cluster.
If instead you have a cluster of just, say, 3-4 of these dense nodes, then component
failures, network glitches, and even routine maintenance become problematic.
You can *mostly* forestall whole-node rebalancing by careful alignment of fault domains
with the value of mon_osd_down_out_subtree_limit. There are cases where it doesn’t kick
in and a whole node will attempt to rebalance, which — assuming the CRUSH rules and
topology are fault-tolerant — may cause surviving OSDs to reach full or backfillfull
states, potentially resulting in an outage.
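To make the full/backfillfull risk concrete, here is a minimal back-of-the-envelope sketch
(the function name is mine, and it assumes data is spread evenly and that all of a lost
node's data backfills onto the survivors):

```python
def post_rebalance_utilization(hosts: int, utilization: float) -> float:
    """Approximate per-host utilization after one of `hosts` equally
    loaded hosts fails and its data is backfilled onto the survivors."""
    return utilization * hosts / (hosts - 1)

# A 4-node cluster running at 70% full ends up around 93% after losing
# one node -- past the default backfillfull ratio of 0.90 and close to
# the default full ratio of 0.95.
print(post_rebalance_utilization(4, 0.70))
```

With only a few nodes, even a modest steady-state utilization leaves no headroom for a
whole-node rebalance; with a hundred nodes the same failure adds only about 1% per
survivor.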
If the limit does kick in, you’ll have reduced or no redundancy until you either bring the
host/OSDs back up, or manually cause the recovery to proceed.
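For reference, a sketch of the relevant setting (assuming a CRUSH hierarchy that actually
has `host` buckets aligned with your physical nodes):

```ini
[mon]
# Don't automatically mark OSDs "out" when an entire subtree of this
# CRUSH bucket type (or larger) goes down. The default is "rack";
# "host" covers the whole-node-loss case discussed above.
mon_osd_down_out_subtree_limit = host
```

If you later decide the node isn't coming back and want recovery to proceed, you can mark
the affected OSDs out manually with `ceph osd out <id> [<id> ...]`.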
As already mentioned, having a small number of fault domains also limits the EC
strategies you can safely use.
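The constraint is simple counting: with the usual failure domain of `host`, each of the
k+m EC shards must land on a distinct host. A minimal sketch (the function is
hypothetical, and this ignores the common recommendation to keep at least one spare
fault domain for recovery):

```python
def ec_profile_fits(k: int, m: int, fault_domains: int) -> bool:
    """Can a k+m erasure-coded pool place every shard in a distinct
    fault domain (e.g. host)? Requires k + m <= fault_domains."""
    return k + m <= fault_domains

print(ec_profile_fits(4, 2, 4))  # False: 4+2 needs 6 hosts
print(ec_profile_fits(2, 2, 4))  # True, but leaves no spare host
```

So a 3-4 node cluster is effectively limited to very small profiles (or replication),
and losing one node leaves nowhere to recover the missing shards to.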
Thanks Paul. I was speaking more about total OSDs and
RAM, rather than a single node. However, I am considering building a cluster with a large
OSD/node count. This would be for archival use, with reduced performance and availability
requirements. What issues would you anticipate with a large OSD/node count? Is the concern
just the large rebalance if a node fails and takes out a large portion of the OSDs at
once?