Upmap balancer after node failure - ceph-users

2 Apr 2021

Dear ceph users,

On one of our clusters I have some difficulties with the upmap 
balancer.  We started with a reasonably well balanced cluster (using the 
balancer in upmap mode).  After a node failure, we crush reweighted all 
the OSDs of the node to take it out of the cluster - and waited for the 
cluster to rebalance.  Obviously, this significantly changes the crush 
map - hence the nice balance created by the balancer was gone.  The 
recovery mostly completed - but some of the OSDs became too full - so we 
ended up with a few PGs that were backfill_toofull.  The cluster has 
plenty of space (overall perhaps 65% full), only a few OSDs are >90% (we 
have backfillfull_ratio at 92%).  The balancer refuses to change 
anything since the cluster is not clean.  Yet - the cluster can't become 
clean without a few upmaps to help the top 3 or 4 most full OSDs.

I would think this is a fairly common situation - trying to recover 
after some failure.  Are there any recommendations on how to proceed?  
Obviously I can manually find and insert upmaps - but for a large 
cluster with tens of thousands of PGs, that isn't too practical.  Is 
there a way to tell the balancer to still do something even though some 
PGs are undersized?  (From a quick look at the Python module, I didn't 
see one.)
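
For reference, the manual route of finding and inserting upmaps could be scripted roughly as in the sketch below. It is only an illustration under stated assumptions, not what the balancer module does: the function name plan_upmaps and its parameters are hypothetical, the failure domain is assumed to be the host, every PG move is assumed to shift a fixed pg_cost percentage points of utilization, and the input dictionaries are assumed to have been built beforehand (for example from the JSON output of "ceph osd df" and "ceph pg dump"). It only prints "ceph osd pg-upmap-items" commands and changes nothing on the cluster.

```python
def plan_upmaps(osd_util, pg_up, osd_host, full=90.0, target=70.0, pg_cost=1.0):
    """Suggest `ceph osd pg-upmap-items` commands that move PGs off overfull OSDs.

    osd_util: {osd_id: utilization in percent}
    pg_up:    {pgid: [osd_id, ...]} current up set of each PG
    osd_host: {osd_id: host name} (assumes host is the failure domain)
    pg_cost:  assumed utilization shift (percentage points) per moved PG
    """
    util = dict(osd_util)  # working copy, updated as moves are planned
    moved = set()          # each PG is remapped at most once
    commands = []

    # Handle the fullest OSDs first.
    overfull = sorted((o for o in util if util[o] >= full),
                      key=lambda o: (-util[o], o))
    for src in overfull:
        for pgid in sorted(pg_up):
            if util[src] < full:
                break  # this OSD is below the threshold again
            osds = pg_up[pgid]
            if src not in osds or pgid in moved:
                continue
            # Hosts already holding another replica of this PG are off limits.
            used_hosts = {osd_host[o] for o in osds if o != src}
            candidates = [
                o for o in util
                if o not in osds
                and osd_host[o] not in used_hosts
                and util[o] + pg_cost <= target
            ]
            if not candidates:
                continue
            dst = min(candidates, key=lambda o: (util[o], o))  # emptiest wins
            commands.append(f"ceph osd pg-upmap-items {pgid} {src} {dst}")
            util[src] -= pg_cost
            util[dst] += pg_cost
            moved.add(pgid)
    return commands


if __name__ == "__main__":
    # Toy data, purely illustrative.
    util = {0: 93.0, 1: 60.0, 2: 55.0, 3: 50.0}
    hosts = {0: "a", 1: "b", 2: "c", 3: "d"}
    pgs = {"1.0": [0, 1, 2], "1.1": [0, 2, 3], "1.2": [1, 2, 3]}
    for cmd in plan_upmaps(util, pgs, hosts, pg_cost=2.0):
        print(cmd)
```

A real script would need actual per-PG sizes and the real CRUSH rules instead of the fixed pg_cost and host-only check assumed here, so the generated commands should be reviewed before running any of them.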

The cluster is on Nautilus 14.2.15.

Thanks,

Andras