I have to say I am reading quite a few interesting strategies in this
thread, and I'd like to take a moment to compare them:
1) Adding OSDs one by one
- least amount of PG rebalancing at any given time
- will potentially re-rebalance data that has just been distributed
when the next OSD is phased in
- limits the impact if you have a bug in the hdd/ssd series
The biggest problem with this approach is that you will re-rebalance
the same data over and over again, which slows down the process
significantly.
2) Reweighted phase-in
- start slowly by reweighting the new OSD to a small fraction of its
potential
- allows you to see how the new OSD performs
- needs manual interaction to grow the weight
- possibly delays the phase-in for longer than necessary
We use this approach when phasing in multiple, larger OSDs that are
from a newer / less well known series of disks.
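As a sketch of that workflow (the OSD id and the weights below are
made-up example values; the final weight depends on your disk size):

```shell
# Bring the new OSD in at a fraction of its final CRUSH weight
# (osd.42 and all weights here are hypothetical examples):
ceph osd crush reweight osd.42 0.5

# Watch recovery settle and check how the new OSD performs ...
ceph -s
ceph osd perf

# ... then grow the weight step by step up to the disk's full weight:
ceph osd crush reweight osd.42 1.5
ceph osd crush reweight osd.42 3.63
```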
3) noin / norebalance based phase-in
- interesting approach to delay rebalancing until the "proper/final"
new storage is in place
- unclear how much of a difference it makes if you insert the new set
of OSDs within a short timeframe (i.e. adding the 1st OSD at minute 0,
the 2nd at minute 1, etc.)
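For reference, the flag-based variant could look roughly like this (a
sketch only; the flags are standard Ceph cluster flags, but the OSD ids
are made-up and the exact sequence depends on your deployment tooling):

```shell
# Keep newly started OSDs from being marked "in", and pause data movement:
ceph osd set noin
ceph osd set norebalance

# ... create and start all of the new OSDs here ...

# Mark the new OSDs in (hypothetical ids), then allow a single rebalance:
ceph osd in osd.42 osd.43 osd.44
ceph osd unset noin
ceph osd unset norebalance
```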
4) All at once / randomly
- least amount of manual tuning
- in a way, something one "would expect" Ceph to do right (but in
practice it doesn't always)
- might (likely will) cause short-term re-adjustments
- might cause client I/O slowdown (see the next point)
5) Generally slowing down recovery
What we actually do at datacenterlight.ch is slow down phase-ins by
default via the following tunings:
# Restrain recovery operations so that normal cluster I/O is not affected
[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 2
This works well in about 90% of the cases for us.
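If you don't want to edit ceph.conf and restart the OSDs, the same
throttles can also be applied at runtime via the cluster's config
database (available on Mimic and later); something like:

```shell
# Apply the recovery throttles cluster-wide to all OSDs:
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
ceph config set osd osd_recovery_op_priority 2
```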
Quite an interesting thread, thanks everyone for sharing!
Cheers,
Nico
Anthony D'Atri <anthony.datri(a)gmail.com> writes:

> Hi,
>
>> as far as I understand it, you get no real benefit with doing them
>> one by one, as each OSD add can cause a lot of data to be moved to a
>> different OSD, even though you just rebalanced it.
>
> Less than with older releases, but yeah.
>
> I’ve known someone who advised against doing them in parallel because
> one would — for a time — have PGs with multiple remaps in the acting
> set. The objection may have been paranoia, I’m not sure.
>
> One compromise is to upweight the new OSDs one node at a time, so the
> churn is limited to one failure domain at a time.
>
> — aad
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Sustainable and modern Infrastructures by ungleich.ch