As you have noted, 'ceph osd reweight 0' is equivalent to 'ceph osd out',
but it is not the same as removing the OSD from the CRUSH map (or setting its CRUSH
weight to 0). That explains the double rebalance you observed: marking an OSD out (or
reweighting it to 0) triggers one rebalance, and removing it later triggers a second.
To avoid this, I use a CRUSH reweight as the initial step to move PGs off an OSD when
draining nodes. The OSD can then be purged with no further PG movement.
Double movement:
ceph osd out $i
# rebalancing
ceph osd purge $i
# more rebalancing
Single movement:
ceph osd crush reweight $i 0
# rebalancing
ceph osd purge $i
# no rebalancing
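For draining several OSDs in one go, the single-movement path can be wrapped in a small helper. This is only a sketch: the drain_osds function and the DRY_RUN guard are my own additions, not a standard ceph tool, and in real use you must wait for all PGs to go active+clean between the reweight and the purge.

```shell
# Sketch of the single-movement drain. drain_osds and DRY_RUN are
# illustrative additions; DRY_RUN=1 (the default) only prints the
# commands so the sequence can be reviewed before touching a cluster.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

drain_osds() {
    for id in "$@"; do
        # Zero the *crush* weight so PGs are remapped exactly once.
        run ceph osd crush reweight "osd.$id" 0
        # In real use: wait here until all PGs are active+clean.
        # The purge then causes no further data movement.
        run ceph osd purge "$id" --yes-i-really-mean-it
    done
}
```

With DRY_RUN unset (defaulting to 1), `drain_osds 3 7` just prints the commands it would run for osd.3 and osd.7.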
The reason this occurs (as I understand it) is that the reweight value is only taken
into account late in the CRUSH calculation. An OSD with a reweight of 0 can therefore
still be picked for a PG set; the reweight then kicks in and forces the calculation to
be retried, which can give a different PG set than if the OSD were absent from the map,
or had a CRUSH weight of 0.
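A toy model makes the retry effect concrete. To be clear, this is not the real straw2 algorithm, just a hypothetical stand-in I made up: candidates are drawn by a trivial deterministic "hash", a reweight of 0 rejects the draw and forces a retry, and a crush-removed OSD is simply never a candidate. The two end states avoid the drained OSD equally, yet map some PGs differently, and that difference is the second rebalance.

```python
# Toy placement model (NOT real CRUSH/straw2): a candidate OSD is drawn
# deterministically per attempt; a reweight of 0 rejects the draw and
# retries, whereas a crush-removed OSD never appears as a candidate.

def place(pg, osds, reweight):
    attempt = 0
    while True:
        cand = osds[(pg + attempt) % len(osds)]  # trivial stand-in "hash"
        if reweight.get(cand, 1.0) > 0:
            return cand
        attempt += 1  # rejected by reweight: retry with a new draw

# OSD 1 reweighted to 0 (still in the map, as after 'ceph osd out'):
out_map = {pg: place(pg, [0, 1, 2], {1: 0.0}) for pg in range(6)}
# OSD 1 removed from the map (as after crush remove / purge):
gone_map = {pg: place(pg, [0, 2], {}) for pg in range(6)}

# Neither mapping ever uses OSD 1, yet they disagree for some PGs:
# e.g. pg 2 lands on OSD 2 in out_map but on OSD 0 in gone_map.
```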
Cheers,
Tom
-----Original Message-----
From: Brent Kennedy <bkennedy(a)cfl.rr.com>
Sent: 02 June 2020 04:44
To: 'ceph-users' <ceph-users(a)ceph.io>
Subject: [ceph-users] OSD upgrades
We are rebuilding servers and before luminous our process was:
1. Reweight the OSD to 0
2. Wait for the rebalance to complete
3. Mark the OSD out
4. ceph osd crush remove osd.#
5. ceph auth del osd.#
6. ceph osd rm #
Seems the luminous documentation says that you should:
1. Mark the OSD out
2. Wait for the cluster rebalance to finish
3. Stop the OSD
4. ceph osd purge #
Is reweighting to 0 no longer suggested?
Side note: I tried our existing process and, even after the reweight, the entire
cluster started rebalancing again after step 4 (crush remove osd) of the old
process. I should also note that after reweighting to 0, when I tried to run "ceph osd
out #", it said the OSD was already marked out.
I assume the docs are correct, but just want to make sure since reweighting
had been previously recommended.
Regards,
-Brent
Existing Clusters:
Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/mgr, 1 gateway, 2 iscsi
gateways ( all virtual on NVMe )
US Production(HDD): Nautilus 14.2.2 with 11 osd servers, 3 mons, 4 gateways,
2 iscsi gateways
UK Production(HDD): Nautilus 14.2.2 with 12 osd servers, 3 mons, 4 gateways
US Production(SSD): Nautilus 14.2.2 with 6 osd servers, 3 mons, 3 gateways,
2 iscsi gateways