Correct, "crush weight" and normal "reweight" are indeed very
different.
The original post mentions "rebuilding" servers; in that case the correct
way is to use "destroy" and then explicitly re-use the OSD ID afterwards.
"purge" is really only for OSDs that you don't get back (or broken disks
that you don't replace quickly).
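A minimal sketch of that destroy-and-reuse flow (the OSD id 12 and the
device path are placeholders, and it assumes a ceph-volume/LVM deployment):

# data should already be drained or recoverable from replicas
ceph osd destroy 12 --yes-i-really-mean-it
# ... rebuild the server / reprovision the disk ...
ceph-volume lvm create --osd-id 12 --data /dev/sdb   # re-uses the same OSD id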
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
croit GmbH
Freseniusstr. 31h
81247 München
Tel: +49 89 1896585 90
On Tue, Jun 2, 2020 at 12:32 PM Thomas Byrne - UKRI STFC <tom.byrne(a)stfc.ac.uk> wrote:
As you have noted, 'ceph osd reweight 0' is the same as 'ceph osd out',
but it is not the same as removing the OSD from the crush map (or setting
its crush weight to 0). This explains your observation of the double rebalance
when you mark an OSD out (or reweight it to 0) and then remove it later.
To avoid this, I use a crush reweight for the initial step to move PGs off
an OSD when draining nodes. You can then purge the OSD with no further PG
movement.
Double movement:
ceph osd out $i
# rebalancing
ceph osd purge $i
# more rebalancing
Single movement:
ceph osd crush reweight $i 0
# rebalancing
ceph osd purge $i
# no rebalancing
The reason this occurs (as I understand it) is that the reweight value is
taken into account later in the crush calculation, so an OSD with a reweight
of 0 can still be picked for a PG set; the reweight then kicks in and forces
the calculation to be retried, giving a different result for the PG set than
if the OSD were not present, or had a crush weight of 0.
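For anyone wanting to check the two values, 'ceph osd tree' and 'ceph osd df'
both list them per OSD; the WEIGHT column is the crush weight and REWEIGHT is
the override set by 'ceph osd reweight' / 'ceph osd out':

ceph osd tree   # WEIGHT = crush weight, REWEIGHT = override
ceph osd df     # same two columns, plus per-OSD utilisation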
Cheers,
Tom
-----Original Message-----
From: Brent Kennedy <bkennedy(a)cfl.rr.com>
Sent: 02 June 2020 04:44
To: 'ceph-users' <ceph-users(a)ceph.io>
Subject: [ceph-users] OSD upgrades
We are rebuilding servers and before luminous our process was:
1. Reweight the OSD to 0
2. Wait for rebalance to complete
3. Out the osd
4. Crush remove osd
5. Auth del osd
6. Ceph osd rm #
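For reference, that pre-Luminous sequence as actual commands (a sketch;
12 is a placeholder OSD id, and the OSD daemon has to be down before
'ceph osd rm' will succeed):

ceph osd reweight 12 0
# wait for rebalancing to complete
ceph osd out 12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12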
Seems the luminous documentation says that you should:
1. Out the osd
2. Wait for the cluster rebalance to finish
3. Stop the osd
4. Osd purge #
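And the Luminous-and-later sequence from the docs as commands (again a
sketch with a placeholder id):

ceph osd out 12
# wait for rebalancing to complete
systemctl stop ceph-osd@12          # on the OSD host
ceph osd purge 12 --yes-i-really-mean-it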
Is reweighting to 0 no longer suggested?
Side note: I tried our existing process, and even after the reweight the
entire cluster restarted the rebalance again after step 4 (crush remove osd)
of the old process. I should also note that after reweighting to 0, when I
tried to run "ceph osd out #", it said the OSD was already marked out.
I assume the docs are correct, but just want to make sure since
reweighting
had been previously recommended.
Regards,
-Brent
Existing Clusters:
Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/man, 1 gateway, 2 iscsi gateways (all virtual on nvme)
US Production(HDD): Nautilus 14.2.2 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
UK Production(HDD): Nautilus 14.2.2 with 12 osd servers, 3 mons, 4 gateways
US Production(SSD): Nautilus 14.2.2 with 6 osd servers, 3 mons, 3 gateways, 2 iscsi gateways
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io