On 23/08/2019 13:34, Paul Emmerich wrote:
Is this reproducible with crushtool?
Not for me.
ceph osd getcrushmap -o crushmap
crushtool -i crushmap --update-item XX 1.0 osd.XX --loc host hostname-that-doesnt-exist-yet -o crushmap.modified
Replacing XX with the osd ID you tried to add.
Just checking whether this was intentional. As the issue pops up when
adding a new OSD *on* a new host, not when moving an existing OSD *to* a
new host, I would have used --add-item here. Is there a specific reason
why you're suggesting to test with --update-item?
At any rate, I tried with multiple different combinations (this is on a
12.2.12 test cluster; I can't test this in production):
0. Get the current reference crushmap:
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME     STATUS REWEIGHT PRI-AFF
-1       0.05846 root default
-5       0.01949     host daisy
 0   hdd 0.01949         osd.0     up  1.00000 1.00000
-7       0.01949     host eric
 1   hdd 0.01949         osd.1     up  1.00000 1.00000
-3       0.01949     host frank
 2   hdd 0.01949         osd.2     up  1.00000 1.00000
# ceph osd getcrushmap -o crushmap
11
1. "Update" a nonexistent OSD belonging to a nonexistent host (your
suggestion):
# crushtool -i crushmap --update-item 59 0.01949 osd.59 --loc host nonexistent -o crushmap-update-nonexistent-to-nonexistent
# ceph osd setcrushmap -i crushmap-update-nonexistent-to-nonexistent
12
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-9       0.01949 host nonexistent
59       0.01949     osd.59          DNE        0
-1       0.05846 root default
-5       0.01949     host daisy
 0   hdd 0.01949         osd.0        up  1.00000 1.00000
-7       0.01949     host eric
 1   hdd 0.01949         osd.1        up  1.00000 1.00000
-3       0.01949     host frank
 2   hdd 0.01949         osd.2        up  1.00000 1.00000
# ceph osd setcrushmap -i crushmap
13
2. Add a nonexistent OSD belonging to a nonexistent host (I think this
is functionally identical):
# crushtool -i crushmap --add-item 59 0.01949 osd.59 --loc host nonexistent -o crushmap-add-nonexistent-to-nonexistent
# ceph osd setcrushmap -i crushmap-add-nonexistent-to-nonexistent
14
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-9       0.01949 host nonexistent
59       0.01949     osd.59          DNE        0
-1       0.05846 root default
-5       0.01949     host daisy
 0   hdd 0.01949         osd.0        up  1.00000 1.00000
-7       0.01949     host eric
 1   hdd 0.01949         osd.1        up  1.00000 1.00000
-3       0.01949     host frank
 2   hdd 0.01949         osd.2        up  1.00000 1.00000
# ceph osd setcrushmap -i crushmap
15
3. Move an existing OSD to a nonexistent host:
# crushtool -i crushmap --update-item 0 0.01949 osd.0 --loc host nonexistent -o crushmap-update-existing-to-nonexistent
# ceph osd setcrushmap -i crushmap-update-existing-to-nonexistent
16
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-9       0.01949 host nonexistent
 0   hdd 0.01949     osd.0            up  1.00000 1.00000
-1       0.03897 root default
-5             0     host daisy
-7       0.01949     host eric
 1   hdd 0.01949         osd.1        up  1.00000 1.00000
-3       0.01949     host frank
 2   hdd 0.01949         osd.2        up  1.00000 1.00000
# ceph osd setcrushmap -i crushmap
17
None of these crashed any mon.
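For anyone who wants to repeat the three cases on their own test cluster, the sequence could be wrapped roughly as follows. This is only a sketch: the `run` and `try_case` helpers and the `RUN` variable are names I made up for illustration, and unless `RUN=1` is set it merely prints the commands instead of touching a cluster.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set RUN=1 to really execute, and only ever against a *test* cluster.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# try_case OP OSD_ID OUTFILE
# OP is "add-item" or "update-item"; weight and host name are the
# values used in the three cases above.
try_case() {
    op=$1 id=$2 out=$3
    run crushtool -i crushmap "--$op" "$id" 0.01949 "osd.$id" --loc host nonexistent -o "$out"
    run ceph osd setcrushmap -i "$out"
    run ceph osd tree
    run ceph osd setcrushmap -i crushmap   # restore the reference map
}

try_case update-item 59 crushmap-update-nonexistent-to-nonexistent
try_case add-item    59 crushmap-add-nonexistent-to-nonexistent
try_case update-item 0  crushmap-update-existing-to-nonexistent
```

It assumes the reference map was already saved as `crushmap` in the current directory, as in step 0.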
However, there's this line in the bug report:
-19> 2019-08-22 10:08:11.897364 7f93797ab700 0
mon.cc-ceph-osd11-fra1(a)0(leader).osd e302401 create-or-move crush item
name 'osd.59' initial_weight 1.6374 at location
{host=cc-ceph-osd26-fra1,root=default}
So it's not trying to move the item to just a nonexistent host, but to a
nonexistent host *in the default root*.
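To mirror that location with crushtool, each key=value pair from the log maps to one --loc argument. A rough sketch of that mapping follows; the `loc_args` helper is hypothetical, the ID, weight and location are copied from the log line, and the output file name is the one from Paul's example. It only prints the resulting command:

```shell
#!/bin/sh
# Location string copied from the mon log line.
location='{host=cc-ceph-osd26-fra1,root=default}'

# Turn "{k1=v1,k2=v2}" into "--loc k1 v1 --loc k2 v2".
loc_args() {
    printf '%s\n' "$1" | sed -e 's/[{}]//g' \
        -e 's/\([^=,]*\)=\([^,]*\)/--loc \1 \2/g' \
        -e 's/,/ /g'
}

# Print (not run) the crushtool invocation matching the logged location.
echo "crushtool -i crushmap --add-item 59 1.6374 osd.59 $(loc_args "$location") -o crushmap.modified"
```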
So I retried the above commands with "--loc host nonexistent --loc root
default". No change other than everything showing up under default; no
mon crash.
And then I tried one more, which was to *first* add just a new OSD under
the default root, and *then* move that OSD to a new, nonexistent host,
also under the default root. Again, no mon crash.
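With crushtool, that two-step sequence might look something like the sketch below. The exact invocations are an assumption, as are the intermediate file names `crushmap-step1` and `crushmap-step2` and the `two_step` wrapper; it only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands; nothing is executed.
two_step() {
    # Step 1: add osd.59 directly under the default root.
    echo crushtool -i crushmap --add-item 59 0.01949 osd.59 --loc root default -o crushmap-step1
    echo ceph osd setcrushmap -i crushmap-step1
    # Step 2: move it to a host bucket that does not exist yet,
    # still under the default root.
    echo crushtool -i crushmap-step1 --update-item 59 0.01949 osd.59 --loc host nonexistent --loc root default -o crushmap-step2
    echo ceph osd setcrushmap -i crushmap-step2
}

two_step
```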
So I'm afraid I am unable to reproduce this with crushtool and setcrushmap.
And I can't get my mons to crash with "ceph osd crush move", either:
# ceph osd crush move osd.59 host=nonexistent root=default
moved item id 59 name 'osd.59' to location {host=nonexistent,root=default} in crush map
Cheers,
Florian