On 23/08/2019 22:14, Paul Emmerich wrote:
> On Fri, Aug 23, 2019 at 3:54 PM Florian Haas
> <florian(a)citynetwork.eu> wrote:
>> On 23/08/2019 13:34, Paul Emmerich wrote:
>>> Is this reproducible with crushtool?
>>
>> Not for me.
>>
>>> ceph osd getcrushmap -o crushmap
>>> crushtool -i crushmap --update-item XX 1.0 osd.XX --loc host
>>> hostname-that-doesnt-exist-yet -o crushmap.modified
>>>
>>> Replacing XX with the osd ID you tried to add.
>>
>> Just checking whether this was intentional. As the issue pops up when
>> adding a new OSD *on* a new host, not moving an existing OSD *to* a new
>> host, I would have used --add-item here. Is there a specific reason why
>> you're suggesting to test with --update-item?
>
> yes, update should map to create or move which it should use internally
>
>> At any rate, I tried with multiple different combinations (this is on a
>> 12.2.12 test cluster; I can't test this in production):
>
> which also ran into this bug? The idea of using crushtool is to not
> crash your production cluster but just the local tool.
Ah, gotcha. I thought you wanted me to be able to at least do "ceph osd
setcrushmap" with the resulting crushmap, which would require a running
cluster.
So yes, doing this completely offline shows that you're definitely on to
something. I am able to crash crushtool with the original crushmap, and
what it appears to be falling over on is a choose_args map in there.
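
For the record, the offline test boils down to something like the following (the OSD id, weight and host name are placeholders; this only ever modifies a local file, so the worst that can crash is crushtool itself):

```shell
# Fetch the cluster's current CRUSH map (read-only as far as the
# cluster is concerned).
ceph osd getcrushmap -o crushmap

# Offline: add a hypothetical osd.42 under a host bucket that does
# not exist yet, writing the result to a new file. With a
# choose_args map present, this is where crushtool falls over.
crushtool -i crushmap --add-item 42 1.0 osd.42 \
  --loc host hostname-that-doesnt-exist-yet -o crushmap.modified
```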
I've updated the bug report with this comment:
https://tracker.ceph.com/issues/40029#note-11
It would seem that at this stage there are two workarounds for
pre-Nautilus users who have a choose_args map in their crushmap and are
for some reason unable to upgrade to Nautilus yet:
1. Add host buckets manually before adding new OSDs.
2. Drop any choose_args map from their crushmap.
As it happens I am not aware of any way to do #2 other than
- using getcrushmap,
- decompiling the crushmap,
- dropping the choose_args map from the textual representation of the
crushmap,
- recompiling, and then
- using setcrushmap.
Are you, by any chance?
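
Spelled out, the round-trip I have in mind looks roughly like this (file names are illustrative):

```shell
# 1. Fetch and decompile the current CRUSH map.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# 2. Edit crushmap.txt and delete the choose_args section
#    (any text editor will do).

# 3. Recompile the edited map and inject it into the cluster.
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```

Note that since choose_args carries weight-set data (as used by the balancer, for instance), dropping it will presumably revert placement to the default weights, so some data movement should be expected.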
Thanks again for your help!
Cheers,
Florian