On 3/19/21 2:20 AM, Philip Brown wrote:
yup cephadm and orch was used to set all this up.
Current state of things:
ceph osd tree shows
33 hdd 1.84698 osd.33 destroyed 0 1.00000
^^ Destroyed, ehh, this doesn't look good to me. Ceph thinks this OSD is
destroyed. Do you know what might have happened to osd.33? Did you
perform a "kill an OSD" while testing?
AFAIK you can't fix that anymore. You will have to remove it and redeploy
it. It might even get a new osd.id.
cephadm logs --name osd.33 --fsid xx-xx-xx-xx
in addition to the systemctl output I had already seen, showed me new messages such as
ceph-osd[1645438]: did not load config file, using default settings.
ceph-osd[1645438]: 2021-03-18T14:31:32.990-0700 7f8bf14e3bc0 -1 parse_file: filesystem
error: cannot get file size: No such file or directory
This suggested that I needed to copy /etc/ceph/ceph.conf over to the OSD node,
which I did.
I then also copied over the admin key and generated a fresh bootstrap-osd key with it,
just for good measure, with
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
I had saved the previous output of ceph-volume lvm list
and on the OSD node, ran
ceph-volume lvm prepare --data xxxx --block.db xxxx
But it said the OSD was already prepared.
I then tried an activate... it told me
--> ceph-volume lvm activate successful for osd ID: 33
but now the cephadm logs output shows me
ceph-osd[1677135]: 2021-03-18T17:57:47.982-0700 7ff64593f700 -1 monclient(hunting):
handle_auth_bad_method server allowed_methods [2] but i only support [2]
Not the best error message :-}
Indeed, it would be nice to have a reference explaining what [2] means. But
I think you get this because of the destroyed OSD. I would follow the
cephadm documentation on how to replace an OSD. Does that exist? We had a
long thread about this "container" topic (see "[ceph-users] ceph-ansible
in Pacific and beyond?").
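Since handle_auth_bad_method usually means the key the daemon presents does not match what the monitors have on record, one quick check would be to compare the two (a sketch; the path assumes a legacy/ceph-volume layout, under cephadm the keyring lives under /var/lib/ceph/<fsid>/osd.33/ instead):

```shell
# Key the cluster has registered for osd.33 (may be gone entirely
# if the OSD was destroyed -- that would also explain the auth error):
ceph auth get osd.33
# Key the locally activated OSD is actually presenting:
cat /var/lib/ceph/osd/ceph-33/keyring
# If the two keys differ, the monitors will reject the daemon at auth time.
```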
Now what do I need to do?
I would remove osd.33, even manually editing the CRUSH map if needed
(which should not be the case), and then redeploy this OSD and wait for recovery.
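The removal and redeploy could look something like this (a sketch against the Octopus-era orchestrator; <host> and the device path are placeholders, so double-check the commands against your release):

```shell
# Remove the destroyed OSD entirely: its auth key, CRUSH entry and OSD id.
ceph osd purge 33 --yes-i-really-mean-it
# Wipe the old LVM metadata on the host so the disk counts as available
# again (substitute your actual host name and device path).
ceph orch device zap <host> /dev/sdX --force
# With an "all available devices" service spec, cephadm should then pick
# the disk up and redeploy an OSD on its own; otherwise reapply your OSD
# service spec. Watch recovery progress with:
ceph -s
```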
If you have not manually "destroyed" this OSD, then either things work
differently in Octopus from what I have seen so far, my memory is
failing me, or some really weird stuff is happening, and I would really
like to know what that is.
What version are you running? Do note that 15.2.10 has been released.
Gr. Stefan