I haven't looked too closely for open tracker issues regarding
ceph-volume, to be honest. I'm still not even sure if I'm doing
something wrong or if it's an actual Ceph issue. I still have a couple
of OSDs left to play around with in this cluster. So I tried it with a
different OSD: it shows up as "destroyed" in the osd tree, but the
orchestrator isn't redeploying it, although the OSD disk and the
corresponding block.db LV have been wiped. There's nothing in
cephadm.log except the "check-host" and "gather-facts" entries. If I
removed the destroyed OSD from the crushmap, I'm sure it would be
redeployed successfully, as it was earlier. Any idea why it isn't
redeployed?
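(For anyone trying to reproduce: the "destroyed" state is visible in the
JSON osd tree. A quick sketch of how to list such OSDs; the sample JSON
below is invented, but the field names match what 'ceph osd tree -f json'
emits, and on a real cluster you'd feed in the actual command output.)

```python
import json

# Invented sample of 'ceph osd tree -f json' output, trimmed to the
# relevant fields; on a real cluster, feed in the real command output.
osd_tree = json.loads('''
{"nodes": [
  {"id": 5, "name": "osd.5", "type": "osd", "status": "destroyed"},
  {"id": 6, "name": "osd.6", "type": "osd", "status": "up"}
]}
''')

# OSDs marked "destroyed" keep their CRUSH entry so the orchestrator can
# reuse the id; anything listed here is still waiting to be redeployed.
destroyed = [n["name"] for n in osd_tree["nodes"]
             if n.get("type") == "osd" and n.get("status") == "destroyed"]
print(destroyed)  # ['osd.5']
```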
Zitat von Adam King <adking(a)redhat.com>:
For reference, a stray daemon from cephadm's POV is roughly just
something that shows up in "ceph node ls" that doesn't have a directory
in /var/lib/ceph/<fsid>. I guess manually creating the OSD as you did
means that directory didn't end up getting made. I remember the manual
OSD creation process ("manual" just meaning not using an
orchestrator/cephadm mgr module command) coming up at one point, and we
ended up manually running "cephadm deploy" to make sure those
directories get created correctly, but I don't think any docs ever got
written about it (yet, anyway). Also, is there a tracker issue for it
not correctly handling the drivegroup?
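The stray-daemon rule above can be sketched in a few lines of Python;
this is illustrative only (the fsid and daemon names are made up, and the
demo points at a throwaway directory rather than the real /var/lib/ceph):

```python
import os
import tempfile

# Sketch of the stray-daemon heuristic: a daemon the cluster reports
# that has no /var/lib/ceph/<fsid>/<daemon-name> directory on the host.
def stray_daemons(reported, fsid, root="/var/lib/ceph"):
    base = os.path.join(root, fsid)
    managed = set(os.listdir(base)) if os.path.isdir(base) else set()
    return sorted(d for d in reported if d not in managed)

# Demo against a throwaway directory standing in for /var/lib/ceph;
# osd.5 was created manually, so cephadm never made its directory.
with tempfile.TemporaryDirectory() as root:
    fsid = "00000000-fake-fsid"
    os.makedirs(os.path.join(root, fsid, "osd.6"))
    strays = stray_daemons(["osd.5", "osd.6"], fsid, root)
print(strays)  # ['osd.5']
```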
On Mon, Feb 20, 2023 at 8:58 AM Eugen Block <eblock(a)nde.ag> wrote:
> Thanks, Adam.
>
> Providing the keyring to the cephadm command worked, but the unwanted
> (though expected) side effect is that from cephadm's perspective it's
> a stray daemon. For some reason the orchestrator did apply the desired
> drivegroup when I tried to reproduce this morning, but then failed
> again just now when I wanted to get rid of the stray daemon. This is
> one of the most annoying things about cephadm: I still don't fully
> understand when it will correctly apply the identical drivegroup.yml
> and when it won't. Anyway, the conclusion is not to interfere with
> cephadm (nothing new here), but since the drivegroup was not applied
> correctly I assumed I had to "help out" a bit by manually deploying an
> OSD.
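>
> For context, the spec I keep re-applying is roughly of this shape; the
> service_id and device paths below are placeholders rather than my
> exact values:
>
> ```yaml
> service_type: osd
> service_id: osd-replace
> placement:
>   hosts:
>     - nautilus
> spec:
>   data_devices:
>     paths:
>       - /dev/sde
>   db_devices:
>     paths:
>       - /dev/sdb
>   block_db_size: 5G
> ```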
>
> Thanks,
> Eugen
>
> Zitat von Adam King <adking(a)redhat.com>:
>
> > Going off of
> >
> > ceph --cluster ceph --name client.bootstrap-osd --keyring
> > /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >
> > you could try passing "--keyring <bootstrap-osd-keyring>" to the
> > cephadm ceph-volume command. Something like 'cephadm ceph-volume
> > --keyring <bootstrap-osd-keyring> -- lvm create'. I'm guessing it's
> > trying to run the osd tree command within a container, and I know
> > cephadm mounts keyrings passed to the ceph-volume command as
> > "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
> >
> > On Mon, Feb 20, 2023 at 6:35 AM Eugen Block <eblock(a)nde.ag> wrote:
> >
> >> Hi *,
> >>
> >> I was playing around on an upgraded test cluster (from N to Q),
> >> current version:
> >>
> >>     "overall": {
> >>         "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
> >>     }
> >>
> >> I tried to replace an OSD after destroying it with 'ceph orch osd rm
> >> osd.5 --replace'. The OSD was drained successfully and marked as
> >> "destroyed" as expected, the zapping also worked. At this point I
> >> didn't have an osd spec in place because all OSDs were adopted during
> >> the upgrade process. So I created a new spec, which was not applied
> >> successfully (I'm wondering if there's another/new issue with
> >> ceph-volume, but that's not the focus here), so I tried it manually
> >> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
> >> for better readability. Apparently, there's no bootstrap-osd keyring
> >> for cephadm, so it can't look up the desired osd_id in the osd tree;
> >> the command it tries is this:
> >>
> >> ceph --cluster ceph --name client.bootstrap-osd --keyring
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >>
> >> In the local filesystem the required keyring is present, though:
> >>
> >> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
> >> [client.bootstrap-osd]
> >> key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
> >> caps mgr = "allow r"
> >> caps mon = "profile bootstrap-osd"
> >>
> >> Is there something missing during the adoption process? Or are the
> >> docs lacking some upgrade info? I found a section about putting
> >> keyrings under management [1], but I'm not sure if that's what's
> >> missing here.
> >> Any insights are highly appreciated!
> >>
> >> Thanks,
> >> Eugen
> >>
> >> [1]
> >> https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under…
> >>
> >>
> >> ---snip---
> >> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
> >> --block.db /dev/sdb --block.db-size 5G
> >> Inferring fsid <FSID>
> >> Using recent ceph image
> >> <LOCAL_REGISTRY>/ceph/ceph@sha256
> >> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
> >> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
> >> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> >> --init -e
> >> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256
> >> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
> >> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> >> /var/run/ceph/<FSID>:/var/run/ceph:z -v
> >> /var/log/ceph/<FSID>:/var/log/ceph:z -v
> >> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
> >> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> >> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> >> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
> >> <LOCAL_REGISTRY>/ceph/ceph@sha256
> >> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb
> >> --block.db-size 5G
> >> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
> >> msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\"
> >> doesn't exist, skipping"
> >> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
> >> msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from
> >> \"/etc/containers/mounts.conf\" doesn't exist, skipping"
> >> /usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool
> >> --gen-print-key
> >> /usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph
> >> --name client.bootstrap-osd --keyring
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
> >> (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
> >> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
> >> disabling cephx
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd250065910) no keyring found at
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd255e2eea0) no keyring found at
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> /usr/bin/podman: stderr stderr: [errno 2] RADOS object not found
> >> (error connecting to the cluster)
> >> /usr/bin/podman: stderr Traceback (most recent call last):
> >> /usr/bin/podman: stderr   File "/usr/sbin/ceph-volume", line 11, in <module>
> >> /usr/bin/podman: stderr     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
> >> /usr/bin/podman: stderr     self.main(self.argv)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
> >> /usr/bin/podman: stderr     return f(*a, **kw)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
> >> /usr/bin/podman: stderr     terminal.dispatch(self.mapper, subcommand_args)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
> >> /usr/bin/podman: stderr     instance.main()
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
> >> /usr/bin/podman: stderr     terminal.dispatch(self.mapper, self.argv)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
> >> /usr/bin/podman: stderr     instance.main()
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 77, in main
> >> /usr/bin/podman: stderr     self.create(args)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
> >> /usr/bin/podman: stderr     return func(*a, **kw)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create
> >> /usr/bin/podman: stderr     prepare_step.safe_prepare(args)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 252, in safe_prepare
> >> /usr/bin/podman: stderr     self.prepare()
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
> >> /usr/bin/podman: stderr     return func(*a, **kw)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 292, in prepare
> >> /usr/bin/podman: stderr     self.osd_id = prepare_utils.create_id(osd_fsid, json.dumps(secrets), osd_id=self.args.osd_id)
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 166, in create_id
> >> /usr/bin/podman: stderr     if osd_id_available(osd_id):
> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 204, in osd_id_available
> >> /usr/bin/podman: stderr     raise RuntimeError('Unable check if OSD id exists: %s' % osd_id)
> >> /usr/bin/podman: stderr RuntimeError: Unable check if OSD id exists: 5
> >> Traceback (most recent call last):
> >>   File "/usr/sbin/cephadm", line 9170, in <module>
> >>     main()
> >>   File "/usr/sbin/cephadm", line 9158, in main
> >>     r = ctx.func(ctx)
> >>   File "/usr/sbin/cephadm", line 1917, in _infer_config
> >>     return func(ctx)
> >>   File "/usr/sbin/cephadm", line 1877, in _infer_fsid
> >>     return func(ctx)
> >>   File "/usr/sbin/cephadm", line 1945, in _infer_image
> >>     return func(ctx)
> >>   File "/usr/sbin/cephadm", line 1835, in _validate_fsid
> >>     return func(ctx)
> >>   File "/usr/sbin/cephadm", line 5294, in command_ceph_volume
> >>     out, err, code = call_throws(ctx, c.run_cmd())
> >>   File "/usr/sbin/cephadm", line 1637, in call_throws
> >>     raise RuntimeError('Failed command: %s' % ' '.join(command))
> >> RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host
> >> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
> >> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> >> --init -e
> >> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256
> >> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
> >> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> >> /var/run/ceph/<FSID>:/var/run/ceph:z -v
> >> /var/log/ceph/<FSID>:/var/log/ceph:z -v
> >> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
> >> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> >> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> >> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
> >> <LOCAL_REGISTRY>/ceph/ceph@sha256
> >> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb
> >> --block.db-size 5G
> >> ---snip---
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users(a)ceph.io
> >> To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >>
> >>
>
>
>
>