so "ceph osd tree destroyed -f json-pretty" shows the nautilus2 host with
the osd id you're trying to replace here? And there are disks marked
available that match the spec (20G rotational disk in this case I guess) in
"ceph orch device ls nautilus2"?
On Mon, Feb 20, 2023 at 10:16 AM Eugen Block <eblock(a)nde.ag> wrote:
I stumbled upon this option 'osd_id_claims'
[2], so I tried to apply a
replace.yaml to redeploy only the one destroyed disk, but still
nothing happens with that disk. This is my replace.yaml:
---snip---
nautilus:~ # cat replace-osd-7.yaml
service_type: osd
service_name: osd
placement:
  hosts:
  - nautilus2
spec:
  data_devices:
    rotational: 1
    size: '20G:'
  db_devices:
    rotational: 0
    size: '13G:16G'
  filter_logic: AND
  objectstore: bluestore
  osd_id_claims:
    nautilus2: ['7']
---snip---
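For completeness, the spec is applied like this (a dry run first should show
whether the orchestrator would pick up the claimed id at all):

nautilus:~ # ceph orch apply -i replace-osd-7.yaml --dry-run
nautilus:~ # ceph orch apply -i replace-osd-7.yaml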
I see these lines in the mgr.log:
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log [INF] : Found osd claims -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: [cephadm INFO cephadm.services.osd] Found osd claims for drivegroup None -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log [INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']}
But I see no attempt to actually deploy the OSD.
[2]
https://docs.ceph.com/en/quincy/mgr/orchestrator_modules/#orchestrator-osd-…
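Apart from the mgr log, the cephadm cluster log and a refreshed inventory are
probably the next things to check, e.g.:

nautilus:~ # ceph log last cephadm
nautilus:~ # ceph osd tree destroyed -f json-pretty
nautilus:~ # ceph orch device ls nautilus2 --refresh
nautilus:~ # ceph orch ls osd --export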
Quoting Adam King <adking(a)redhat.com>:
For reference, a stray daemon from cephadm POV is roughly just something
that shows up in "ceph node ls" that doesn't have a directory in
/var/lib/ceph/<fsid>. I guess manually making the OSD as you did means that
didn't end up getting made. I remember the manual osd creation process (by
manual just meaning not using an orchestrator/cephadm mgr module command)
coming up at one point and we ended up manually running "cephadm deploy"
to make sure those directories get created correctly, but I don't think any
docs ever got made about it (yet, anyway). Also, is there a tracker issue
for it not correctly handling the drivegroup?
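From what I remember it looked roughly like this (a sketch only, the <...>
parts are placeholders and "cephadm deploy --help" has the exact flags):

cephadm ls | grep '"name"'       # what cephadm knows about on that host
ls /var/lib/ceph/<fsid>/         # the osd.<id> directory is the one that's missing
cephadm deploy --fsid <fsid> --name osd.<id> \
    --config /etc/ceph/ceph.conf --keyring <path to the osd keyring> \
    --osd-fsid <osd fsid>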
On Mon, Feb 20, 2023 at 8:58 AM Eugen Block <eblock(a)nde.ag> wrote:
> Thanks, Adam.
>
> Providing the keyring to the cephadm command worked, but the unwanted
> (but expected) side effect is that from cephadm's perspective it's a
> stray daemon. For some reason the orchestrator did apply the desired
> drivegroup when I tried to reproduce this morning, but then again
> failed just now when I wanted to get rid of the stray daemon. This is
> one of the most annoying things with cephadm: I still don't fully
> understand when it will correctly apply the identical drivegroup.yml
> and when not. Anyway, the conclusion is to not interfere with cephadm
> (nothing new here), but since the drivegroup was not applied correctly
> I assumed I had to "help out" a bit by manually deploying an OSD.
>
> Thanks,
> Eugen
>
> Quoting Adam King <adking(a)redhat.com>:
>
> > Going off of
> >
> > ceph --cluster ceph --name client.bootstrap-osd --keyring
> > /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >
> > you could try passing "--keyring <bootstrap-osd-keyring>" to the cephadm
> > ceph-volume command. Something like 'cephadm ceph-volume --keyring
> > <bootstrap-osd-keyring> -- lvm create'. I'm guessing it's trying to run the
> > osd tree command within a container and I know cephadm mounts keyrings
> > passed to the ceph-volume command as
> > "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
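> >
> > Spelled out with the devices from the output below, that would be something
> > like this (untested, just combining the two):
> >
> > cephadm ceph-volume --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -- \
> >     lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G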
> >
> > On Mon, Feb 20, 2023 at 6:35 AM Eugen Block <eblock(a)nde.ag> wrote:
> >
> >> Hi *,
> >>
> >> I was playing around on an upgraded test cluster (from N to Q),
> >> current version:
> >>
> >> "overall": {
> >> "ceph version 17.2.5
> >> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
> >> }
> >>
> >> I tried to replace an OSD after destroying it with 'ceph orch osd rm
> >> osd.5 --replace'. The OSD was drained successfully and marked as
> >> "destroyed" as expected, the zapping also worked. At this point I
> >> didn't have an osd spec in place because all OSDs were adopted during
> >> the upgrade process. So I created a new spec which was not applied
> >> successfully (I'm wondering if there's another/new issue with
> >> ceph-volume, but that's not the focus here), so I tried it manually
> >> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
> >> for better readability. Apparently, there's no bootstrap-osd keyring
> >> for cephadm, so it can't search for the desired osd_id in the osd tree;
> >> the command it tries is this:
> >>
> >> ceph --cluster ceph --name client.bootstrap-osd --keyring
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >>
> >> In the local filesystem the required keyring is present, though:
> >>
> >> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
> >> [client.bootstrap-osd]
> >> key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
> >> caps mgr = "allow r"
> >> caps mon = "profile bootstrap-osd"
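> >>
> >> The same command run directly on the host (outside the container) should
> >> be able to find that keyring, e.g.:
> >>
> >> nautilus:~ # ceph --name client.bootstrap-osd \
> >>     --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >>
> >> so presumably the file just isn't visible inside the ceph-volume container.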
> >>
> >> Is there something missing during the adoption process? Or are the
> >> docs lacking some upgrade info? I found a section about putting
> >> keyrings under management [1], but I'm not sure if that's what's
> >> missing here.
> >> Any insights are highly appreciated!
> >>
> >> Thanks,
> >> Eugen
> >>
> >> [1]
> >> https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under…
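> >>
> >> If that section is the right lead, I guess it would be something like the
> >> following, but I haven't tried it yet:
> >>
> >> nautilus:~ # ceph orch client-keyring set client.bootstrap-osd '*'
> >> nautilus:~ # ceph orch client-keyring ls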
> >>
> >>
> >> ---snip---
> >> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> >> Inferring fsid <FSID>
> >> Using recent ceph image
> >> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
> >> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
> >> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> >> --init -e
> >> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
> >> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> >> /var/run/ceph/<FSID>:/var/run/ceph:z -v
> >> /var/log/ceph/<FSID>:/var/log/ceph:z -v
> >> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
> >> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> >> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> >> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
> >> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> >> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
> >> msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\"
> >> doesn't exist, skipping"
> >> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
> >> msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from
> >> \"/etc/containers/mounts.conf\" doesn't exist, skipping"
> >> /usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool
> >> --gen-print-key
> >> /usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph
> >> --name client.bootstrap-osd --keyring
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
> >> (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
> >> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
> >> disabling cephx
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd250065910) no keyring found at
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 auth: unable to find a keyring on
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
> >> 7fd255e30700 -1 AuthRegistry(0x7fd255e2eea0) no keyring found at
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> /usr/bin/podman: stderr stderr: [errno 2] RADOS object not found
> >> (error connecting to the cluster)
> >> /usr/bin/podman: stderr Traceback (most recent call last):
> >> /usr/bin/podman: stderr File "/usr/sbin/ceph-volume", line 11, in <module>
> >> /usr/bin/podman: stderr load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
> >> /usr/bin/podman: stderr self.main(self.argv)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
> >> /usr/bin/podman: stderr return f(*a, **kw)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
> >> /usr/bin/podman: stderr terminal.dispatch(self.mapper, subcommand_args)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
> >> /usr/bin/podman: stderr instance.main()
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
> >> /usr/bin/podman: stderr terminal.dispatch(self.mapper, self.argv)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
> >> /usr/bin/podman: stderr instance.main()
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 77, in main
> >> /usr/bin/podman: stderr self.create(args)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
> >> /usr/bin/podman: stderr return func(*a, **kw)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create
> >> /usr/bin/podman: stderr prepare_step.safe_prepare(args)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 252, in safe_prepare
> >> /usr/bin/podman: stderr self.prepare()
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
> >> /usr/bin/podman: stderr return func(*a, **kw)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 292, in prepare
> >> /usr/bin/podman: stderr self.osd_id = prepare_utils.create_id(osd_fsid, json.dumps(secrets), osd_id=self.args.osd_id)
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 166, in create_id
> >> /usr/bin/podman: stderr if osd_id_available(osd_id):
> >> /usr/bin/podman: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 204, in osd_id_available
> >> /usr/bin/podman: stderr raise RuntimeError('Unable check if OSD id exists: %s' % osd_id)
> >> /usr/bin/podman: stderr RuntimeError: Unable check if OSD id exists: 5
> >> Traceback (most recent call last):
> >> File "/usr/sbin/cephadm", line 9170, in <module>
> >> main()
> >> File "/usr/sbin/cephadm", line 9158, in main
> >> r = ctx.func(ctx)
> >> File "/usr/sbin/cephadm", line 1917, in _infer_config
> >> return func(ctx)
> >> File "/usr/sbin/cephadm", line 1877, in _infer_fsid
> >> return func(ctx)
> >> File "/usr/sbin/cephadm", line 1945, in _infer_image
> >> return func(ctx)
> >> File "/usr/sbin/cephadm", line 1835, in _validate_fsid
> >> return func(ctx)
> >> File "/usr/sbin/cephadm", line 5294, in command_ceph_volume
> >> out, err, code = call_throws(ctx, c.run_cmd())
> >> File "/usr/sbin/cephadm", line 1637, in call_throws
> >> raise RuntimeError('Failed command: %s' % ' '.join(command))
> >> RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host
> >> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
> >> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> >> --init -e
> >> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
> >> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> >> /var/run/ceph/<FSID>:/var/run/ceph:z -v
> >> /var/log/ceph/<FSID>:/var/log/ceph:z -v
> >> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
> >> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> >> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> >> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
> >> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> >> ---snip---
> >>
> >>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>