On Thu, Apr 11, 2024 at 2:20 PM Nicolas FOURNIL <nicolas.fournil(a)gmail.com>
wrote:
Hello,
Thanks for the pointer to the M/L thread. The problem is not exactly the
same, but it has the same root cause: the containers that cephadm orch
creates through podman are given too many rights.
Why not file it in the bug tracker so it gets fixed? I think it should also
be treated as a security problem (overly broad access is always a bad idea).
Finally, I totally agree with you: "-o force" is not an option.
Regards
Nicolas F
On Thu, Apr 11, 2024 at 10:34 AM Ilya Dryomov <idryomov(a)gmail.com> wrote:
> On Thu, Apr 11, 2024 at 9:47 AM Nicolas FOURNIL <
> nicolas.fournil(a)gmail.com> wrote:
>
>> Hello
>>
>> I have been working on the famous "rbd: unmap failed: (16) Device or
>> resource busy" error, which causes problems for projects such as Incus
>> (LXD) and which many people just live with... I posted about it on the
>> ceph-users M/L, and Murilo Morais worked on it with me for several days.
>>
>> https://discuss.linuxcontainers.org/t/incus-0-x-and-ceph-rbd-map-is-sometim…
>> <= Work with Stephane Graber INCUS/LXD main developer =>
>>
>> https://discuss.linuxcontainers.org/t/howto-delete-container-with-ceph-rbd-…
>>
>> I worked out an easily reproducible setup that needs nothing beyond a
>> stock Ceph install: create an image, map it, format and mount it, ADD AN
>> OSD, and try to unmap the image... tada! "rbd: unmap failed: (16) Device
>> or resource busy" is there.
>>
>> Here is the more complete explanation (which is the great work of Murilo
>> Morais):
>>
>> I managed to reproduce. The problem is how docker/podman binds "/" to
>> "/rootfs" in containers. When ceph creates the files for systemd to start
>> the services, it includes a bind of the system root to /rootfs [1]. I do
>> not recommend removing this bind, as it will break the MON.
>>
>> By default the bind uses "rprivate" [2][3], which causes the existing
>> mount points to be propagated into the container, but the container then
>> receives no "mount" or "umount" events from the host [4]. This is what
>> causes the behavior in your cluster. It will happen whenever any
>> container starts/restarts, regardless of whether it is a new daemon or
>> not.
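>>
>> To illustrate the difference in podman terms (a minimal sketch; the "..."
>> stands for whatever else cephadm puts on the command line, it is not the
>> exact invocation cephadm generates):
>>
>>   # rprivate (the podman default [3]): the container keeps a stale copy of
>>   # the host mounts and never sees a later umount, so /dev/rbdX stays pinned
>>   podman run ... -v /:/rootfs ...
>>
>>   # rslave: umount events on the host propagate into the container, so the
>>   # device is released there as well
>>   podman run ... -v /:/rootfs:rslave ...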
>>
>> A quick alternative would be to change the unit files in
>> /var/lib/ceph/<fsid>/<daemon>/ and add "slave" or "rslave" to the podman
>> bind argument: where it contains "-v /:/rootfs", add ":slave", leaving
>> "-v /:/rootfs:slave". The inconvenience is that it will be necessary to
>> restart all daemons, and, when adding/redeploying a daemon, you will have
>> to perform the same steps.
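>>
>> A rough sketch of that edit for a single daemon (assuming the podman
>> command lives in the unit.run file that cephadm generates under that
>> directory; back the file up first, and substitute your own <fsid> and
>> <daemon>):
>>
>>   sed -i.bak 's|-v /:/rootfs|-v /:/rootfs:rslave|' \
>>       /var/lib/ceph/<fsid>/<daemon>/unit.run
>>   systemctl restart ceph-<fsid>@<daemon>.service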
>>
>
> Hi Nicolas,
>
> This was previously debugged and discussed in this thread:
>
> https://www.spinics.net/lists/ceph-devel/msg55987.html
>
>
>>
>> A definitive solution would be to change the source code, but I didn't
>> have time to try that option. Tomorrow, as soon as I get to the office, I
>> will try it and report back to you as soon as I discover something!
>>
>
> @Guillaume Abrioux <gabrioux(a)redhat.com> planned to run some tests and
> put up a patch to use a "slave" mount there. Did that not happen?
>
> I (still) think that cephadm should also replace a blanket "-v /:/rootfs"
> mount with a set of targeted mounts, each with a clearly expressed and
> documented purpose. Adding @Adam King <adking(a)redhat.com> who maintains
> cephadm.
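>
> Purely as an illustration of the shape of such a change (the paths below
> are made-up examples, not a vetted list of what each daemon actually
> needs; working out and documenting that list is exactly the point):
>
>   # instead of the single blanket bind:
>   #   -v /:/rootfs
>   # a handful of narrow, individually documented binds, for example:
>   #   -v /proc:/rootfs/proc:ro
>   #   -v /sys/block:/rootfs/sys/block:ro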
>
>
>> If you wish, you can respond to the public list about your problem and
>> the need to change the bind to rootfs, for everyone to see and so that
>> some of the project's devs can perhaps comment on something.
>>
>
> From an RBD perspective, in most cases one can work around this issue by
> passing "-o force" to "rbd unmap" to disable the safety check. I wouldn't
> suggest that to an average user though, for obvious reasons.
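>
> For example, reusing the image from the reproduction log below (only safe
> once you are sure nothing is really using the device any more):
>
>   rbd unmap -o force customers-clouds.ix-mrs2.fr.eho/image1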
>
> Thanks,
>
> Ilya
>
>
>>
>> Have a good night!
>>
>> [1] https://github.com/ceph/ceph/blob/7714874efb08facee80f92b358993fa56854bb01/src/cephadm/cephadmlib/daemons/ceph.py#L423
>> [2] https://docs.docker.com/storage/bind-mounts/#configure-bind-propagation
>> [3] https://docs.podman.io/en/latest/markdown/podman-create.1.html#mount-type-type-type-specific-option
>> [4] https://man7.org/linux/man-pages/man7/mount_namespaces.7.html
>>
>> And here is the reproduction log:
>>
>> First I create a new image and map it (to keep it separate from Incus):
>> root@ceph02-r2b-fl1:~# ceph osd tree
>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>> -1 53.12473 root default
>> ...
>> -5 3.63879 host ceph02-r2b-fl1
>> 6 hdd 0.90970 osd.6 up 1.00000 1.00000
>> 9 hdd 0.90970 osd.9 up 1.00000 1.00000
>> 2 nvme 0.90970 osd.2 up 1.00000 1.00000
>> 4 nvme 0.90970 osd.4 up 1.00000 1.00000
>> ...
>> root@ceph02-r2b-fl1:~# rbd create image1 --size 1024 --pool
>> customers-clouds.ix-mrs2.fr.eho
>> root@ceph02-r2b-fl1:~# RBD_DEVICE=$(rbd map
>> customers-clouds.ix-mrs2.fr.eho/image1)
>> root@ceph02-r2b-fl1:~# mkfs.ext4 ${RBD_DEVICE}
>> mke2fs 1.47.0 (5-Feb-2023)
>> Discarding device blocks: done
>> Creating filesystem with 262144 4k blocks and 65536 inodes
>> Filesystem UUID: c97362e1-11db-4ff3-ba62-ede6d58884b9
>> Superblock backups stored on blocks:
>> 32768, 98304, 163840, 229376
>>
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (8192 blocks): done
>> Writing superblocks and filesystem accounting information: done
>> root@ceph02-r2b-fl1:~# mount ${RBD_DEVICE} /media/test
>>
>> Let's list the currently mapped devices.
>>
>>
>> root@ceph02-r2b-fl1:~# mount | grep rbd
>> /dev/rbd4 on /var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd5 on /var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd2 on /var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd1 on /var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd3 on /var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd0 on /var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd6 on /media/test type ext4 (rw,relatime,stripe=16)
>>
>> ============================================================================
>> ===============> Adding the new OSD (an old small disk to be faster...) <==
>> =========================== (via service task in dashboard) ===============
>> ============================================================================
>> root@ceph02-r2b-fl1.ep-ws.fr.eholab.admin:~# ceph osd tree
>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>> -1 53.25822 root default
>> ...
>> -5 3.77229 host ceph02-r2b-fl1
>> 6 hdd 0.90970 osd.6 up 1.00000 1.00000
>> 9 hdd 0.90970 osd.9 up 1.00000 1.00000
>> 26 hdd 0.13350 osd.26 up 1.00000 1.00000 <=== Here's the brand new OSD
>> 2 nvme 0.90970 osd.2 up 1.00000 1.00000
>> 4 nvme 0.90970 osd.4 up 1.00000 1.00000
>> ....
>> ============ Let's check what the OSD's namespace contains ... =============
>>
>>
>> root@ceph02-r2b-fl1:~# podman ps
>> CONTAINER ID  IMAGE                                                                                              COMMAND               CREATED        STATUS            PORTS  NAMES
>> .....
>> 6dbd4fd4e4f3  cephpodregistry:5000/ceph@sha256:e205163225ec8ce460d6581df66ba4866585e3a4817866910f85bedcdcff7935  -n osd.26 -f --se...  2 minutes ago  Up 2 minutes ago         ceph-c3f59906-c43d-11ee-a2d6-3a82cb8036b6-osd-26
>>
>> root@ceph02-r2b-fl1.ep-ws.fr.eholab.admin:~# podman exec 6dbd4fd4e4f3 mount | grep rbd
>> /dev/rbd4 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd5 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd2 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd1 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd3 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd0 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-xx type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd6 on /rootfs/media/test type ext4 (rw,relatime,stripe=16)
>>
>> ======= of course these mountpoints are not needed by the newly created OSD ... and this full copy is problematic ============
>>
>> And then:
>>
>> root@ceph02-r2b-fl1:~# umount /dev/rbd6
>> - I can unmount the rbd device ... no reference is left in the host NS -
>>
>> root@ceph02-r2b-fl1:~# rbd unmap /dev/rbd6
>> rbd: sysfs write failed
>> rbd: unmap failed: (16) Device or resource busy
>>
>> root@ceph02-r2b-fl1:~# podman exec 6dbd4fd4e4f3 mount | grep rbd
>> /dev/rbd4 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-06c995c3 type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd5 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-0bc99da2 type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd2 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-62cc652e type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd1 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-5dcc5d4f type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd3 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-efd3feea type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd0 on /rootfs/var/lib/incus/storage-pools/default/containers/ec-59cc5703 type ext4 (rw,relatime,discard,stripe=16)
>> /dev/rbd6 on /rootfs/media/test type ext4 (rw,relatime,stripe=16)
>> ... because the OSD's namespace still holds the mount (as it does for all
>> the mapped rbd devices...)
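>>
>> A quick way to see which mount namespaces still reference the device is to
>> grep every process's mountinfo (a rough sketch; adjust the device name):
>>
>>   grep -l /dev/rbd6 /proc/[0-9]*/mountinfo 2>/dev/null
>>   # each hit is /proc/<pid>/mountinfo; "ps -p <pid>" then shows which
>>   # container process (here, the freshly deployed osd.26) is pinning it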
>>
>> I hope someone can create a ticket for this bug in the Ceph bug tracker;
>> I couldn't find how to create one without being "a member".
>>
>> Regards
>>
>> Nicolas FOURNIL
>>
>> https://www.eho.link
>>
>> _______________________________________________
>> Dev mailing list -- dev(a)ceph.io
>> To unsubscribe send an email to dev-leave(a)ceph.io
>>
>