Cool, that works for me!
Thanks!
–––––––––
Sébastien Han
Senior Principal Software Engineer, Storage Architect
"Always give 100%. Unless you're giving blood."
On Fri, Dec 6, 2019 at 3:03 PM Sage Weil <sweil(a)redhat.com> wrote:
On Fri, 6 Dec 2019, Sebastien Han wrote:
> If not in ceph-osd, can we have ceph-osd execute a hook before exiting 0?
> Reading a hook script from /etc/ceph/hook.d, something like that would be
> nice so that we don't need a wrapper.
Hmm, maybe if it was just osd_exec_on_shutdown=string, and that could be
something like "vgchange ..." or "bash -c ..."? We'd need to make sure
we're setting FD_CLOEXEC on all the right file handles though. I can give
it a go..
sage
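For illustration, a rough sketch of the two shapes being discussed here.
Neither the osd_exec_on_shutdown option nor the hook directory exists in Ceph
today, and the VG name below is only a placeholder.

As a hypothetical ceph.conf entry (Sage's suggestion), the OSD would exec the
configured string right before exiting:

    [osd]
    osd_exec_on_shutdown = bash -c 'vgchange -an ceph-osd-vg'

As a hypothetical hook, e.g. /etc/ceph/hook.d/osd-pre-exit.sh, that ceph-osd
would run just before exiting 0:

    #!/bin/sh
    # De-activate the OSD's VG (placeholder name) so the device can be detached.
    vgchange -an ceph-osd-vg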
>
> Thoughts?
>
> Thanks!
>
> On Fri, Dec 6, 2019 at 2:50 PM Sage Weil <sweil(a)redhat.com> wrote:
> >
> > On Fri, 6 Dec 2019, Sebastien Han wrote:
> > > I understand this is asking a lot from the ceph-volume side.
> > > We can explore a new wrapper binary, or perhaps do it from the ceph-osd itself.
> > >
> > > Maybe a crazy/stupid idea, but can we have a de-activate call from the osd
> > > process itself? ceph-osd gets SIGTERM, closes the connection to the
> > > device, then runs "vgchange -an <vg>". Is this realistic?
> >
> > Not really... it's hard (or gross) to do a hard/immediate exit that tears
> > down all of the open handles to the device. I think this is not a nice
> > way to layer things. I'd prefer either a c-v command or a separate wrapper
> > script for this.
> >
> > sage
> >
> >
> > >
> > > Thanks!
> > >
> > > On Fri, Dec 6, 2019 at 1:44 PM Alfredo Deza <adeza(a)redhat.com> wrote:
> > > >
> > > > On Fri, Dec 6, 2019 at 5:59 AM Sebastien Han <shan(a)redhat.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Following up on my previous ceph-volume email as promised.
> > > > >
> > > > > When running Ceph with Rook in Kubernetes in the Cloud (AWS, Azure,
> > > > > Google, whatever), the OSDs are backed by PVCs (Cloud block storage)
> > > > > attached to virtual machines.
> > > > > This makes the storage portable: if the VM dies, the device will be
> > > > > attached to a new virtual machine and the OSD will resume running.
> > > > >
> > > > > In Rook, we have 2 main deployments for the OSD:
> > > > >
> > > > > 1. Prepare the disk to become an OSD
> > > > > Prepare will run on the VM, attach the block device, and run
> > > > > "ceph-volume prepare"; then this gets complicated. After this, the
> > > > > device is supposed to be detached from the VM because the container
> > > > > terminated. However, the block device is still held by LVM, so the VG
> > > > > must be de-activated. Currently, we do this in Rook, but it would be
> > > > > nice to de-activate the VG once ceph-volume is done preparing the
> > > > > disk in a container.
> > > > >
> > > > > 2. Activate the OSD.
> > > > > Now, on to the new container: the device is attached again to the VM.
> > > > > At this point, more changes will be required in ceph-volume,
> > > > > particularly in the "activate" call.
> > > > > a. ceph-volume should activate the VG
> > > >
> > > > By VG you mean LVM's Volume Group?
> > > >
> > > > > b. ceph-volume should activate the device normally
> > > >
> > > > Not "normally" though right? That would imply starting the OSD which
> > > > you are indicating is not desired.
> > > >
> > > > > c. ceph-volume should run the ceph-osd process in foreground as well
> > > > > as accepting flags to that CLI; we could have something like:
> > > > > "ceph-volume lvm activate --no-systemd $STORE_FLAG $OSD_ID $OSD_UUID
> > > > > <a bunch of flags>"
> > > > > Perhaps we need a new flag to indicate we want to run the osd
> > > > > process in foreground?
> > > > > Here is an example of how an OSD runs today:
> > > > >
> > > > > ceph-osd --foreground --id 2 --fsid
> > > > > 9a531951-50f2-4d48-b012-0aef0febc301 --setuser ceph --setgroup ceph
> > > > > --crush-location=root=default host=minikube --default-log-to-file
> > > > > false --ms-learn-addr-from-peer=false
> > > > >
> > > > > --> we can have a bunch of flags or an ENV var with all the flags,
> > > > > whatever you prefer.
> > > > >
> > > > > This wrapper should watch for signals too; it should respond to
> > > > > SIGTERM in the following way:
> > > > > - stop the OSD
> > > > > - de-activate the VG
> > > > > - exit 0
> > > > >
> > > > > Just a side note: the VG must be de-activated when the container
> > > > > stops so that the block device can be detached from the VM;
> > > > > otherwise, it'll still be held by LVM.
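To make the proposed flow concrete, here is a rough, untested sketch of such a
wrapper. Only "ceph-volume lvm activate --no-systemd", "ceph-osd" and
"vgchange" are existing commands; the VG name, OSD id and OSD fsid are
placeholders taken from the environment, and how they would actually be
discovered is left out:

    #!/bin/bash
    # Hypothetical Rook-side wrapper sketching the behaviour described above.
    set -e

    # Expected from the environment (placeholders, set by whoever runs this):
    : "${OSD_ID:?need OSD id}"
    : "${OSD_FSID:?need OSD fsid}"
    : "${OSD_VG:?need LVM volume group name}"

    # The device is attached to this VM again: re-activate the VG, then let
    # ceph-volume set up the OSD directory without touching systemd.
    vgchange -ay "$OSD_VG"
    ceph-volume lvm activate --no-systemd "$OSD_ID" "$OSD_FSID"

    # Run the OSD in the foreground as a child process we can signal.
    ceph-osd --foreground --id "$OSD_ID" --setuser ceph --setgroup ceph &
    OSD_PID=$!

    shutdown() {
        # On SIGTERM: stop the OSD, de-activate the VG, exit 0.
        kill -TERM "$OSD_PID" 2>/dev/null || true
        wait "$OSD_PID" || true
        vgchange -an "$OSD_VG"
        exit 0
    }
    trap shutdown TERM

    wait "$OSD_PID"

The prepare-side container would end the same way: once "ceph-volume lvm
prepare" finishes, run "vgchange -an" on the newly created VG so the block
device can be detached from the VM.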
> > > >
> > > > I am worried that this goes beyond what I consider the scope of
> > > > ceph-volume, which is: prepare device(s) to be part of an OSD.
> > > >
> > > > Catching signals, handling the OSD in the foreground, and accepting
> > > > (proxying) flags all sound problematic for a robust implementation in
> > > > ceph-volume, even if it would help Rook in this case.
> > > >
> > > > The other challenge I see is that it seems Ceph is in a transition
> > > > from being a baremetal project to a container one, except lots of
> > > > tooling (like ceph-volume) is deeply tied to the non-containerized
> > > > workflows. This makes it difficult (and non-obvious!) to add more
> > > > flags to ceph-volume to do things that help the containerized
> > > > deployment.
> > > >
> > > > To solve the issues you describe, I think you need either a separate
> > > > command-line tool that can invoke ceph-volume with the added features
> > > > you listed, or, if there is a significant push to get more things into
> > > > ceph-volume, a separate sub-command, so that `lvm` is isolated from
> > > > the conflicting logic.
> > > >
> > > > My preference would be a wrapper script, separate from the Ceph
> > > > project.
> > > >
> > > > >
> > > > > Hopefully I was clear :).
> > > > > This is just a proposal; if you feel like this could be done
> > > > > differently, feel free to suggest.
> > > > >
> > > > > Thanks!