On Fri, Dec 06, 2019 at 01:31:05PM +0000, Sage Weil wrote:
My thoughts here are still pretty inconclusive...
I agree that we should invest in a non-LVM mode, but there is currently
no way to do that which supports dm-crypt without being complicated and
convoluted, so it cannot be a full replacement for the LVM mode.
At the same time, Real Soon Now we're going to be building crimson OSDs
backed by ZNS SSDs (and eventually persistent memory), which will also
very clearly not be LVM-based. I'm a bit hesitant to introduce a
bare-bones bluestore mode right now just because we'll be adding yet
another variation soon, and it may be that we construct a general approach
to both... but probably not. And the whole point of c-v's architecture
was to be pluggable.
So maybe a bare-bones bluestore mode makes sense. In the simple case, it
really should be *very* simple. But its scope pretty quickly explodes:
what about wal and db devices? We have labels for those, so we could
support those too, fairly easily... as long as the user partitions the
devices manually beforehand. They'll immediately want to use the new
auto/batch thing, but that's tied to the LVM implementation. And what
if one of the db/wal/main devices is an LV and another is not? We'd
need to make sure the lvm mode machinery doesn't trigger unless all of
its labels are there, but it might be confusing. All of which means that
this is probably *only* useful for single-device OSDs. On the one hand,
those are increasingly common (hello, all-SSD clusters), but on the other
hand, for fast SSDs we may want to deploy N of them per device.
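To make the "don't trigger lvm mode unless everything is an LV" concern concrete, here is a rough sketch (names and the path heuristic are purely illustrative, not ceph-volume code; real code would query lvs/udev rather than match paths):

```shell
# Hypothetical guard: only let the lvm-mode machinery trigger when *every*
# device backing the OSD (main/db/wal) looks like an LV / dm device.
all_lvs() {
    local dev
    for dev in "$@"; do
        case $dev in
            /dev/mapper/*|/dev/dm-*) ;;   # looks like an LV / dm device
            *) return 1 ;;                # at least one plain block device
        esac
    done
    return 0
}

all_lvs /dev/mapper/ceph--vg-db /dev/mapper/ceph--vg-block \
    && echo "all LVs: lvm mode may trigger"
all_lvs /dev/mapper/ceph--vg-db /dev/sdb \
    || echo "mixed devices: lvm mode must not trigger"
```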
I don't think a simple or barebones approach will survive contact with
real-world deployments. Imho if we want a raw mode, we better be prepared to
deal with multi-device OSDs and multi-OSD devices and the partitioning this
requires.
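As a back-of-the-envelope illustration of the multi-OSD-per-device partitioning this implies (all numbers are made up for the example; this only prints a layout and touches no device):

```shell
# Split one fast device evenly into N OSD data partitions.
DEV_SIZE_MIB=$(( 4 * 1024 * 1024 ))   # a 4 TiB device, in MiB (illustrative)
N_OSDS=4                              # OSDs to carve out of this one device
PART_SIZE_MIB=$(( DEV_SIZE_MIB / N_OSDS ))
for i in $(seq 0 $(( N_OSDS - 1 ))); do
    echo "osd.$i: start=$(( i * PART_SIZE_MIB ))MiB size=${PART_SIZE_MIB}MiB"
done
```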
Since we can't cover all of that, and at a minimum, we can't cover
dm-crypt, Rook will need to work with the lvm mode one way or another.
So we need to have a wrapper (or something similar) no matter what. So I
suggest we start there.
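The kind of thin wrapper around the lvm mode being suggested might look roughly like this (a dry-run sketch: the function only assembles and prints the command, and the flags shown are `ceph-volume lvm prepare`'s existing ones; nothing here is Rook's actual code):

```shell
# Hypothetical wrapper: build a ceph-volume lvm prepare invocation,
# optionally with a separate db device.
prepare_osd() {
    local data_dev=$1 db_dev=$2
    local cmd="ceph-volume lvm prepare --bluestore --data $data_dev"
    if [ -n "$db_dev" ]; then
        cmd="$cmd --block.db $db_dev"
    fi
    echo "$cmd"   # dry run: a real wrapper would execute this instead
}

prepare_osd /dev/sdb                  # single-device OSD
prepare_osd /dev/sdb /dev/nvme0n1     # separate db device
```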
Agreed.
sage
On Fri, 6 Dec 2019, Sebastien Han wrote:
> Hi Kai,
>
> Thanks!
> –––––––––
> Sébastien Han
> Senior Principal Software Engineer, Storage Architect
>
> "Always give 100%. Unless you're giving blood."
>
> On Fri, Dec 6, 2019 at 10:44 AM Kai Wagner <kwagner(a)suse.com> wrote:
> >
> > Hi Sebastien and thanks for your feedback.
> >
> > On 06.12.19 10:00, Sebastien Han wrote:
> > > ceph-volume is a sunk cost!
> > > And your argument basically falls into that paradigm, "oh we have
> > > invested so much already, that we cannot stop and we should continue
> > > even though this will only bring more trouble". Incapable of
> > > accepting this sunk cost.
> > > All the issues that have been fixed with a lot of pain.
> > > All that pain could have been avoided if LVM wasn't there and
> > > pursuing in that direction will only lead us to more pain again.
> >
> > The reason I disagree here is the scenario where the WAL/DB is on a
> > separate device and a single OSD crashes. In that case you would like to
> > recreate just that single OSD instead of the whole group. Also if we
> > deprecate a tool, as we did with ceph-disk, users have to migrate
> > sooner or later if they don't want to do everything manually on the CLI
> > (by that I mean via fdisk/pure lvm commands and so on).
> >
> > We could argue now that this can still be done on the command line
> > manually but all our efforts are towards simplicity/automation and
> > having everything in the Dashboard. If the underlying tool/functionality
> > isn't there anymore, that isn't possible.
> >
>
> I understand your position: yes, when we start separating block/db/wal,
> things get really complex, which is why I'm sticking with block, db and
> wal on the same device.
> Also, we haven't seen any request for separating those when running
> OSDs on PVC in the Cloud. So we would likely continue to do so for a
> while.
>
> > > Also, I'm not saying we should replace the tool but allow not using
> > > LVM for a simple scenario to start with
> >
> > Which then leads me to, why couldn't such functionality be implemented
> > into a single tool instead of having two at the end?
> >
> > So don't get me wrong, I'm not saying that I'm against everything,
> > I'm just saying that I think this is a topic that should be discussed
> > in more depth.
>
> Yes, that's for sure.
>
> >
> > As said, just my two cents here.
> >
> > Kai
> >
> > --
> > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D-90409 Nürnberg
> > Geschäftsführer: Felix Imendörffer (HRB 36809, AG Nürnberg)
> >
> >
> _______________________________________________
> Dev mailing list -- dev(a)ceph.io
> To unsubscribe send an email to dev-leave(a)ceph.io
>
--
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer