On Sat, Jul 27, 2019 at 5:26 AM Sam Just <sjust(a)redhat.com> wrote:
- to run bluestore in the same process as crimson-osd, but we will
allocate some dedicated threads (and CPU cores) to it. we could use
ceph::thread::ThreadPool for this purpose. for instance, we will have
3 ConfigProxy backends.
1. the classic ConfigProxy used by classic OSD and other daemons and
command line utilities. the ConfigProxy normally resides in a global
CephContext.
2. the ceph::common::ConfigProxy solely used by crimson OSD. it is
rewritten using seastar. it's a sharded service. normally we just
access the config proxy directly in crimson, like
'local_conf().get_val<uint64_t>("name")' instead of using something
like 'cct->_conf.get_val<uint64_t>("name")'
3. the ConfigProxy used by bluestore living in the alien world. its
interface will be exactly the same as the classic one, but it will
call into its crimson counterpart using the `seastar::alien::submit()`
call.
I'm not sure this is quite right. I think that the seastar config
would have a reference over to the alien config machinery in order to
inject config changes and do the initial setup, but the alien side
needn't have a reference to the crimson one.
i was thinking about the implementation of ConfigProxy::get_val<>().
but yeah, if we 1) have a separate copy of ConfigValue on the alien
side, 2) let the alien side work in the passive mode, and 3) use the
ThreadPool::submit() to inject config changes into alien's
ConfigProxy, that'd be a lot easier.
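
a minimal sketch of the passive arrangement described above, assuming nothing
about the real classes: AlienWorker is a hypothetical one-thread stand-in for
the thread pool, AlienConfig is a hypothetical private copy of the settings,
and inject_and_read() is an illustrative helper. the alien side never calls
back into crimson; changes are only pushed to it through submit().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical stand-in for the alien-side thread pool (a single worker).
class AlienWorker {
public:
  AlienWorker() : worker_([this] { run(); }) {}
  ~AlienWorker() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_one();
    worker_.join();
  }
  // Queue a task for the alien thread; the future resolves once it has run.
  std::future<void> submit(std::function<void()> fn) {
    std::packaged_task<void()> task(std::move(fn));
    auto done = task.get_future();
    {
      std::lock_guard<std::mutex> l(lock_);
      tasks_.push(std::move(task));
    }
    cond_.notify_one();
    return done;
  }

private:
  void run() {
    for (;;) {
      std::packaged_task<void()> task;
      {
        std::unique_lock<std::mutex> l(lock_);
        cond_.wait(l, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and the queue is drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex lock_;
  std::condition_variable cond_;
  std::queue<std::packaged_task<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};

// Hypothetical passive copy of the settings. It is only touched from the
// alien thread, so it carries no lock of its own.
class AlienConfig {
public:
  void set_val(const std::string& key, uint64_t v) { values_[key] = v; }
  uint64_t get_val(const std::string& key) const {
    assert(values_.count(key) == 1);  // reading an unknown option is a bug here
    return values_.at(key);
  }

private:
  std::map<std::string, uint64_t> values_;
};

// Push one change to the alien side, then read it back on the alien thread.
inline uint64_t inject_and_read(AlienWorker& w, AlienConfig& c,
                                const std::string& key, uint64_t v) {
  w.submit([&] { c.set_val(key, v); }).get();
  uint64_t seen = 0;
  w.submit([&] { seen = c.get_val(key); }).get();
  return seen;
}

inline void usage_example() {
  AlienWorker worker;
  AlienConfig conf;
  assert(inject_and_read(worker, conf, "name", 42) == 42);
}
```

in crimson the pushing side would presumably be the config observer path
rather than a helper like this; that part is not sketched here.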
in addition to WITH_SEASTAR macro, we can introduce yet another
macro allowing us to call into the facilities offered by
crimson-common. and we can use inline namespace to differentiate the
2nd from the 3rd implementation, as they will need to be co-located in
the same process, and without using different names we'd violate ODR.
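
to illustrate the inline namespace idea, here is a small sketch. the
namespace names (demo, crimson_impl, alien_impl) are made up for the example,
and in a real build a macro would pick which one is inline in a given
translation unit. the two ConfigProxy types get different qualified (and
mangled) names, so both can live in one process without an ODR clash.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace demo {

// Hypothetical alien-side flavour: reachable only by its full name.
namespace alien_impl {
struct ConfigProxy {
  std::string flavor() const { return "alien"; }
};
}  // namespace alien_impl

// Hypothetical crimson-side flavour: inline, so demo::ConfigProxy names it.
inline namespace crimson_impl {
struct ConfigProxy {
  std::string flavor() const { return "crimson"; }
};
}  // namespace crimson_impl

}  // namespace demo

// Same short name, two distinct types.
static_assert(
    !std::is_same_v<demo::ConfigProxy, demo::alien_impl::ConfigProxy>,
    "the two ConfigProxy flavours must be different types");

inline void usage_example() {
  assert(demo::ConfigProxy{}.flavor() == "crimson");
  assert(demo::alien_impl::ConfigProxy{}.flavor() == "alien");
}
```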
- to hide bluestore in a library which links against ceph-common
library. but the libbluestore won't expose any ceph-common symbols to
crimson-osd. but we need to figure out how to maintain the internal
status of ceph-common, as it is not quite self-contained in the sense
that it needs to access the logging, config and other facilities
offered by crimson-osd.
The library option seems promising to me if we go this direction. It
can even export an interface which is entirely agnostic of the config
machinery (maybe take a serialized representation of the config
values?) and write to a different log file at first.
yeah, probably we just need a "keyhole" for updating the alien side's
config settings. this option actually is a variant of the previous
one. the only difference is that, we need to use different namespaces
to differentiate the symbols in bluestore from those in
crimson-common.
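
a rough sketch of what such a "keyhole" could look like if the library took a
serialized representation of the settings. everything here is an assumption
made for illustration: the bluestore_lib namespace, the ConfigKeyhole class
and the "key=value" per-line format are not taken from ceph.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

namespace bluestore_lib {  // hypothetical library namespace

class ConfigKeyhole {
public:
  // Apply "key=value" lines and return how many settings were applied.
  // Lines without '=' or with an empty key are skipped.
  std::size_t apply(const std::string& serialized) {
    std::istringstream in(serialized);
    std::string line;
    std::size_t applied = 0;
    while (std::getline(in, line)) {
      const auto eq = line.find('=');
      if (eq == std::string::npos || eq == 0) continue;
      values_[line.substr(0, eq)] = line.substr(eq + 1);
      ++applied;
    }
    return applied;
  }

  // Look up a setting, falling back to the given default when it is unset.
  std::string get(const std::string& key,
                  const std::string& fallback = "") const {
    const auto it = values_.find(key);
    return it == values_.end() ? fallback : it->second;
  }

private:
  std::map<std::string, std::string> values_;
};

}  // namespace bluestore_lib

inline void usage_example() {
  bluestore_lib::ConfigKeyhole keyhole;
  assert(keyhole.apply("name=42\n") == 1);
  assert(keyhole.get("name") == "42");
}
```

the point of the string-only interface is that no config type from either
side crosses the library boundary.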
- to port rocksdb to seastar: to be specific, this approach will use
seastar's green thread to implement the Mutex, CondVar and Thread in
rocksdb, and implement all blocking calls using seastar's
counterparts. if this approach proves to be workable, the next
problem would be to upstream this change. and in the long run, the
rocksdb backed bluestore will be replaced by seastore if seastore is
capable of supporting relatively slow devices as well.
I've started to look at your rocksdb port. It does look like the
parts we'd need to adapt are appropriately factored out in rocksdb,
and I bet we'd get interest from upstream. We might want to take
their temperature sooner rather than later?
good idea! will do so early tomorrow!
We'd also have to perform essentially the same refactor in Bluestore
in order to break the
bluestore logic apart from the IO/blocking/locking portions. I guess
this exists in some form with the BlockDevice interface, but we'll
also have to introduce something like rocksdb's lock replacement.
This path would get us a much more cooperative (probably more
performant as well, particularly in high density hosts) bluestore in
the long run, so it might be worth the work.
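
One way to picture the lock replacement is a "port" type that the store logic
is templated on. This is only a sketch under assumptions: ThreadPort,
CountingPort and Counter are hypothetical names, and CountingPort merely
records lock/unlock calls where a seastar-backed port would yield to the
reactor instead of blocking.

```cpp
#include <cassert>
#include <mutex>

// Blocking port: what a thread-based build would plug in.
struct ThreadPort {
  using Mutex = std::mutex;
};

// Hypothetical replacement port. It only counts calls; a cooperative
// implementation would go here in a seastar build.
struct CountingPort {
  struct Mutex {
    int locks = 0;
    int unlocks = 0;
    void lock() { ++locks; }
    void unlock() { ++unlocks; }
  };
};

// The "logic" side never names a concrete mutex, only Port::Mutex.
template <class Port>
class Counter {
public:
  int bump() {
    std::lock_guard<typename Port::Mutex> guard(mutex_);
    return ++value_;
  }
  typename Port::Mutex& mutex() { return mutex_; }

private:
  typename Port::Mutex mutex_;
  int value_ = 0;
};

inline void usage_example() {
  Counter<ThreadPort> blocking;
  assert(blocking.bump() == 1);

  Counter<CountingPort> swapped;
  swapped.bump();
  assert(swapped.mutex().locks == 1 && swapped.mutex().unlocks == 1);
}
```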
thanks. your insights are inspiring!
- seastore: a completely rewritten object store backend targeting fast
NVMe devices. but it will take longer to get there.
I think we're going to do this no matter what. I think
alien/bluestore choice is about how we want to test crimson prior to
developing seastore and possibly for handling devices inappropriate
for seastore?
that's also my impression. the way i see it, that is just because we
haven't started scoping it or had a low-level design.
-Sam
--
Regards
Kefu Chai