On Thu, May 27, 2021 at 9:58 PM John Spray <jcspray(a)gmail.com> wrote:
Hey,
Hope you don't mind me chiming in, as someone responsible for some of the mess :-)
Testing my memory a bit but... sub-interpreters were originally used because some badly
behaved dependencies (I think it was one of the web server libraries?) kept global mutable
state in their own module, and thereby caused a problem if two mgr modules were both using
that dependency.
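A minimal sketch of that failure mode in a single interpreter. The dependency (`fake_dep`), the two init functions, and the port numbers are all made up for illustration; this is not the actual library that caused the problem:

```python
import types

# Hypothetical stand-in for a third-party library that keeps its
# configuration in module-level (global) mutable state.
fake_dep = types.ModuleType("fake_dep")
fake_dep.config = {}  # shared by every importer in the same interpreter


def _configure(port):
    # Overwrites whatever any earlier caller stored.
    fake_dep.config["port"] = port


fake_dep.configure = _configure


# Two hypothetical mgr modules that both use the same dependency.
def module_a_init():
    fake_dep.configure(8001)


def module_b_init():
    fake_dep.configure(8002)


module_a_init()
module_b_init()

# Module A asked for 8001, but it now sees module B's value.
port_seen_by_a = fake_dep.config["port"]
print(port_seen_by_a)
```

With one sub-interpreter per mgr module, each module gets its own copy of the dependency's module object, so the second `configure` call would not overwrite the first.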
Using sub-interpreters put us on the fringes of CPython use cases, which isn't a
great place to be. The other option is to use separate processes rather than multiple
interpreters in the same process. At the time, that seemed too expensive (in terms of
developer time), although process separation would be the preferable level of separation
for functionally distinct units.
Some thoughts on options:
A) Isolate modules in processes, tethered to a central ceph-mgr process that provides a
new RPC interface that mimics the existing MgrModule interface. Lots of work. Will
introduce substantial runtime overhead.
B) Isolate modules in processes, where each is a first class RADOS client -- basically
run N ceph-mgr daemons, each hosting a particular module (or group of modules, to reduce
overhead for tiny modules). Much less work, but would need careful design to avoid adding
user-facing complexity in managing many daemons.
C) Look at alternate python interpreters that might provide cleaner sandboxing than
CPython (I am not up to date on the python world but perhaps something exists).
In the future, we could probably compile the Python modules into
WebAssembly modules and run them in a different interpreter that can be
retargeted to Wasm. Then we could write the mgr modules in languages like
Scheme and Rust!
D) Stop using sub-interpreters, and forbid any python dependencies that use global state
that could conflict between modules. Basically any time two modules use the same
dependency, it needs some level of audit to ensure they aren't going to collide.
I think this might be a more promising solution. But we probably need
to understand which dependency was using global mutable state and made
us use sub-interpreters, and how. Was it cherrypy? It is used by restful
and dashboard, and it seems it was causing trouble. See
https://github.com/ceph/ceph/pull/14971
E) Re-write problematic mgr modules to C++. Perhaps if a module is numerically intensive
enough to need numpy then it might be better off as native code to begin with. C++ with
all the modern features is even quite a nice language :-)
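For option D, a first pass at finding which dependencies are shared between modules (and so would need an audit) could be sketched roughly as below. The function names and the example source strings are hypothetical; a real audit would read the actual module sources and still need a human to check how each shared dependency uses global state:

```python
import ast
from collections import defaultdict


def top_level_imports(source):
    """Return the top-level package names imported by a piece of source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they are not dependencies.
            if node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return names


def shared_dependencies(sources):
    """Map each dependency used by more than one module to its users."""
    users = defaultdict(set)
    for mod_name, src in sources.items():
        for dep in top_level_imports(src):
            users[dep].add(mod_name)
    return {dep: sorted(mods) for dep, mods in users.items() if len(mods) > 1}


# Illustrative sources only, not the real module code.
example_sources = {
    "restful": "import cherrypy\nimport json\n",
    "dashboard": "from cherrypy import tools\nimport os\n",
}

shared = shared_dependencies(example_sources)
print(shared)
```

This only lists candidates; it says nothing about whether a shared dependency actually collides.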
Sorry for the wall of text, hope that's some help.
John
On Wed, May 26, 2021 at 1:56 PM Sebastian Wagner <sewagner(a)redhat.com> wrote:
All,
We have two issues:
* https://tracker.ceph.com/issues/45574
* https://tracker.ceph.com/issues/48787
Both are caused by numpy not supporting Python sub-interpreters. Unfortunately, the
latter issue came up in the most recent Octopus validations. I suspect it's just
a matter of time until our users are affected by it.
Note that removing numpy is not easy, as kubernetes-client depends
(transitively) on numpy.
Thoughts?
- Sebastian
_______________________________________________
Dev mailing list -- dev(a)ceph.io
To unsubscribe send an email to dev-leave(a)ceph.io
--
Regards
Kefu Chai