Hey,
Hope you don't mind me chiming in, as someone responsible for some of the
mess :-)
Testing my memory a bit, but... sub-interpreters were originally used
because some badly behaved dependencies (I think it was one of the web
server libraries?) kept global mutable state in their own module, which
caused problems when two mgr modules both used that dependency.
Using sub-interpreters put us on the fringes of CPython use cases, which
isn't a great place to be. The other option was to use separate processes
rather than multiple interpreters in the same process. At the time, that
seemed too expensive (in terms of developer time), although process
separation is the preferable level of isolation for functionally
distinct units.
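To make that failure mode concrete, here is a minimal Python sketch (not
Ceph code) of two mgr modules sharing one dependency inside a single
interpreter. `shared_dep`, `configure` and the two loader functions are
hypothetical stand-ins. The point is that `sys.modules` caches one module
object per interpreter, so module-level globals are shared by every
importer; sub-interpreters each get their own `sys.modules`, and separate
processes get their own copy of everything.

```python
import sys
import types

# Hypothetical stand-in for a badly behaved dependency that keeps mutable
# configuration in a module-level global (not a real library).
shared_dep = types.ModuleType("shared_dep")
shared_dep.CONFIG = {"root_path": "/"}


def _configure(root_path):
    # Mutates the module-level dict, i.e. state shared by every importer.
    shared_dep.CONFIG["root_path"] = root_path


shared_dep.configure = _configure
sys.modules["shared_dep"] = shared_dep  # register it so `import` finds it


def load_mgr_module_a():
    import shared_dep as dep  # served from the sys.modules cache
    dep.configure("/module_a")
    return dep


def load_mgr_module_b():
    import shared_dep as dep
    dep.configure("/module_b")
    return dep


a_dep = load_mgr_module_a()
b_dep = load_mgr_module_b()

# Both loaders received the same module object, so module B's setting
# silently replaced module A's.
print(a_dep is b_dep)             # True
print(a_dep.CONFIG["root_path"])  # /module_b
```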
Some thoughts on options:
A) Isolate modules in processes, tethered to a central ceph-mgr process
that provides a new RPC interface that mimics the existing MgrModule
interface. Lots of work, and it will introduce substantial runtime overhead.
B) Isolate modules in processes, where each is a first class RADOS client
-- basically run N ceph-mgr daemons, each hosting a particular module (or
group of modules, to reduce overhead for tiny modules). Much less work,
but would need careful design to avoid adding user-facing complexity in
managing many daemons.
C) Look at alternative Python interpreters that might provide cleaner
sandboxing than CPython (I am not up to date on the Python world, but
perhaps something exists).
D) Stop using sub-interpreters, and forbid any Python dependencies that use
global state that could conflict between modules. Basically, any time two
modules use the same dependency, it needs some level of audit to ensure
they aren't going to collide.
E) Rewrite problematic mgr modules in C++. Perhaps if a module is
numerically intensive enough to need numpy, it might be better off as
native code to begin with. C++ with all the modern features is even quite
a nice language :-)
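For option A, a rough idea of where the runtime overhead comes from: every
MgrModule call turns into a serialize/send/wait/deserialize round trip. The
sketch below is purely illustrative. `MgrProxy`, `serve` and the request
format are hypothetical (the method name is only loosely modelled on
MgrModule's `get()`), and a thread stands in for the separate process so
the example is self-contained.

```python
import threading
from multiprocessing import Pipe


class MgrProxy:
    """Hypothetical module-side stand-in for MgrModule: forwards calls over a pipe."""

    def __init__(self, conn):
        self._conn = conn

    def get(self, data_name):
        # Each call is pickled, sent, and waited on; this round trip is the
        # per-call overhead that option A would add.
        self._conn.send(("get", data_name))
        status, payload = self._conn.recv()
        if status != "ok":
            raise RuntimeError(payload)
        return payload


def serve(conn, state):
    """Hypothetical central ceph-mgr side: answers requests until sent None."""
    while True:
        request = conn.recv()
        if request is None:
            break
        method, arg = request
        if method == "get" and arg in state:
            conn.send(("ok", state[arg]))
        else:
            conn.send(("error", f"unsupported request: {request!r}"))


# In a real deployment the two ends would live in different processes.
server_end, module_end = Pipe()
fake_state = {"health": {"status": "HEALTH_OK"}}  # made-up data for the demo
server = threading.Thread(target=serve, args=(server_end, fake_state))
server.start()

mgr = MgrProxy(module_end)
health = mgr.get("health")
print(health["status"])  # HEALTH_OK

module_end.send(None)  # ask the server loop to exit
server.join()
```

The hard part in practice would be covering the whole MgrModule surface and
the data volumes some modules pull, not the transport itself.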
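For option D, part of the audit could perhaps be automated: snapshot a
shared dependency's module-level globals, run each mgr module's setup
against it, and flag globals that more than one module changes. This is a
heuristic sketch only. `changed_globals`, `fake_dep` and the init
functions are hypothetical, and it can't see state held inside C
extension modules.

```python
import copy
import types


def snapshot(module):
    """Copy the module's public, non-callable, non-module globals."""
    snap = {}
    for name, value in vars(module).items():
        if name.startswith("_") or callable(value) or isinstance(value, types.ModuleType):
            continue
        try:
            snap[name] = copy.deepcopy(value)
        except Exception:
            # Fall back to the object itself; in-place changes to it will
            # then go unnoticed.
            snap[name] = value
    return snap


def changed_globals(module, action):
    """Run action() and return the sorted names of globals it added, removed or changed."""
    before = snapshot(module)
    action()
    after = snapshot(module)
    # Assumes the values support a plain boolean == comparison.
    return sorted(
        name for name in before.keys() | after.keys()
        if name not in before or name not in after or before[name] != after[name]
    )


# Hypothetical dependency with module-level state.
fake_dep = types.ModuleType("fake_dep")
fake_dep.REGISTRY = {}
fake_dep.DEBUG = False


def init_module_a():
    fake_dep.REGISTRY["handler"] = "module_a"


def init_module_b():
    fake_dep.REGISTRY["handler"] = "module_b"
    fake_dep.DEBUG = True


a_changes = changed_globals(fake_dep, init_module_a)  # ['REGISTRY']
b_changes = changed_globals(fake_dep, init_module_b)  # ['DEBUG', 'REGISTRY']
collisions = sorted(set(a_changes) & set(b_changes))
print(collisions)  # ['REGISTRY']
```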
Sorry for the wall of text, hope that's some help.
John
On Wed, May 26, 2021 at 1:56 PM Sebastian Wagner <sewagner(a)redhat.com>
wrote:
All,
We have two issues:
* https://tracker.ceph.com/issues/45574
* https://tracker.ceph.com/issues/48787
Both are caused by numpy not supporting Python sub-interpreters.
Unfortunately, the latter issue came up in the most recent Octopus
validations. I suspect it's just a matter of time until our users are
affected by it.
Note that removing numpy is not easy, as kubernetes-client depends
(transitively) on numpy.
Thoughts?
- Sebastian
_______________________________________________
Dev mailing list -- dev(a)ceph.io