mgr, numpy and sub-interpreters


Sebastian Wagner

26 May 2021

6:26 p.m.

All,

We have two issues:

* https://tracker.ceph.com/issues/45574
* https://tracker.ceph.com/issues/48787

Both are caused by numpy not supporting Python sub-interpreters. Unfortunately, the latter issue came up in the most recent Octopus validations. I suspect it's just a matter of time until our users are affected by it.

Note that removing numpy is not easy, as kubernetes-client depends (transitively) on numpy.

Thoughts?

- Sebastian


John Spray

27 May 2021

7:22 p.m.

Hey,

Hope you don't mind me chiming in, as someone responsible for some of the mess :-)

Testing my memory a bit but... sub-interpreters were originally used because some badly behaved dependencies (I think it was one of the web server libraries?) used global mutable state on their own module, and thereby caused a problem if two mgr modules were both using that dependency.

Using sub-interpreters put us on the fringes of CPython use cases, which isn't a great place to be. The other option is to use separate processes rather than multiple interpreters in the same process. At the time, that seemed too expensive (in terms of developer time), although process separation would be the preferable level of separation for functionally distinct units.

Some thoughts on options:

A) Isolate modules in processes, tethered to a central ceph-mgr process that provides a new RPC interface that mimics the existing MgrModule interface. Lots of work. Will introduce substantial runtime overhead.

B) Isolate modules in processes, where each is a first class RADOS client -- basically run N ceph-mgr daemons, each hosting a particular module (or group of modules, to reduce overhead for tiny modules). Much less work, but would need careful design to avoid adding user-facing complexity in managing many daemons.

C) Look at alternate python interpreters that might provide cleaner sandboxing than CPython (I am not up to date on the python world but perhaps something exists).

D) Stop using sub-interpreters, and forbid any python dependencies that use global state that could conflict between modules. Basically any time two modules use the same dependency, it needs some level of audit to ensure they aren't going to collide.

E) Re-write problematic mgr modules to C++. Perhaps if a module is numerically intensive enough to need numpy then it might be better off as native code to begin with. C++ with all the modern features is even quite a nice language :-)

Sorry for the wall of text, hope that's some help.

John

On Wed, May 26, 2021 at 1:56 PM Sebastian Wagner <sewagner(a)redhat.com> wrote:

...
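To make option A a bit more concrete, here is a minimal sketch of a module running in its own process and talking to a central process over a line-based JSON RPC. Everything in it is an assumption made for illustration: `FakeMgr`, `serve_module`, the `get`/`log` method names and the JSON-lines protocol are hypothetical stand-ins, not the real MgrModule interface or anything ceph-mgr does today.

```python
import json
import subprocess
import sys

# Hypothetical child-side code: a "module" process that calls back into the parent.
CHILD_SOURCE = r"""
import json, sys

def call(method, *args):
    # Send one JSON request line to the parent and block for the JSON reply.
    sys.stdout.write(json.dumps({"method": method, "args": args}) + "\n")
    sys.stdout.flush()
    return json.loads(sys.stdin.readline())["result"]

health = call("get", "health")
call("log", "module saw health=" + health["status"])
"""

ALLOWED_METHODS = {"get", "log"}  # only these may be invoked over RPC


class FakeMgr:
    """Stand-in for the central process; the method names are illustrative only."""

    def __init__(self):
        self.logged = []

    def get(self, name):
        return {"status": "HEALTH_OK"} if name == "health" else None

    def log(self, message):
        self.logged.append(message)


def serve_module(mgr, source=CHILD_SOURCE):
    """Run one module in its own process and answer its requests until it exits."""
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    # One request per line; readline() returns "" once the child closes stdout.
    for line in iter(proc.stdout.readline, ""):
        request = json.loads(line)
        if request["method"] not in ALLOWED_METHODS:
            raise ValueError("unexpected RPC method: " + request["method"])
        result = getattr(mgr, request["method"])(*request["args"])
        proc.stdin.write(json.dumps({"result": result}) + "\n")
        proc.stdin.flush()
    proc.stdin.close()
    return proc.wait()
```

Every call crosses a process boundary and is serialized twice, which is where the runtime overhead mentioned in option A would come from. In exchange, native extensions loaded by one module can no longer interfere with another module.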

kefu chai

28 May 2021

8:31 a.m.

On Thu, May 27, 2021 at 9:58 PM John Spray <jcspray(a)gmail.com> wrote:

...

In future, we could probably compile the Python modules into WebAssembly modules and run them in a different interpreter which can be retargeted to Wasm, so we could write the mgr modules in languages like Scheme and Rust!

...

> D) Stop using sub-interpreters, and forbid any python dependencies that use global state that could conflict between modules. Basically any time two modules use the same dependency, it needs some level of audit to ensure they aren't going to collide.

I think this might be a more promising solution. But we probably need to understand which dependency was using global mutable state and made us use sub-interpreters, and how. Was it cherrypy? It is used by restful and dashboard, and it seems it was causing trouble. See https://github.com/ceph/ceph/pull/14971
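As an illustration of the kind of collision an audit for option D would look for, here is a small sketch. `SharedServerLib`, `IsolatedServer` and the port numbers are made up for the example; it does not claim that cherrypy (or any other real library) behaves exactly like this.

```python
class SharedServerLib:
    """Hypothetical dependency that keeps its settings in class-level (global) state."""

    config = {}

    @classmethod
    def update(cls, **settings):
        cls.config.update(settings)


def start_dashboard():
    # First mgr module configures the shared library.
    SharedServerLib.update(port=8443)


def start_restful():
    # Second mgr module does the same and silently overwrites the first.
    SharedServerLib.update(port=8003)


class IsolatedServer:
    """Audit-friendly alternative: each user of the library owns its own state."""

    def __init__(self, **settings):
        self.config = dict(settings)


start_dashboard()
start_restful()
# Both modules now see the value written last, whichever module they are.
clobbered_port = SharedServerLib.config["port"]

dashboard = IsolatedServer(port=8443)
restful = IsolatedServer(port=8003)
```

Within a single interpreter the two `start_*` functions share one `config` dict, so the second call wins. With per-instance state the two settings stay independent, which is what a dependency would have to guarantee before sub-interpreters could be dropped.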

...

> E) Re-write problematic mgr modules to C++. Perhaps if a module is numerically intensive enough to need numpy then it might be better off as native code to begin with. C++ with all the modern features is even quite a nice language :-)
>
> Sorry for the wall of text, hope that's some help.
>
> John
>
> On Wed, May 26, 2021 at 1:56 PM Sebastian Wagner <sewagner(a)redhat.com> wrote:

_______________________________________________ Dev mailing list -- dev(a)ceph.io To unsubscribe send an email to dev-leave(a)ceph.io

-- Regards Kefu Chai

Sebastian Wagner

31 May 2021

3:06 p.m.

On 28.05.21 at 05:01, kefu chai wrote:

...

On Thu, May 27, 2021 at 9:58 PM John Spray <jcspray(a)gmail.com> wrote:
> Hey,
>
> Hope you don't mind me chiming in, as someone responsible for some of the mess :-)

Hi John, thanks for your reply! Nice to see you here.

...

> Testing my memory a bit but... sub-interpreters were originally used because some badly behaved dependencies (I think it was one of the web server libraries?) used global mutable state on their own module, and thereby caused a problem if two mgr modules were both using that dependency.

Right, except that this doesn't help with global mutable state in native modules, like numpy.

...

> Using sub-interpreters put us on the fringes of CPython use cases, which isn't a great place to be. The other option is to use separate processes rather than multiple interpreters in the same process. At the time, that seemed too expensive (in terms of developer time), although process separation would be the preferable level of separation for functionally distinct units.
>
> Some thoughts on options:
>
> A) Isolate modules in processes, tethered to a central ceph-mgr process that provides a new RPC interface that mimics the existing MgrModule interface. Lots of work. Will introduce substantial runtime overhead.
>
> B) Isolate modules in processes, where each is a first class RADOS client -- basically run N ceph-mgr daemons, each hosting a particular module (or group of modules, to reduce overhead for tiny modules). Much less work, but would need careful design to avoid adding user-facing complexity in managing many daemons.

That sounds complicated for non-containerized deployments. For Rook and cephadm, this should be doable.

...

> C) Look at alternate python interpreters that might provide cleaner sandboxing than CPython (I am not up to date on the python world but perhaps something exists).

Good luck compiling native Python modules to WebAssembly :-P . Seriously, we'd lose a lot of compatibility with the Python ecosystem.

...

Maybe we could think of having a shared cherrypy instance that delegates request handling with WSGI.
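A rough sketch of what such delegation could look like at the WSGI level is below. It uses only plain WSGI callables; `make_dispatcher`, `simple_app` and the `/dashboard` and `/restful` mount points are hypothetical names chosen for the example, and nothing here is taken from the existing mgr modules.

```python
def make_dispatcher(mounts):
    """Return a WSGI app that routes each request to the app mounted at its path prefix."""
    # Longest prefixes first, so "/a/b" is preferred over "/a".
    ordered = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in ordered:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME,
                # so the mounted app sees paths relative to its mount point.
                sub_environ = dict(environ)
                sub_environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                sub_environ["PATH_INFO"] = path[len(prefix):]
                return app(sub_environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]

    return dispatcher


def simple_app(body):
    """Build a trivial WSGI app standing in for one mgr module's request handler."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]
    return app


application = make_dispatcher({
    "/dashboard": simple_app(b"dashboard\n"),
    "/restful": simple_app(b"restful\n"),
})
```

Any WSGI-capable server could host `application`; for a quick local try, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` from the standard library is enough. The point of the design is that only one server instance owns the global configuration, while each module contributes just a callable.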

...

> E) Re-write problematic mgr modules to C++. Perhaps if a module is numerically intensive enough to need numpy then it might be better off as native code to begin with. C++ with all the modern features is even quite a nice language :-)

The dependency chain looks like so: mgr/rook -> kubernetes -> websocket -> numpy. I'd like to avoid rewriting mgr/rook in C++ :-)
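One way to double-check a chain like this on a given installation is to import the top-level package in a fresh interpreter and see whether the suspect module ends up loaded. The helper below is a small sketch for that; `pulls_in` and `PROBE` are hypothetical names, and the result depends entirely on which package versions are installed.

```python
import subprocess
import sys

# Runs in a fresh interpreter: import argv[1], then report whether argv[2] got loaded.
PROBE = (
    "import importlib, sys\n"
    "importlib.import_module(sys.argv[1])\n"
    "print(sys.argv[2] in sys.modules)\n"
)


def pulls_in(package, target):
    """Return True if importing `package` in a clean process also loads `target`."""
    completed = subprocess.run(
        [sys.executable, "-c", PROBE, package, target],
        capture_output=True, text=True, check=True,
    )
    # Use the last output line in case the imported package prints something itself.
    return completed.stdout.strip().splitlines()[-1] == "True"
```

For example, `pulls_in("kubernetes", "numpy")` would confirm or refute the transitive dependency, assuming the kubernetes client is installed where the check runs. Running the probe in a subprocess keeps the calling interpreter's `sys.modules` out of the picture.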

...

> Sorry for the wall of text, hope that's some help.
>
> John
>
> On Wed, May 26, 2021 at 1:56 PM Sebastian Wagner <sewagner(a)redhat.com> wrote:

Ernesto Puerta

10:04 p.m.

For the longer-term solution, I'd definitely go with decoupling the server-client mgr API with something like HTTP+JSON, gRPC, ...

On the shorter-term front: it seems that Python 3.10 (to be released later this year) will (experimentally) feature something called "isolated subinterpreters" <https://vstinner.github.io/isolate-subinterpreters.html>. That said, its list of caveats definitely keeps it far away from production readiness (e.g. GC and many optimizations disabled).

However, there's an interesting note there: the mechanism it relies on for instantiating modules, multi-phase initialization (PEP 489), has been available since Python 3.5 <https://www.python.org/dev/peps/pep-0489/>. While module globals are not fully isolated, it seems that it at least adds some isolation compared to the regular shallow copy of modules created with PyModule_Create() <https://docs.python.org/3/c-api/init.html#c.Py_NewInterpreter>.

Another (lazy/desperate) alternative might be to try importing numpy from the main/global interpreter and see if that improves the situation?

Kind Regards,
Ernesto

On Mon, May 31, 2021 at 11:37 AM Sebastian Wagner <sewagner(a)redhat.com> wrote:

...

