Hi Jeff,
I ran into the very same issue and filed a bug report at
https://tracker.ceph.com/issues/45574
This is ceph 15.2.1 on an up-to-date Debian Buster.
TL;DR: The way ceph-mgr-rook's RookOrchestrator class interacts with
the python3-numpy package is borked.
Result: The cluster cannot start, since the 'devicehealth' plugin of
ceph-mgr is an always-on module.
Best,
Martin
On Mon, May 04, 2020 at 02:10:58AM -0700, Jeff Welling wrote:
Hello my ceph-using comrades!
I've been using ceph for a while at home but wanted to update to the
latest, Octopus. I got it installed on a single node, added a second
node and some OSDs, and have been migrating from my original Jewel
cluster. When I installed Octopus, I wiped the systems and installed
Debian Buster, added the ceph apt repos instead of using the packages in
Debian, and installed manually using ceph-volume to create BlueStore OSDs,
using the ceph docs as my guide.
Now though, one of the two Octopus nodes (the one running ceph-mgr and
ceph-mon) is crashing weekly. I haven't been able to look into the
cause of the crashes in detail yet as these are hobbyist systems and
work has been exceptionally busy lately, but now after the most recent
crash, I'm unable to start ceph-mgr and the syslog has ceph-mgr messages
complaining of not being able to find the 'rook' module. This is rather
confusing because though I'm aware of rook, to my knowledge I've never
used it on my systems, and there's no mention of it in my config.
I tried applying pending upgrades but that hasn't changed the behavior.
I normally wouldn't dare ask for help this early in my adventure but I
find myself in a bit of a pinch. By any chance have you hit this before,
or know what may be causing it?
Ceph is awesome. Keep up the good work, stay safe, and Thank You Kindly
in advance!
My ceph version
root@zim:~# ceph --version
ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)
This is my ceph.conf:
[global]
fsid = 495d7f30-CCCC-BBBB-AAAA-ddf6ffe063d0
mon initial members = zim.internal.justdev.ca
mon host = 192.168.0.11
public network = 192.168.0.0/24
cluster network = 192.168.42.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1
rbd_default_features = 7
These are the syslog messages that show up when trying to restart ceph-mgr:
May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.065-0700 7fcdccaa2f40 -1 mgr[py] Module not found: 'rook'
May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.065-0700 7fcdccaa2f40 -1 mgr[py] Traceback (most recent call last):
May 4 01:35:13 zim ceph-mgr[21602]: File "/usr/share/ceph/mgr/rook/__init__.py", line 2, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from .module import RookOrchestrator
May 4 01:35:13 zim ceph-mgr[21602]: File "/usr/share/ceph/mgr/rook/module.py", line 16, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from kubernetes import client, config
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/kubernetes/__init__.py", line 22, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: import kubernetes.stream
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/kubernetes/stream/__init__.py", line 15, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from .stream import stream
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/kubernetes/stream/stream.py", line 13, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from . import ws_client
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/kubernetes/stream/ws_client.py", line 19, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from websocket import WebSocket, ABNF, enableTrace
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/websocket/__init__.py", line 22, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from ._abnf import *
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/websocket/_abnf.py", line 34, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: import numpy
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/numpy/__init__.py", line 142, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from . import core
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/numpy/core/__init__.py", line 40, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from . import multiarray
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/numpy/core/multiarray.py", line 12, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from . import overrides
May 4 01:35:13 zim ceph-mgr[21602]: File "/lib/python3/dist-packages/numpy/core/overrides.py", line 65, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: """)
May 4 01:35:13 zim ceph-mgr[21602]: RuntimeError: _get_implementing_args method already has a docstring
May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.069-0700 7fcdccaa2f40 -1 mgr[py] Class not found in module 'rook'
May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.069-0700 7fcdccaa2f40 -1 mgr[py] Error loading module 'rook': (2) No such file or directory
May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.673-0700 7fcdccaa2f40 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: rook
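
For anyone reading along: the traceback above is easier to follow once the syslog prefixes are stripped and only the frames are kept. It then reads as rook -> kubernetes -> websocket -> numpy, with the RuntimeError raised inside numpy, which is why 'rook' is reported as "not found" even though its files exist. Below is a minimal sketch of that extraction, assuming log lines shaped like the ones in this mail. The names import_chain, FRAME_RE, PREFIX_RE and SAMPLE are made up for illustration and are not part of ceph; SAMPLE is a short excerpt of the log above.

```python
import re

# Matches traceback frames such as: File "<path>", line <n>
FRAME_RE = re.compile(r'File\s+"(?P<path>[^"]+)",\s+line\s+(?P<line>\d+)')
# Matches the syslog prefix, e.g. 'May 4 01:35:13 zim ceph-mgr[21602]: '
PREFIX_RE = re.compile(r'^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+ceph-mgr\[\d+\]:\s*')
# Matches the final exception line, up to the next mgr timestamp or end of text
ERROR_RE = re.compile(
    r'\b(\w+(?:Error|Exception)):\s*(.+?)(?=\s+\d{4}-\d{2}-\d{2}T|$)'
)

# Excerpt of the log from this mail, including the mail client's line wrapping.
SAMPLE = """\
May 4 01:35:13 zim ceph-mgr[21602]: File
"/usr/share/ceph/mgr/rook/module.py", line 16, in <module>
May 4 01:35:13 zim ceph-mgr[21602]: from kubernetes import
client, config
May 4 01:35:13 zim ceph-mgr[21602]: File
"/lib/python3/dist-packages/websocket/_abnf.py", line 34, in
<module>
May 4 01:35:13 zim ceph-mgr[21602]: import numpy
May 4 01:35:13 zim ceph-mgr[21602]: RuntimeError:
_get_implementing_args method already has a docstring
May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.069-0700
7fcdccaa2f40 -1 mgr[py] Class not found in module 'rook'
"""


def import_chain(log_text):
    """Return (frames, error) for a ceph-mgr traceback found in syslog text.

    frames is a list of (path, line_number) in call order; error is the
    exception line as a string, or None if no exception line is found.
    """
    # Drop the syslog prefix from each line, then join everything with spaces
    # so that frames wrapped across several lines still match.
    body = " ".join(
        PREFIX_RE.sub("", line).strip() for line in log_text.splitlines()
    )
    frames = [
        (m.group("path"), int(m.group("line"))) for m in FRAME_RE.finditer(body)
    ]
    err = ERROR_RE.search(body)
    error = f"{err.group(1)}: {err.group(2).strip()}" if err else None
    return frames, error


if __name__ == "__main__":
    frames, error = import_chain(SAMPLE)
    for path, line in frames:
        print(f"{path}:{line}")
    print(error)
```

Note that this only summarizes what the log already says. Importing numpy from a plain python3 shell on the same machine may well succeed, so a clean import there does not rule out the failure inside ceph-mgr.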