Hi,
How I got here
--------------
Yesterday evening I added an OSD to my hobby system, most likely using these
commands:
# ceph-volume raw prepare --bluestore --data /dev/bcache0
# cephadm adopt --style legacy --name osd.20
After not having much luck with that, I also ran the following command (I
don't have the exact invocation):
% ceph orch daemon add osd tutu:/tmp/bcache0
per
https://docs.ceph.com/en/latest/cephadm/osd/#creating-new-osds
…which I think created the new osd.18, putting bcache0 inside its own VG
and its own LV.
I don't have an actual log of the commands used, but I did end up with new
OSDs 18 and 20. This was also my first time using these commands; my
previous ways of achieving the same were a bit more long-winded.
According to my monitoring, the main issue appeared around the same time.
In this post I'm not worried about the state of the OSDs themselves, only
about cluster management.
Actual issue
------------
So when I now issue "ceph orch ls" I get the following output:
% ceph orch ls
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1204, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
    raise e
AssertionError: not
("ceph orch ps" works fine.)
Similarly the output of "ceph -s" is:
% ceph -s
...
health: HEALTH_ERR
Module 'cephadm' has failed: 'not'
...
The relevant log from the manager, as per the mgr web interface, is:
_Promise failed Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 294, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 107, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1333, in describe_service
    hosts=[dd.hostname]
  File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 429, in __init__
    assert service_type in ServiceSpec.KNOWN_SERVICE_TYPES, service_type
AssertionError: not
I also noticed this seemingly highly relevant bit in my ceph orch ps:
NAME        HOST  STATUS   REFRESHED  AGE  VERSION    IMAGE NAME               IMAGE ID   CONTAINER ID
not.osd.20  tutu  stopped  13h ago    14h  <unknown>  docker.io/ceph/ceph:v15  <unknown>  <unknown>
I'm not quite sure how I ended up with that, but I wouldn't exclude operator
error :) such as entering "cephadm adopt --style legacy --name not.osd.20"
(but WHY..).
Sure enough, there is no such docker container running on the host, and the
unit ceph-3046312a-e453-11ea-b1f5-b42e993e47fc@osd.20.service has failed with
"RuntimeError: could not find osd.20 with osd_fsid
212c336a-9516-4818-aeaf-2d0c24c4ca65". That error makes sense: both OSDs 18
and 20 try to use the same bcache0, but the actual BlueStore filesystem is
inside the VG/LV used by 18, whereas 20 tries to use bcache0 directly. As I
said, though, I won't worry about the OSD itself at the moment.
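When I do get around to the OSD side, I plan to double-check which device
each OSD actually claims via the ceph-volume listings. A sketch (assuming
cephadm passes the subcommands through as usual; exact output layout varies
by version):

```shell
# List LVM-based OSD metadata (osd.18 should appear here, on its own VG/LV)
cephadm ceph-volume lvm list
# List raw-mode OSD metadata (osd.20 should appear here, expecting /dev/bcache0)
cephadm ceph-volume raw list
```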
I tried the command "ceph orch daemon rm not.osd.20" (though I'm not sure
whether it should even work); it nevertheless fails in a similar way:
% ceph orch daemon rm not.osd.20
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1204, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 1061, in _daemon_rm
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
    raise e
KeyError: 'not'
with the following entries in the mgr log:
5/13/21 1:26:06 PM [ERR] _Promise failed Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 294, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 107, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1515, in remove_daemons
    return self._remove_daemons(args)
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 65, in forall_hosts_wrapper
    return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
  File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 58, in do_work
    return f(self, *arg)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1804, in _remove_daemons
    return self._remove_daemon(name, host)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1818, in _remove_daemon
    self.cephadm_services[daemon_type].pre_remove(daemon)
KeyError: 'not'

5/13/21 1:26:06 PM [ERR] executing _remove_daemons((<cephadm.module.CephadmOrchestrator object at 0x7f1f4fec2bd0>, [('not.osd.20', 'tutu')])) failed. Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 58, in do_work
    return f(self, *arg)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1804, in _remove_daemons
    return self._remove_daemon(name, host)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1818, in _remove_daemon
    self.cephadm_services[daemon_type].pre_remove(daemon)
KeyError: 'not'
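If I read the tracebacks right, both errors come from the daemon-name
prefix: cephadm seems to take everything before the first "." in a daemon
name as the daemon type, so "not.osd.20" parses as type "not", which is
neither a known service type (the AssertionError from describe_service) nor
a key in cephadm_services (the KeyError from pre_remove). A shell sketch of
that split (cephadm itself does this in Python; this is just an
illustration):

```shell
# Everything before the first '.' is taken as the daemon type,
# the rest as the daemon id, so "not.osd.20" parses badly.
name="not.osd.20"
daemon_type="${name%%.*}"   # strip from the first '.' to the end -> "not"
daemon_id="${name#*.}"      # strip up to and including the first '.' -> "osd.20"
echo "$daemon_type $daemon_id"   # prints: not osd.20
```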
I also checked that "ceph orch daemon rm foo.bar.42" gives the error "Error
EINVAL: Unable to find daemon(s) ['foo.bar.42']", so the command itself is
parsed and dispatched fine for daemons that aren't in the inventory.
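One thing I'm considering as a last resort (untested, and the config-key
name and JSON layout below are my assumptions about how cephadm caches its
per-host daemon inventory, so I'd back everything up first) is editing the
cached daemon list directly and making the mgr reload it:

```shell
# Untested sketch: mgr/cephadm/host.<hostname> is an assumed config-key
# under which cephadm caches the per-host daemon inventory.
ceph config-key get mgr/cephadm/host.tutu > host.tutu.json   # back this up!
# ...remove the "not.osd.20" entry from host.tutu.json by hand...
ceph config-key set mgr/cephadm/host.tutu -i host.tutu.json
ceph mgr fail   # fail over the active mgr so the cephadm module restarts
```

I'd welcome corrections if the cache lives somewhere else.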
Thanks for any assistance!
--
_____________________________________________________________________
/ __// /__ ____ __ Erkki Seppälä\ \
/ /_ / // // /\ \/ / \ /
/_/ /_/ \___/ /_/\_\@inside.org
http://www.inside.org/~flux/