I am still struggling with this cephadm issue; does anyone have an idea?
I double-checked, and python3 is available on all nodes:
$ which python3
/usr/bin/python3
$ python3 --version
Python 3.8.10
How can I fix this? And how is it possible that rebooting my nodes breaks the cephadm orchestrator?
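Since cephadm connects with its own SSH config and key in a non-interactive shell, I can also test python3 through that exact path instead of a login shell (sketch reusing the ssh_config and key path from my test command below; the host names are examples):

```shell
# Run python3 through the same non-interactive SSH path cephadm uses.
# ssh_config and the key path are the ones from my earlier test command;
# substitute the hosts that fail the cephadm check.
for host in ceph1d ceph1f; do
  ssh -F ssh_config -i ~/cephadm_private_key "root@$host" \
      'command -v python3 && python3 -c "print(\"ok\")"'
done
```

A non-login shell can have a different PATH than an interactive one, so `which python3` in an interactive session does not by itself prove that the shell cephadm spawns sees it.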
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, July 6th, 2021 at 8:09 AM, mabi <mabi(a)protonmail.ch> wrote:
> Hello,
>
> After a rolling reboot of my Octopus 15.2.13 cluster of 8 nodes, cephadm no
> longer finds python3 on the nodes, and hence I get quite a few of the
> following warnings:
>
> [WRN] CEPHADM_HOST_CHECK_FAILED: 7 hosts fail cephadm check
>
> host ceph1f failed check: Can't communicate with remote host `ceph1f`,
> possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> Here is the full stack trace from cephadm:
>
> 2021-07-06T06:03:20.798410+0000 mgr.ceph1a.xxqpph [ERR] Failed to apply
> osd.all-available-devices spec
> DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'),
> service_id='all-available-devices', service_type='osd',
> data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False,
> filter_logic='AND', preview_only=False): Can't communicate with remote host
> `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1015, in _remote_connection
>     conn, connr = self._get_connection(addr)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 978, in _get_connection
>     sudo=True if self.ssh_user != 'root' else False)
>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
>     self.gateway = self._make_gateway(hostname)
>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 44, in _make_gateway
>     self._make_connection_string(hostname)
>   File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
>     gw = gateway_bootstrap.bootstrap(io, spec)
>   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
>     bootstrap_exec(io, spec)
>   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 46, in bootstrap_exec
>     "serve(io, id='%s-slave')" % spec.id,
>   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 78, in sendexec
>     io.write((repr(source) + "\n").encode("ascii"))
>   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 409, in write
>     self._write(data)
> BrokenPipeError: [Errno 32] Broken pipe
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1019, in _remote_connection
>     raise execnet.gateway_bootstrap.HostNotFound(msg)
> execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host
> `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> The above exception was the direct cause of the following exception:
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 412, in _apply_all_services
>     if self._apply_service(spec):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 450, in _apply_service
>     self.mgr.osd_service.create_from_spec(cast(DriveGroupSpec, spec))
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 51, in create_from_spec
>     ret = create_from_spec_one(self.prepare_drivegroup(drive_group))
>   File "/usr/share/ceph/mgr/cephadm/utils.py", line 65, in forall_hosts_wrapper
>     return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
>   File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map
>     return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get
>     raise self._value
>   File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
>     result = (True, func(*args, **kwds))
>   File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
>     return list(map(*args))
>   File "/usr/share/ceph/mgr/cephadm/utils.py", line 59, in do_work
>     return f(*arg)
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 47, in create_from_spec_one
>     host, cmd, replace_osd_ids=osd_id_claims.get(host, []), env_vars=env_vars
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 56, in create_single_host
>     out, err, code = self._run_ceph_volume_command(host, cmd, env_vars=env_vars)
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 271, in _run_ceph_volume_command
>     error_ok=True)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1100, in _run_cephadm
>     with self._remote_connection(host, addr) as tpl:
>   File "/lib64/python3.6/contextlib.py", line 81, in __enter__
>     return next(self.gen)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1046, in _remote_connection
>     raise OrchestratorError(msg) from e
> orchestrator._interface.OrchestratorError: Can't communicate with remote host
> `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> I checked directly on the nodes: I can execute the "python3" command, and I
> can also SSH into all nodes with the following test command:
>
> ssh -F ssh_config -i ~/cephadm_private_key root@nodeX
>
> So I don't really understand what could have broken the cephadm orchestrator...
> Any ideas? The CephFS itself is still working.
>
> Best regards,
>
> Mabi