I am still struggling with this cephadm issue; does anyone have an idea?
I double-checked, and python3 is available on all nodes:
$ which python3
/usr/bin/python3
$ python3 --version
Python 3.8.10
How can I fix this? And how is it possible that rebooting my nodes breaks the cephadm orchestrator?
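Since cephadm connects with its own SSH config and key in a non-interactive shell, I can also test python3 through that exact path instead of a login shell (sketch reusing the ssh_config and key path from my test command below; the host names are examples):

```shell
# Run python3 through the same non-interactive SSH path cephadm uses.
# ssh_config and the key path are the ones from my earlier test command;
# substitute the hosts that fail the cephadm check.
for host in ceph1d ceph1f; do
  ssh -F ssh_config -i ~/cephadm_private_key "root@$host" \
      'command -v python3 && python3 -c "print(\"ok\")"'
done
```

A non-login shell can have a different PATH than an interactive one, so `which python3` in an interactive session does not by itself prove that the shell cephadm spawns sees it.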
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, July 6th, 2021 at 8:09 AM, mabi <mabi(a)protonmail.ch> wrote:
> Hello,
>
> After a rolling reboot of my Octopus 15.2.13 cluster of 8 nodes, cephadm no
> longer finds python3 on the nodes, and hence I get quite a few of the
> following warnings:
>
> [WRN] CEPHADM_HOST_CHECK_FAILED: 7 hosts fail cephadm check
>
> host ceph1f failed check: Can't communicate with remote host `ceph1f`,
> possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> Here is the full stack trace from cephadm:
>
> 2021-07-06T06:03:20.798410+0000 mgr.ceph1a.xxqpph [ERR] Failed to apply
> osd.all-available-devices spec
> DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'),
> service_id='all-available-devices', service_type='osd',
> data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False,
> filter_logic='AND', preview_only=False): Can't communicate with remote host
> `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1015, in _remote_connection
>     conn, connr = self._get_connection(addr)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 978, in _get_connection
>     sudo=True if self.ssh_user != 'root' else False)
>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
>     self.gateway = self._make_gateway(hostname)
>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 44, in _make_gateway
>     self._make_connection_string(hostname)
>   File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
>     gw = gateway_bootstrap.bootstrap(io, spec)
>   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
>     bootstrap_exec(io, spec)
>   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 46, in bootstrap_exec
>     "serve(io, id='%s-slave')" % spec.id,
>   File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 78, in sendexec
>     io.write((repr(source) + "\n").encode("ascii"))
>   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 409, in write
>     self._write(data)
> BrokenPipeError: [Errno 32] Broken pipe
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1019, in _remote_connection
>     raise execnet.gateway_bootstrap.HostNotFound(msg)
> execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host
> `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> The above exception was the direct cause of the following exception:
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 412, in _apply_all_services
>     if self._apply_service(spec):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 450, in _apply_service
>     self.mgr.osd_service.create_from_spec(cast(DriveGroupSpec, spec))
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 51, in create_from_spec
>     ret = create_from_spec_one(self.prepare_drivegroup(drive_group))
>   File "/usr/share/ceph/mgr/cephadm/utils.py", line 65, in forall_hosts_wrapper
>     return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
>   File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map
>     return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get
>     raise self._value
>   File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
>     result = (True, func(*args, **kwds))
>   File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
>     return list(map(*args))
>   File "/usr/share/ceph/mgr/cephadm/utils.py", line 59, in do_work
>     return f(*arg)
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 47, in create_from_spec_one
>     host, cmd, replace_osd_ids=osd_id_claims.get(host, []), env_vars=env_vars
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 56, in create_single_host
>     out, err, code = self._run_ceph_volume_command(host, cmd, env_vars=env_vars)
>   File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 271, in _run_ceph_volume_command
>     error_ok=True)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1100, in _run_cephadm
>     with self._remote_connection(host, addr) as tpl:
>   File "/lib64/python3.6/contextlib.py", line 81, in __enter__
>     return next(self.gen)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1046, in _remote_connection
>     raise OrchestratorError(msg) from e
> orchestrator._interface.OrchestratorError: Can't communicate with remote host
> `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe
>
> I checked directly on the nodes: I can execute the "python3" command, and I
> can also SSH into all nodes with the following test command:
>
> ssh -F ssh_config -i ~/cephadm_private_key root@nodeX
>
> So I don't really understand what could have broken the cephadm orchestrator...
> Any ideas? The CephFS itself is still working.
>
> Best regards,
>
> Mabi