[ceph-users] Re: Orchestration seems not to work

4 May 2023

First thing I always check when it seems like orchestrator commands aren't
doing anything is "ceph orch ps" and "ceph orch device ls", specifically the
REFRESHED column. If it's well above 10 minutes for orch ps or 30 minutes
for orch device ls, the orchestrator is most likely hanging on some command
it runs to refresh the host information. If that's the case, you can follow
up with a "ceph mgr fail", wait a few minutes, and check the REFRESHED
column of orch ps and orch device ls again. If only certain hosts are not
having their daemon/device information refreshed, go to those hosts and
check for hanging "cephadm" commands (I just use "ps aux | grep cephadm").
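As a rough illustration of the REFRESHED check, here is a minimal Python sketch that flags entries older than the 10-minute (orch ps) and 30-minute (orch device ls) rules of thumb. It assumes REFRESHED values look like "45s ago", "12m ago" or "3h ago", and that you feed it (name, REFRESHED) pairs copied out of the command output. That format and the helper names are my assumptions, not part of cephadm.

```python
import re

# Rule-of-thumb thresholds from the advice above, in minutes.
ORCH_PS_MAX_MINUTES = 10
DEVICE_LS_MAX_MINUTES = 30

# ASSUMPTION: REFRESHED values look like "45s ago", "12m ago", "3h ago",
# "2d ago" or "1w ago". Other units are not handled by this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_AGE_RE = re.compile(r"^\s*(\d+)([smhdw]) ago\s*$")


def refreshed_minutes(value: str) -> float:
    """Convert a REFRESHED value such as '12m ago' into minutes."""
    match = _AGE_RE.match(value)
    if match is None:
        # Fail loudly instead of guessing if the format differs.
        raise ValueError(f"unrecognised REFRESHED value: {value!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return seconds / 60


def stale_entries(rows, max_minutes):
    """Return the (name, refreshed) pairs older than max_minutes."""
    return [
        (name, refreshed)
        for name, refreshed in rows
        if refreshed_minutes(refreshed) > max_minutes
    ]


if __name__ == "__main__":
    # Hypothetical daemon names, copied by hand from "ceph orch ps" output.
    rows = [("mgr.host1.abcdef", "4m ago"), ("mds.cephfs.host2.xyz", "3h ago")]
    print(stale_entries(rows, ORCH_PS_MAX_MINUTES))
```

Raising on an unrecognised value, rather than treating it as fresh, keeps a format change from silently hiding a host that has stopped refreshing.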

On Thu, May 4, 2023 at 8:38 AM Thomas Widhalm <widhalmt(a)widhalm.or.at> wrote:

  Hi,

 I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but the
 following problem already existed when everything was still on 17.2.5.

 I had a major issue in my cluster which could be solved with a lot of
 your help and even more trial and error. Right now it seems that most of
 it is fixed, but I can't rule out that some problem is still hidden. The
 issue I'm asking about started during the repair.

 When I want to orchestrate the cluster, the command is logged but nothing
 happens. It makes no difference whether I use the Ceph dashboard or
 "ceph orch" in "cephadm shell". I don't get any error message when I try
 to deploy new services, redeploy them, etc. The log only says "scheduled"
 and that's it. The same happens when I change placement rules. Usually I
 use tags, but since they don't work anymore either, I tried host and
 unmanaged placement. No success. The only way I can actually start and
 stop containers is via systemctl on the host itself.

 When I run "ceph orch ls" or "ceph orch ps", services I deployed for
 testing are still shown as being deleted (for weeks now). In particular,
 a lot of old MDS daemons are listed as "error" or "starting". The list
 doesn't match reality at all, because I had to start them by hand.

 I tried "ceph mgr fail" and even a complete shutdown of the whole
 cluster, with all nodes including all MGRs, MDSs and even OSDs, during a
 maintenance window. It didn't change anything.

 Could you help me? To be honest, I'm still rather new to Ceph, and since
 I didn't find anything in the logs that caught my eye, I would be thankful
 for hints on how to debug this.

 Cheers,
 Thomas
 --
 http://www.widhalm.or.at
 GnuPG : 6265BAE6 , A84CB603
 Threema: H7AV7D33
 Telegram, Signal: widhalmt(a)widhalm.or.at
 _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io
