[ceph-users] Re: Orchestration seems not to work

4 May 2023

Thanks for the reply.

"Refreshed" is "3 weeks ago" on most lines. The running mds and 
osd.cost_capacity are both "-" in this column.

I've already tried "mgr fail"; that didn't do anything. And I even
tried a complete shutdown during a maintenance window that was not 3
weeks ago but last week.

So this doesn't seem to help. Thanks anyway. The only explanation could
be that the command was started again by a systemd service. But I can't
imagine that.

On 04.05.23 15:05, Adam King wrote:
...
 First thing I always check when it seems like orchestrator commands
 aren't doing anything is "ceph orch ps" and "ceph orch device ls" and
 check the REFRESHED column. If it's well above 10 minutes for orch ps or
 30 minutes for orch device ls, then it means the orchestrator is most
 likely hanging on some command to refresh the host information. If
 that's the case, you can follow up with a "ceph mgr fail", wait a few
 minutes and check the orch ps and device ls REFRESHED column again. If
 only certain hosts are not having their daemon/device information
 refreshed, you can go to the hosts that aren't having their info
 refreshed and check for hanging "cephadm" commands (I just check for
 "ps aux | grep cephadm").
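
The staleness check described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`age_seconds`, `is_stale`) are hypothetical, and the accepted age formats ("3 weeks ago", "4m ago") are an assumption about how the REFRESHED column is printed, which may differ between releases. Only the 10-minute and 30-minute limits come from the advice itself.

```python
import re

# Limits taken from the advice above, in seconds.
PS_LIMIT = 10 * 60          # "ceph orch ps"
DEVICE_LS_LIMIT = 30 * 60   # "ceph orch device ls"

# Assumed unit spellings; adjust to whatever your release actually prints.
_UNIT_SECONDS = {
    "s": 1, "second": 1,
    "m": 60, "minute": 60,
    "h": 3600, "hour": 3600,
    "d": 86400, "day": 86400,
    "w": 604800, "week": 604800,
}

# Matches e.g. "4m ago", "3 weeks ago", "1 hour ago".
_AGE_RE = re.compile(r"^\s*(\d+)\s*([a-z]+?)s?\s+ago\s*$", re.IGNORECASE)


def age_seconds(refreshed):
    """Return the age in seconds, or None if the field holds no age (e.g. "-")."""
    match = _AGE_RE.match(refreshed)
    if not match:
        return None
    unit = match.group(2).lower()
    if unit not in _UNIT_SECONDS:
        return None
    return int(match.group(1)) * _UNIT_SECONDS[unit]


def is_stale(refreshed, limit_seconds):
    """True if the REFRESHED value is older than the given limit."""
    age = age_seconds(refreshed)
    return age is not None and age > limit_seconds
```

A value of "-" yields `None` from `age_seconds` and is reported as not stale, because no refresh time is known for that entry; such rows need a separate look.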

 On Thu, May 4, 2023 at 8:38 AM Thomas Widhalm <widhalmt(a)widhalm.or.at> wrote:

     Hi,

     I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but
     the following problem already existed when everything was still on
     17.2.5.

     I had a major issue in my cluster which could be solved with a lot of
     your help and even more trial and error. Right now it seems that most
     is already fixed, but I can't rule out that there's still some problem
     hidden. The very issue I'm asking about started during the repair.

     When I want to orchestrate the cluster, it logs the command but it
     doesn't do anything, no matter whether I use the Ceph dashboard or
     "ceph orch" in "cephadm shell". I don't get any error message when I
     try to deploy new services, redeploy them etc. The log only says
     "scheduled" and that's it. Same when I change placement rules. Usually
     I use tags. But since they don't work anymore either, I tried host and
     unmanaged. No success. The only way I can actually start and stop
     containers is via systemctl from the host itself.

     When I run "ceph orch ls" or "ceph orch ps" I see services I deployed
     for testing being deleted (for weeks now). And especially a lot of old
     MDS are listed as "error" or "starting". The list doesn't match reality
     at all because I had to start them by hand.

     I tried "ceph mgr fail" and even a complete shutdown of the whole
     cluster with all nodes including all mgr, mds, even osd - everything
     during a maintenance window. Didn't change anything.

     Could you help me? To be honest I'm still rather new to Ceph and since
     I didn't find anything in the logs that caught my eye I would be
     thankful for hints on how to debug.

     Cheers,
     Thomas
     -- 
     http://www.widhalm.or.at
     GnuPG : 6265BAE6 , A84CB603
     Threema: H7AV7D33
     Telegram, Signal: widhalmt(a)widhalm.or.at
