Hi,
I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but the
following problem already existed while everything was still on 17.2.5.
I had a major issue in my cluster which I was able to solve with a lot of
your help and even more trial and error. Right now most of it seems to be
fixed, but I can't rule out that there's still some hidden problem. The
issue I'm asking about here started during that repair.
When I try to orchestrate the cluster, it logs the command but doesn't
actually do anything, no matter whether I use the Ceph dashboard or "ceph
orch" in a "cephadm shell". I don't get any error message when I try to
deploy new services, redeploy them, etc. The log only says "scheduled" and
that's it. The same happens when I change placement rules. Usually I use
tags, but since they don't work anymore either, I tried explicit hosts and
"unmanaged". No success. The only way I can actually start and stop
containers is via systemctl on the host itself.
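
For reference, this is roughly how I start daemons by hand on the hosts
(the fsid and daemon name below are placeholders, not my real values):

```shell
# cephadm-managed daemons run as systemd units named ceph-<fsid>@<type>.<id>
# list the ceph units on this host
systemctl list-units 'ceph-*'

# start or stop a single daemon directly on its host
# (fsid and daemon name are placeholders)
systemctl start 'ceph-11111111-2222-3333-4444-555555555555@mds.myfs.host1.abcdef.service'
```

This works, but of course it bypasses the orchestrator entirely.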
When I run "ceph orch ls" or "ceph orch ps" I still see services I
deployed for testing and deleted weeks ago. And especially a lot of old
MDS daemons are listed as "error" or "starting". The list doesn't match
reality at all, because I had to start them by hand.
I tried "ceph mgr fail" and even a complete shutdown of the whole cluster
with all nodes, including all MGRs, MDS and even OSDs, during a
maintenance window. It didn't change anything.
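
In case it helps to see exactly what I ran, the sequence was roughly:

```shell
# fail over to a standby mgr, which restarts the cephadm module
ceph mgr fail

# check which mgr is active afterwards
ceph mgr stat

# compare what the orchestrator thinks is deployed against reality
ceph orch ls
ceph orch ps
```

The orch output still showed the stale services afterwards.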
Could you help me? To be honest, I'm still rather new to Ceph, and since I
didn't find anything in the logs that caught my eye, I'd be thankful for
hints on how to debug this.
Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widhalmt(a)widhalm.or.at