Hi,
Here it is:
# cephadm shell -- ceph status
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
  cluster:
    id:     3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c
    health: HEALTH_WARN
            2 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum spsrc-mon-1,spsrc-mon-2,spsrc-mon-3 (age 7d)
    mgr: spsrc-mon-1.eziiam(active, since 7d), standbys: spsrc-mon-2.ilbncj, spsrc-mon-3.vzwxfr
    mds: manila:1 {0=manila.spsrc-mon-2.syveaq=up:active} 2 up:standby
    osd: 248 osds: 248 up (since 2w), 248 in (since 3M)

  data:
    pools:   6 pools, 257 pgs
    objects: 4.77M objects, 5.9 TiB
    usage:   12 TiB used, 1.3 PiB / 1.3 PiB avail
    pgs:     257 active+clean
Also:
# cephadm shell -- ceph health detail
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon mon.spsrc-mon-1-safe on spsrc-mon-1 is in error state
    daemon mon.spsrc-mon-2-safe on spsrc-mon-2 is in error state
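In case it is useful, here is a sketch of how one could inspect those two failed "-safe" daemons directly on their hosts (daemon names and the fsid are taken from the output above; adjust to taste):

```shell
# On spsrc-mon-1 (and likewise spsrc-mon-2), list the daemons cephadm
# manages on this host, including their current state:
cephadm ls

# Show the journal for the failing daemon; cephadm resolves the matching
# ceph-<fsid>@mon.<name>.service systemd unit for us:
cephadm logs --name mon.spsrc-mon-1-safe

# Optionally try restarting it via its systemd unit (fsid from "ceph status"):
systemctl restart ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c@mon.spsrc-mon-1-safe.service
```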
I don't think these containers are crucial, right? I did ask a while ago:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MQM46KBC3BN…
On all 3 Ceph monitor nodes, "systemctl status ceph\*.service" reports the
units as OK.
Here are the commands I tried to inspect the logs:
grep -i health -r /var/log/ceph/
grep -i error -r /var/log/ceph/
I get:
ceph_volume.exceptions.ConfigurationError: Unable to load expected Ceph
config at: /etc/ceph/ceph.conf
But I think that's expected in a containerised deployment?
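For what it's worth, in a containerised deployment the daemon logs usually end up in the journal rather than under /var/log/ceph on the host, so something along these lines may be more fruitful (a sketch; the mgr name below is the active one from the status output above):

```shell
# Grep the journals of all ceph systemd units on a host for recent errors:
journalctl -u 'ceph-*' --since "2 days ago" | grep -i -E 'error|fail'

# From inside "cephadm shell", ask the cluster for recent cephadm log events:
ceph log last cephadm

# If orchestrator commands hang, failing over the active mgr sometimes
# unsticks the orchestrator/cephadm module:
ceph mgr fail spsrc-mon-1.eziiam
```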
Do you suggest other commands?
Many thanks,
Sebastian
On Wed, 19 May 2021 at 21:49, Eugen Block <eblock(a)nde.ag> wrote:
Hi,
can you paste the ceph status?
The orchestrator is a MGR module, have you checked if the containers
are up and running (assuming it’s cephadm based)? Do the logs also
report the cluster as healthy?
Zitat von Sebastian Luna Valero <sebastian.luna.valero(a)gmail.com>:
Hi,
After an unscheduled power outage, our Ceph (Octopus) cluster reports a
healthy state with "ceph status". However, when we run "ceph orch status",
the command hangs forever.
Are there other commands that we can run for a more thorough health check
of the cluster?
After looking at:
https://docs.ceph.com/en/octopus/rados/operations/health-checks/
I also ran "ceph crash ls-new", but it hangs forever as well.
Any ideas?
Our Ceph cluster is currently used as backend storage for our OpenStack
cluster, and we are also having issues with storage volumes attached to
VMs, but we don't know how to narrow down the root cause.
Any feedback is highly appreciated.
Best regards,
Sebastian
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io