Hi,
Here it is:
# cephadm shell -- ceph status
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
  cluster:
    id:     3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c
    health: HEALTH_WARN
            2 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum spsrc-mon-1,spsrc-mon-2,spsrc-mon-3 (age 7d)
    mgr: spsrc-mon-1.eziiam(active, since 7d), standbys: spsrc-mon-2.ilbncj, spsrc-mon-3.vzwxfr
    mds: manila:1 {0=manila.spsrc-mon-2.syveaq=up:active} 2 up:standby
    osd: 248 osds: 248 up (since 2w), 248 in (since 3M)

  data:
    pools:   6 pools, 257 pgs
    objects: 4.77M objects, 5.9 TiB
    usage:   12 TiB used, 1.3 PiB / 1.3 PiB avail
    pgs:     257 active+clean
Also:
# cephadm shell -- ceph health detail
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon mon.spsrc-mon-1-safe on spsrc-mon-1 is in error state
    daemon mon.spsrc-mon-2-safe on spsrc-mon-2 is in error state
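In case it is useful, here is a sketch of how one could inspect those two failed "-safe" daemons directly on their hosts (daemon names and the fsid are taken from the output above; adjust to taste):

```shell
# On spsrc-mon-1 (and likewise spsrc-mon-2), list the daemons cephadm
# manages on this host, including their current state:
cephadm ls

# Show the journal for the failing daemon; cephadm resolves the matching
# ceph-<fsid>@mon.<name>.service systemd unit for us:
cephadm logs --name mon.spsrc-mon-1-safe

# Optionally try restarting it via its systemd unit (fsid from "ceph status"):
systemctl restart ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c@mon.spsrc-mon-1-safe.service
```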
I don't think these containers are crucial, right? I did ask a while ago:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MQM46KBC3BN…
On all 3 Ceph monitor nodes, "systemctl status ceph\*.service" reports the
units as OK.
Here are the commands I tried to inspect the logs:
grep -i health -r /var/log/ceph/
grep -i error -r /var/log/ceph/
I get:
ceph_volume.exceptions.ConfigurationError: Unable to load expected Ceph
config at: /etc/ceph/ceph.conf
But I think that's expected in a containerised deployment?
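For what it's worth, in a containerised deployment the daemon logs usually end up in the journal rather than under /var/log/ceph on the host, so something along these lines may be more fruitful (a sketch; the mgr name below is the active one from the status output above):

```shell
# Grep the journals of all ceph systemd units on a host for recent errors:
journalctl -u 'ceph-*' --since "2 days ago" | grep -i -E 'error|fail'

# From inside "cephadm shell", ask the cluster for recent cephadm log events:
ceph log last cephadm

# If orchestrator commands hang, failing over the active mgr sometimes
# unsticks the orchestrator/cephadm module:
ceph mgr fail spsrc-mon-1.eziiam
```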
Do you suggest other commands?
Many thanks,
Sebastian
On Wed, 19 May 2021 at 21:49, Eugen Block <eblock(a)nde.ag> wrote:
Hi,
can you paste the ceph status?
The orchestrator is a MGR module, have you checked if the containers
are up and running (assuming it’s cephadm based)? Do the logs also
report the cluster as healthy?
Zitat von Sebastian Luna Valero <sebastian.luna.valero(a)gmail.com>:
Hi,
After an unscheduled power outage, our Ceph (Octopus) cluster reports a
healthy state with "ceph status". However, when we run "ceph orch status",
the command hangs forever.
Are there other commands that we can run for a more thorough health check
of the cluster?
After looking at:
https://docs.ceph.com/en/octopus/rados/operations/health-checks/
I also ran "ceph crash ls-new", but it hangs forever as well.
Any ideas?
Our Ceph cluster is currently used as backend storage for our OpenStack
cluster, and we are also having issues with storage volumes attached to
VMs, but we don't know how to narrow down the root cause.
Any feedback is highly appreciated.
Best regards,
Sebastian
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io