I'm looking for help in figuring out why cephadm isn't making any progress after I told it to redeploy an mds daemon with:
ceph orch daemon redeploy mds.cephfs.aladdin.kgokhr ceph/ceph:v15.2.12
The output from 'ceph -W cephadm' just says:
2021-05-14T16:24:46.628084+0000 mgr.paris.glbvov [INF] Schedule redeploy daemon mds.cephfs.aladdin.kgokhr
However, the mds never gets redeployed. I do see this warning in 'ceph health detail', which may be related:
Module 'cephadm' has failed: 'NoneType' object has no attribute 'target_id'
What steps can I take to figure out why cephadm is hung?
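In case it helps anyone hitting the same thing, a typical first pass at a stalled cephadm module looks roughly like this (a sketch based on the standard cephadm troubleshooting docs; the mgr name is the one from the log line above):

```shell
# Raise the cephadm log level and review the recent orchestrator log
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph log last cephadm

# Check what state cephadm reports for the mds daemons
ceph orch ps --daemon-type mds

# A crashed mgr module often clears after failing over to the standby mgr
ceph mgr fail paris.glbvov
```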
Thanks,
Bryan
Hi,
We are seeing the mgr attempt to apply our OSD spec on the various
hosts and then block. On investigation, we find the mgr has executed
cephadm calls like the following, which are blocking:
root 1522444 0.0 0.0 102740 23216 ? S 17:32 0:00
\_ /usr/bin/python3
/var/lib/ceph/XXXXX/cephadm.30cb78bdbbafb384af862e1c2292b944f15942b586128e91262b43e91e11ae90
--image docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe
ceph-volume --fsid XXXXX -- lvm list --format json
This occurs on all hosts in the cluster after starting, restarting,
or failing over a manager. On one cluster it is currently blocking an
in-progress upgrade, just after the manager updates.
Looking at the cephadm logs on the host(s) in question, we see the
last entry appears to be truncated, like:
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.db_uuid": "1n2f5v-EEgO-1Kn6-hQd2-v5QF-AN9o-XPkL6b",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.encrypted": "0",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.osd_fsid": "XXXX",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.osd_id": "205",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.osdspec_affinity": "osd_spec",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.type": "block",
The previous entry looks like this:
2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.db_uuid": "TMTPD5-MLqp-06O2-raqp-S8o5-TfRG-hbFmpu",
2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.encrypted": "0",
2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.osd_fsid": "XXXX",
2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.osd_id": "195",
2021-05-10 17:32:06,470 INFO /usr/bin/podman:
"ceph.osdspec_affinity": "osd_spec",
2021-05-10 17:32:06,470 INFO /usr/bin/podman:
"ceph.type": "block",
2021-05-10 17:32:06,470 INFO /usr/bin/podman: "ceph.vdo": "0"
2021-05-10 17:32:06,470 INFO /usr/bin/podman: },
2021-05-10 17:32:06,470 INFO /usr/bin/podman: "type": "block",
2021-05-10 17:32:06,470 INFO /usr/bin/podman: "vg_name":
"ceph-ffd1a4a7-316c-4c85-acde-06459e26f2c4"
2021-05-10 17:32:06,470 INFO /usr/bin/podman: }
2021-05-10 17:32:06,470 INFO /usr/bin/podman: ],
We'd like to get to the bottom of this, please let us know what other
information we can provide.
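In case it is useful, a couple of things worth checking on an affected host (a sketch; the pid and image digest are the ones from the ps output above, and the XXXXX fsid placeholder must be replaced with the real one):

```shell
# See what the blocked process is waiting on (pid from the ps output above)
cat /proc/1522444/wchan; echo
cat /proc/1522444/stack 2>/dev/null   # needs root

# Check for a stuck or exited container left behind by the call
podman ps -a | grep ceph-volume

# Re-run the same call by hand to see where it stalls
cephadm --image docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe \
    ceph-volume --fsid XXXXX -- lvm list --format json
```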
Thank you,
David
I had a 3-mon Ceph cluster. After updating from 15.2.x to 16.2.x, one of my mons shows as stopped in the Ceph Dashboard. Checking the cephadm logs on the server in question, I can see:
/usr/bin/docker: Error: No such object: ceph-30449cba-44e4-11eb-ba64-dda10beff041-mon.sn-m01
There are a few OSD services running on the same physical server, and they all start and run fine via docker. I tried a cephadm apply mon to push a new mon to the same host, but it seems to do nothing; nothing shows up in the same log file on sn-m01. Also, ceph -s shows full health with no errors and no trace of the "failed" mon (not sure if this is expected); only in the Ceph Dashboard under services can I see the stopped, not-running mon.
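For what it's worth, the usual way to clear a broken daemon and let the orchestrator recreate it looks roughly like this (hedged: the daemon and host names are taken from the message above; verify them with 'ceph orch ps' first):

```shell
# Confirm what cephadm reports for the mon daemons
ceph orch ps --daemon-type mon

# Remove the broken mon daemon, then add a fresh one on the same host
ceph orch daemon rm mon.sn-m01 --force
ceph orch daemon add mon sn-m01
```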
Hello,
I just noticed on my small Octopus cluster that ceph-mgr on a mgr/mon node uses 3.6 GB of resident memory (RES), as you can see from the top output below:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2704 167 20 0 5030528 3.6g 35796 S 6.6 47.2 23:08.18 ceph-mgr
2699 167 20 0 1291504 884796 23672 S 4.6 11.1 13:23.63 ceph-mon
Is there a way to limit the memory usage of ceph-mgr, just as one can do for OSDs with osd_memory_target?
I tried something like mgr_memory_target but that parameter does not exist.
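As a quick check, you can ask the cluster which memory-target options actually exist in your release, and fail over the mgr to reclaim its memory in the meantime (a sketch; <mgr-name> is a placeholder for the active mgr's name):

```shell
# List all config options whose name mentions a memory target
ceph config ls | grep memory_target

# Failing over the active mgr releases its memory for now;
# get the active mgr's name from 'ceph mgr stat' first
ceph mgr fail <mgr-name>
```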
Thanks,
Mabi
Hello,
I need to re-install one node of my Octopus cluster (installed with cephadm), which is a mon/mgr node, and I did not find in the documentation how to do that with the new ceph orchestrator commands.
So my question is: which "ceph orch" commands do I need to run in order to nicely "out" the mgr and mon services from that specific node?
I have a standby manager and 3 mons in total, so redundancy-wise it should be no problem to take that one node out for re-installation.
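A rough sequence that should do this with the orchestrator (a sketch; node1 stands for the node to reinstall, node2/node3 for hypothetical remaining hosts — substitute your real hostnames):

```shell
# Shrink mon and mgr placement to the hosts that stay (2 of 3 mons keeps quorum)
ceph orch apply mon --placement="node2 node3"
ceph orch apply mgr --placement="node2 node3"

# Once the daemons on node1 are gone, remove the host from the orchestrator
ceph orch host rm node1

# After reinstalling the OS, add the host back and widen the placement again
ceph orch host add node1
```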
Best regards,
Mabi
Hi,
One of the recent changes in Ceph Pacific is the removal of
support for /etc/hosts on ceph cluster nodes that use
*podman* as the container engine (CentOS 8+).
This means that name resolution from within the ceph containers now
relies on either DNS or the host-to-IP mapping that is created when
adding a host to the cluster with the orchestrator CLI (orch host
add).
The exclusion of /etc/hosts has been implemented using a --no-hosts
setting on the "podman run" command.
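For example, registering a host together with an explicit address is what creates that mapping (the hostname and IP below are hypothetical):

```shell
# The address given here is stored by the orchestrator and used for
# reaching the host instead of an /etc/hosts lookup in the container
ceph orch host add host1 10.0.0.1
```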
Installations that use docker are unaffected.
So if you're planning to use Ceph Pacific with podman *and* need
/etc/hosts to work, it would be great to hear from you!
Cheers,
Paul Cuzner
Hi ceph-users,
I deployed Ceph on arm64 with cephadm.
The mon and mgr daemons seem to work well.
The OSDs, however, do not: I can add OSDs to the cluster, but they never reach the up+in state.
I cannot find any helpful information in the Ceph logs.
Can anyone help me, or tell me how to find the reason why the OSDs do not go up/in?
Before adding the OSDs, the raw disks show as available.
I added the OSDs with "ceph orch apply osd --all-available-devices".
Hardware: NXP LS1043A processor, 64-bit
OS: Ubuntu 18.04.5 LTS (GNU/Linux 4.19.26 aarch64)
Ceph: 15.2.9 / 15.2.11 / 16.2.3
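In case it helps, these are the typical places to look when OSDs deploy but never come up (a sketch; osd.0 stands in for any of the affected OSD ids):

```shell
# What cephadm saw on the disks, and how it reports the OSD daemons
ceph orch device ls
ceph orch ps --daemon-type osd

# Which OSDs the cluster considers down
ceph osd tree

# Journal output for one OSD daemon, run on its host
cephadm logs --name osd.0

# Recent orchestrator log entries
ceph log last cephadm
```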