+dev(a)ceph.io
On Wed, Jun 19, 2019 at 11:18 AM Rishabh Dave <ridave(a)redhat.com> wrote:
Hi all,
I am working on a ceph-ansible playbook[1] that removes an MDS from an
already deployed Ceph cluster. Going through documentation and
ceph-ansible codebase I found three ways to stop an MDS:
* ceph mds fail <mds-name> && rm -rf /var/lib/ceph/mds/ceph-{id} [2]
* systemctl stop ceph-mds@$HOSTNAME
* ceph tell mds.x exit
How do these three ways compare to each other? I ran these commands on
a ceph-ansible deployed cluster and all three had the very same effect.
Is any one of these better than the rest?
The first one doesn't cause the MDS process to exit. I would suggest
the systemd approach, since systemd may restart a daemon that exits
normally, which is what happens with the third approach.
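
A minimal sketch of the systemd approach, for illustration only. The
unit name ceph-mds@<instance> comes from the command quoted above;
stop_mds, run and DRY_RUN are hypothetical names, and the extra
"systemctl disable" step is an assumption about what a permanent
removal would want, not something this thread prescribes.

```shell
#!/bin/sh
# Sketch: stop an MDS through systemd rather than asking the daemon to exit.
# DRY_RUN, run and stop_mds are hypothetical helpers, not ceph-ansible code.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_mds() {
    # $1 is the systemd instance name, e.g. the host name.
    unit="ceph-mds@$1"
    run systemctl stop "$unit"
    # Assumption: also keep the unit from starting again at boot.
    run systemctl disable "$unit"
}
```

With DRY_RUN left at 1, calling stop_mds "$HOSTNAME" only prints the two
systemctl commands; setting DRY_RUN=0 would actually run them.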
What about "ceph mds rm" and "ceph mds rmfailed"?
Those are dev commands not meant for this purpose.
The first time I was looking for various ways to stop an MDS, I tried
"ceph mds fail <mds-name> && ceph mds rm <global-id>" and it did not
work since "ceph mds rm" requires the MDS to be inactive[3]. Is there a
way to render an MDS inactive? I couldn't find one.
I also tried "ceph mds fail <mds-name> && ceph mds rmfailed
<mds-rank>" but this did not stop the MDS. It only changed the MDS's
state to "standby":
(teuth-venv) $ ./bin/ceph fs dump | grep -A 1 standby_count_wanted 2> /dev/null
dumped fsmap epoch 4
standby_count_wanted 0
4232: [v2:192.168.0.217:6826/2113356090,v1:192.168.0.217:6827/2113356090]
'a' mds.0.3 up:active seq 4
(teuth-venv) $ ./bin/ceph mds fail a 2> /dev/null && ./bin/ceph mds rmfailed --yes-i-really-mean-it 0 2> /dev/null && ./bin/ceph fs dump | grep -A 3 Standby 2> /dev/null
dumped fsmap epoch 6
Standby daemons:
4286: [v2:192.168.0.217:6826/401505106,v1:192.168.0.217:6827/401505106]
'a' mds.-1.0 up:standby seq 1
(teuth-venv) $
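
The state check done by eye in the transcript above could be scripted
roughly as follows. This is a sketch that assumes the "ceph fs dump"
text looks like the output shown here (the quoted daemon name followed
later by an "up:<state>" field); the format may differ between
releases, and mds_state is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the state of a named MDS from "ceph fs dump" text on stdin.
# Assumes lines such as:  'a' mds.0.3 up:active seq 4
mds_state() {
    # $1: MDS name as shown in the dump, without the quotes.
    awk -v name="'$1'" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == name) {
                    # Take the first "up:<state>" field after the name.
                    for (j = i + 1; j <= NF; j++) {
                        if ($j ~ /^up:/) { print substr($j, 4); exit }
                    }
                }
            }
        }'
}
```

For example, "./bin/ceph fs dump 2> /dev/null | mds_state a" would
print the state of daemon 'a', or nothing if the name is not listed.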
Also, I find the usage of "remove" in this doc[2] ambiguous -- it can
mean removing MDS from cluster by changing MDS's state to standby or
it can mean killing/stopping it altogether. Reading [2] my impression
was that it meant killing/stopping it but "remove" is also used to
describe "ceph mds rm" and "ceph mds rmfailed" commands. Of these, at
least "ceph mds rmfailed" does not stop the MDS. If I am not the only
one to find this ambiguous, I'll go ahead and change the docs
accordingly.
[2] is not really useful documentation, unfortunately. The best way to
stop an MDS that you want to remove permanently is to have the service
manager (systemd) stop it. The only other consideration is whether you
have a replacement MDS available to take over (if the operator even
wants that to happen).
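
One way to check for a replacement before stopping could look like the
sketch below. It assumes standby daemons appear in "ceph fs dump" with
an "up:standby" state, as in the output quoted earlier in this thread;
standby_count and safe_to_stop are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: count standby MDS daemons in "ceph fs dump" text read from stdin.
standby_count() {
    # Match "up:standby" exactly, not "up:standby-replay".
    awk '/up:standby( |$)/ { n++ } END { print n + 0 }'
}

safe_to_stop() {
    # Succeeds when at least $1 standby daemons (default 1) are listed.
    wanted="${1:-1}"
    [ "$(standby_count)" -ge "$wanted" ]
}
```

A possible use is "ceph fs dump 2> /dev/null | safe_to_stop && systemctl
stop ceph-mds@$HOSTNAME". Note that the count includes the daemon being
stopped if it is itself a standby, so the operator still has to decide
whether the check is meaningful for that case.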
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D