For the benefit of our new folks and for posterity:
Many of our QA tests for CephFS are located in qa/tasks/cephfs/*.
These get run in teuthology with various cluster configurations. What
everyone will need to be able to do is develop these tests locally
without waiting for teuthology so you can rapidly find errors in your
test cases and development builds.
To do this, you need to use the qa/tasks/vstart_runner.py script. This
allows you to use a vstart cluster to execute your tests by providing
the necessary frameworks the tests expect.
On a development box*, build ceph. If you're just testing CephFS, you
can usually get away with a smaller build without rbd/rgw:
./do_cmake.sh -DWITH_PYTHON3:BOOL=ON -DWITH_BABELTRACE=OFF
-DWITH_MANPAGE=OFF -DWITH_RBD=OFF -DWITH_RADOSGW=OFF && time (cd build
&& make -j24 CMAKE_BUILD_TYPE=Debug -k)
Next, build teuthology:
git clone https://github.com/ceph/teuthology.git && cd teuthology &&
virtualenv ./venv && source venv/bin/activate && pip install --upgrade
pip && pip install -r requirements.txt && python setup.py develop
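(Keep this virtualenv active in the shell you use for the remaining steps;
vstart_runner.py imports teuthology, so re-run
"source <path-to-teuthology>/venv/bin/activate" if you open a new shell.)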
Next, start a vstart cluster:
cd ceph/build && env MDS=3 ../src/vstart.sh -d -b -l -n --without-dashboard
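Once vstart.sh finishes, a quick sanity check from the build directory
(e.g. "./bin/ceph -s" or "./bin/ceph fs status") confirms the MDS daemons
are up before you start running tests.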
Finally, run vstart_runner:
python2 ../qa/tasks/vstart_runner.py --interactive
tasks.cephfs.test_snapshots.TestSnapshots
^ That's an example test. The dotted name follows the file layout:
tasks.cephfs.test_snapshots maps to qa/tasks/cephfs/test_snapshots.py,
and the final part is the class we're testing, TestSnapshots. This
invocation of vstart_runner.py will run every test in TestSnapshots,
i.e. every method beginning with "test_". If you want to run a single
test, you could do:
python2 ../qa/tasks/vstart_runner.py --interactive
tasks.cephfs.test_snapshots.TestSnapshots.test_snapclient_cache
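For writing your own cases, a test is just a "test_*" method on a class
under qa/tasks/cephfs/ that derives from CephFSTestCase. A minimal sketch
of what such a class looks like (the file, class and method names here
are invented for illustration; run_shell is the usual mount helper):

    # qa/tasks/cephfs/test_example.py (hypothetical file)
    from tasks.cephfs.cephfs_test_case import CephFSTestCase

    class TestExample(CephFSTestCase):
        CLIENTS_REQUIRED = 1

        def test_mkdir_is_visible(self):
            # self.mount_a is a mounted client set up by the framework
            self.mount_a.run_shell(["mkdir", "subdir"])
            self.mount_a.run_shell(["ls", "subdir"])

You would then run it as
tasks.cephfs.test_example.TestExample.test_mkdir_is_visible, same as above.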
Please give the above a try sometime soon so you know how to do it and
we can resolve any problems. This is an important skill to have for
developing CephFS.
* Hopefully you're using one of the beefy development boxes that make
compiling Ceph fast. I recommend one of the senta boxes like
senta03.front.sepia.ceph.com.
--
Patrick Donnelly
Hi all,
I am working on a ceph-ansible playbook[1] that removes an MDS from an
already deployed Ceph cluster. Going through the documentation and the
ceph-ansible codebase, I found 3 ways to stop an MDS -
* ceph mds fail <mds-name> && rm -rf /var/lib/ceph/mds/ceph-{id} [2]
* systemctl stop ceph-mds@$HOSTNAME
* ceph tell mds.x exit
How do these 3 ways compare to each other? I ran these commands on a
ceph-ansible deployed cluster and all 3 had the very same effect. Is
any one of these better than the rest?
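A rough sketch of the kind of before/after check I mean (host and daemon
names are placeholders):
ceph fs status                      # note which ranks are up:active
systemctl stop ceph-mds@$HOSTNAME   # or "ceph mds fail ...", or "ceph tell mds.x exit"
ceph fs status                      # all three left the map looking the same to me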
What about "ceph mds rm" and "ceph mds rmfailed"? The first time I was
looking for various ways to stop an MDS, I tried "ceph mds fail
<mds-name> && ceph mds rm <global-id>" and it did not work since "ceph
mds rm" requires an MDS to inactive[3]. Is there a way to render an
MDS inactive? I couldn't find one.
I also tried "ceph mds fail <mds-name> && ceph mds rmfailed
<mds-rank>" but this did not stop the MDS. It only changed the MDS's
state to "standby":
(teuth-venv) $ ./bin/ceph fs dump | grep -A 1 standby_count_wanted 2> /dev/null
dumped fsmap epoch 4
standby_count_wanted 0
4232: [v2:192.168.0.217:6826/2113356090,v1:192.168.0.217:6827/2113356090]
'a' mds.0.3 up:active seq 4
(teuth-venv) $ ./bin/ceph mds fail a 2> /dev/null && ./bin/ceph mds
rmfailed --yes-i-really-mean-it 0 2> /dev/null && ./bin/ceph fs dump |
grep -A 3 Standby 2> /dev/null
dumped fsmap epoch 6
Standby daemons:
4286: [v2:192.168.0.217:6826/401505106,v1:192.168.0.217:6827/401505106]
'a' mds.-1.0 up:standby seq 1
(teuth-venv) $
Also, I find the usage of "remove" in this doc[2] ambiguous -- it can
mean removing an MDS from the cluster by changing its state to standby,
or it can mean killing/stopping it altogether. Reading [2], my impression
was that it meant killing/stopping it, but "remove" is also used to
describe the "ceph mds rm" and "ceph mds rmfailed" commands. Of these, at
least "ceph mds rmfailed" does not stop the MDS. If I am not the only
one who finds this ambiguous, I'll go ahead and change the docs
accordingly.
- Rishabh
[1] https://github.com/ceph/ceph-ansible/pull/4083
[2] http://docs.ceph.com/docs/master/cephfs/add-remove-mds/
[3] http://docs.ceph.com/docs/master/man/8/ceph/
Hi all,
AFAIK, running tests with vstart_runner.py makes it mandatory that CWD
should be <ceph-repo-root>/build. But, apparently,
test_cephfs_shell.py[1] attempts to issue CephFS shell commands
directly from CWD[1], which is impossible IMO. Is this a bug or am I
missing something? Am I supposed to configure my environment before
running the tests from test_cephfs_shell.py?
I tried running a couple of tests from test_cephfs_shell.py in the
same way we try to run a test from any other suite locally, but that
didn't work. The command I used is -
$ python2 ../qa/tasks/vstart_runner.py --interactive --create
tasks.cephfs.test_cephfs_shell.TestCephFSShell.test_mkdir
Following is the traceback for the command above -
File "/home/rishabh/repos/ceph/review/qa/tasks/cephfs/test_cephfs_shell.py",
line 45, in test_mkdir
o = self._cephfs_shell("mkdir d1")
File "/home/rishabh/repos/ceph/review/qa/tasks/cephfs/test_cephfs_shell.py",
line 29, in _cephfs_shell
stdin=stdin)
File "../qa/tasks/vstart_runner.py", line 324, in run
env=env)
File "/usr/lib64/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
If I am not missing anything, this is surely a bug.
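If the OSError simply means subprocess could not find a "cephfs-shell"
executable, then one possible workaround (an untested guess, assuming the
binary ends up under build/bin in your build) is to put build/bin on PATH
before invoking vstart_runner.py:
cd <ceph-repo-root>/build
export PATH="$PWD/bin:$PATH"    # so a bare "cephfs-shell" invocation can resolve
python2 ../qa/tasks/vstart_runner.py --interactive tasks.cephfs.test_cephfs_shell.TestCephFSShell.test_mkdir
Even if that works, the CWD/PATH dependence still looks like a bug to me.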
[1] https://github.com/ceph/ceph/blob/master/qa/tasks/cephfs/test_cephfs_shell.…