CI in ceph - Dev

18 Mar 2020

Dear Community,
Apologies for the lengthy email, but I wanted to share some thoughts I have
regarding our upstream test process (after being in the project for ~18
months).

TL;DR: execute teuthology tests directly from jenkins

Currently, during the PR submission process, jenkins builds ceph for one
target (ubuntu18.04) and then runs the unit test suite.
Due to the nature of ceph, unit tests cover only part of the code,
and therefore the author is expected to run the changes against teuthology
as well.

The current process for running teuthology has several drawbacks:
- you have to submit your changes as a branch into the ceph-ci repo. this
kicks off multiple builds for multiple targets, which usually take a couple
of hours (1-4) to finish. testing can only start after the images are ready
- even though the code in the PR is merged to the latest version of the
branch you are merging to, the images tested in teuthology are using your
branch, and whether it was rebased recently or not is up to the author
- you have to execute a teuthology test suite against these images. the
suites are pretty big, and run against several targets (very similar to our
nightlies). this means that the entire process takes hours to days to finish
- there is no guarantee that the code being tested in teuthology is the
same code being merged in the PR
- the manual analysis of the teuthology results is time consuming and error
prone, as you have to figure out which tests didn't run due to
infrastructure issues, which failures are expected and already being
fixed, etc.

As a result, developers, instead of treating teuthology as a valuable
verification tool, try to avoid it :-(
Often, running teuthology and analyzing the results is done by a handful
of experienced developers.

Suggestion:
In many projects, system and integration tests are run from jenkins
automatically - I would argue that this could be the case in our project as
well.
- select one target, similar to what we did with unit testing. the target
is automatically built in jenkins
- create smaller "sanity" suites in teuthology that would allow for faster
execution, and would test only the selected target
- jenkins can use teuthology and the sepia lab, so no changes would be
needed for actually running the tests
- to avoid extensive load on the sepia lab, the actual triggering could be
manual (e.g. "jenkins test rgw sanity"). This means that the author can
select which area of the code to test, and would execute it when the PR is
in good enough shape for testing
- the results could be automatically analyzed to separate infrastructure
issues, test issues, etc. the knowledge needed for this analysis could be
coded and updated in jenkins
- in some cases, manual execution of specific tests or against specific
targets will be needed, and this could be done the same way it is currently
done
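
To make the manual trigger and the automatic analysis a bit more concrete,
here is a minimal sketch in Python. Everything in it is an assumption for
illustration only: the comment format is modelled on the "jenkins test rgw
sanity" example above, and the function names and the failure patterns are
hypothetical, not taken from teuthology or from our jenkins jobs.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Hypothetical comment format: "jenkins test <component> <suite>",
# modelled on the "jenkins test rgw sanity" example.
TRIGGER_RE = re.compile(
    r"^\s*jenkins\s+test\s+(?P<component>[a-z0-9_-]+)\s+(?P<suite>[a-z0-9_-]+)\s*$",
    re.IGNORECASE,
)


class Trigger(NamedTuple):
    component: str
    suite: str


def parse_trigger(comment: str) -> Optional[Trigger]:
    """Return the first trigger found in a PR comment, or None."""
    for line in comment.splitlines():
        match = TRIGGER_RE.match(line)
        if match:
            return Trigger(match.group("component").lower(),
                           match.group("suite").lower())
    return None


# Illustrative patterns only; a real list would be maintained by the
# people who know the lab and would live next to the jenkins job.
INFRA_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"connection (refused|timed out)", r"no route to host")
]


def classify_failure(reason: str, known_issues: Iterable[str] = ()) -> str:
    """Sort a failure reason into one of three buckets."""
    if any(p.search(reason) for p in INFRA_PATTERNS):
        return "infrastructure"
    if any(issue.lower() in reason.lower() for issue in known_issues):
        return "known-issue"
    return "needs-review"
```

With something like this, only the "needs-review" bucket would have to be
looked at by the author, and the pattern lists would be where the analysis
knowledge gets coded and updated over time.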

in the future...
- we may also automate the process of selecting which tests need to run,
according to the files being modified in the PR
- gating PRs on passing teuthology runs would serve as a tool for better
quality code, but also for more stable tests and infrastructure
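
Selecting tests according to the modified files could start out as a simple
path-to-suite table. The sketch below (Python) is only an illustration of
the idea: the mapping is a made-up example, not an agreed policy, and the
function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Illustrative mapping from source paths to the suite that would be
# triggered; a real table would need review by the component owners.
PATH_TO_SUITE = [
    ("src/rgw/*", "rgw"),
    ("src/osd/*", "rados"),
    ("src/mds/*", "fs"),
    ("src/librbd/*", "rbd"),
]


def select_suites(changed_files: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated suites touched by a PR."""
    selected = set()
    for path in changed_files:
        for pattern, suite in PATH_TO_SUITE:
            # fnmatchcase's "*" also matches "/", so nested files match too.
            if fnmatchcase(path, pattern):
                selected.add(suite)
    return sorted(selected)
```

Files that match no pattern (docs, for example) select nothing, so a PR
touching only those would not put any load on the lab.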

Appreciate your feedback!

Yuval