Dear Community,
Apologies for the lengthy email, but I wanted to share some thoughts on our upstream test process (after being in the project for ~18 months).

TL;DR: execute teuthology tests directly from jenkins

Currently, during the PR submission process, jenkins builds ceph for one target (ubuntu18.04) and then runs the unit test suite.
Due to the nature of ceph, unit tests cover only part of the code, so the author is expected to run the changes against teuthology as well.

The current process for running teuthology has several drawbacks:
- you have to submit your changes as a branch to the ceph-ci repo, which kicks off multiple builds for multiple targets that usually take a couple of hours (1-4) to finish. testing can only start after the images are ready
- even though the PR code is merged onto the latest version of the target branch, the images tested in teuthology are built from your branch, and whether that branch was rebased recently is up to the author
- you have to execute a teuthology test suite against these images. the suites are pretty big, and run against several targets (very similar to our nightlies), so the entire process takes hours to days to finish
- there is no guarantee that the code being tested in teuthology is the same code being merged in the PR
- the manual analysis of the teuthology results is time-consuming and error-prone, as you have to figure out which tests didn't run due to infrastructure issues, which failures are expected and already being fixed, etc.

As a result, developers, instead of treating teuthology as a valuable verification tool, try to avoid it :-(
Often, running teuthology and analyzing the results is done by a handful of experienced developers.

Suggestion:
In many projects, system and integration tests are run from jenkins automatically - I would argue that this could be the case in our project as well.
- select one target, similar to what we did with unit testing. the target is automatically built in jenkins
- create smaller "sanity" suites in teuthology that would allow for faster execution, and would test only the selected target
- jenkins can use teuthology and the sepia lab, so no changes would be needed for actually running the tests
- to avoid excessive load on the sepia lab, the actual triggering could be manual (e.g. "jenkins test rgw sanity"). This means that the author can select which area of the code to test, and would execute it when the PR is in good enough shape for testing
- the results could be automatically analyzed for infrastructure issues, test issues etc. the knowledge needed for this analysis could be coded and updated in jenkins
- in some cases, manual execution of specific tests or against specific targets will be needed, and this could be done the same way it is currently done
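
To make the trigger and analysis ideas a bit more concrete, here is a rough Python sketch of what the jenkins side could do. Everything in it is an assumption for illustration only: the trigger phrase format follows the "jenkins test rgw sanity" example above, while the rule patterns, category names and function names are made up and do not reflect actual teuthology output or any existing jenkins job.

```python
import re
from collections import Counter

# Hypothetical trigger phrase: "jenkins test <area> sanity" in a PR comment.
TRIGGER_RE = re.compile(r"^\s*jenkins test (\w+) sanity\s*$", re.IGNORECASE)

# Hypothetical triage rules: (pattern on the failure reason, category).
# The first matching rule wins. The patterns are illustrative, not real log text.
TRIAGE_RULES = [
    (re.compile(r"ssh connection lost|could not lock machine", re.IGNORECASE),
     "infrastructure"),
    (re.compile(r"known issue #\d+", re.IGNORECASE), "known-failure"),
]


def parse_trigger(comment):
    """Return the requested area (e.g. 'rgw'), or None if not a trigger."""
    match = TRIGGER_RE.match(comment)
    return match.group(1).lower() if match else None


def classify(failure_reason):
    """Map one failure reason to a category using the rules above."""
    for pattern, category in TRIAGE_RULES:
        if pattern.search(failure_reason):
            return category
    # Anything unrecognized still needs a human to look at it.
    return "needs-review"


def summarize(failure_reasons):
    """Count failures per category for a whole run."""
    return Counter(classify(reason) for reason in failure_reasons)
```

The point of keeping the rules in a plain list is that the triage knowledge lives in one reviewable place and can be updated as new infrastructure or known-failure patterns show up.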

in the future...
- we may also automate the process of selecting which tests need to run, according to the files being modified in the PR
- gating PRs on passing teuthology runs would serve as a tool for better quality code, and also for more stable tests and infrastructure
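
As a rough idea of how the file-based selection might work, here is a minimal Python sketch. The path-to-suite mapping, the fallback suite and the function name are all assumptions for illustration; a real mapping would need to be agreed on and maintained by the component owners.

```python
# Hypothetical mapping from source path prefixes to sanity suite names.
PATH_TO_SUITE = {
    "src/rgw/": "rgw",
    "src/osd/": "rados",
    "src/mds/": "fs",
}

# Hypothetical fallback for files that no prefix covers.
DEFAULT_SUITE = "smoke"


def select_suites(changed_files):
    """Return the sorted, de-duplicated suites to run for a list of changed paths."""
    suites = set()
    for path in changed_files:
        matched = [suite for prefix, suite in PATH_TO_SUITE.items()
                   if path.startswith(prefix)]
        if matched:
            suites.update(matched)
        else:
            # Unmapped files fall back to the default suite.
            suites.add(DEFAULT_SUITE)
    return sorted(suites)
```

For example, a PR touching files under both src/rgw/ and src/osd/ would select the "rgw" and "rados" suites under this mapping, while a PR touching only unmapped files would get the fallback suite.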

Appreciate your feedback!

Yuval