Hi cephers,

At today's CLT meeting we agreed to make the Ceph API tests "required" again for pull requests to be merged:
  • The current approach ("honoring the agreement not to merge failing PRs") is simply not working: PRs have been merged with API tests in red. While most of these red results are harmless and caused by random failures (we are working to improve this), at other times the API tests warned about real issues, which eventually slipped into the code. [1] [2] [3]
  • The cost & risk of debugging issues a posteriori is usually higher than the pain of retriggering the API tests (we are working to improve this).
  • Ceph API tests, even with their downsides, are providing true integration testing at CI time: this doesn't simply mean complex unit tests or component testing, it means running a vstart Ceph cluster and actually testing RADOS, RBD, RGW, CephFS...

What does this mean?

If the Ceph API tests are in green, great! It's not that hard to achieve: ~75% of PRs pass the Ceph API tests from the beginning.

image.png

What if they are NOT passing?

image.png

From GitHub you can access the Ceph API test results in Jenkins by clicking on "Details", where you'll see a report:
1. The test may fail due to multiple causes: issues in a Jenkins node, Github repo fetching, "make" stage, ... (if this is the case you may easily retrigger the Ceph API test by adding a comment to the PR with the text "jenkins test api").
2. If the failure actually happens as a result of the Ceph API tests themselves, the report will look like this:
image.png
From there:
  • You can quickly check whether this has already been reported (a known issue or a flapping test) or otherwise raise a new issue report.
  • If the failure looks like a flapping one, you may retrigger the tests.
  • If, however, the failure is caused by an intentional change in behaviour, please reach out to the Dashboard team for help.
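
For anyone who prefers to script the retrigger step mentioned above (commenting "jenkins test api" on the PR), here is a minimal sketch using GitHub's REST endpoint for issue comments. The repository name "ceph/ceph", the PR number and the token are assumptions used for illustration, and build_retrigger_request is a hypothetical helper, not part of any Ceph tooling. It only builds the request; sending it is left to you.

```python
import json
import urllib.request

# Trigger phrase quoted in this mail.
RETRIGGER_PHRASE = "jenkins test api"


def build_retrigger_request(pr_number, token, repo="ceph/ceph"):
    """Build (but do not send) the request that posts the retrigger comment.

    The default repo "ceph/ceph" is an assumption; adjust it as needed.
    """
    # PR comments are created through the GitHub "issues" comments endpoint.
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": RETRIGGER_PHRASE}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # 12345 and "<token>" are placeholders, not real values.
    req = build_retrigger_request(12345, "<token>")
    print(req.get_method(), req.full_url)
```

To actually post the comment you would pass the returned object to urllib.request.urlopen() with a token that is allowed to comment on the PR. Adding the comment by hand in the GitHub UI does exactly the same thing.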

What may you expect from the Dashboard team?

  • We are working to harden Ceph API tests, increase their coverage and make them more stable. You may check our backlog of improvements. You are welcome to contribute with ideas or, even better, working code ;-)
  • We are monitoring every day how Ceph API tests are doing: failure rate, runtime, ...
  • You can find us on IRC (#ceph-dashboard), on GitHub (@ceph/dashboard), on this very mailing list, or by pinging us directly: Lenz (in CC) is the component lead, Laura (in CC too) is taking care of Dashboard QA, or myself.

Kind regards,

Ernesto