Hello folks, this past summer Shraddha Agrawal implemented a new way for
teuthology to run tests - a single process, teuthology-dispatcher,
locking and then running jobs, rather than a bunch of workers competing
for locks [0].
Since there's a single dispatcher for each queue, jobs are run in strict
priority order. This also enables a couple of improvements to the test
experience:
1) jobs may require more nodes - since only one job is locking at a
time, a job cannot be starved of available nodes
2) dead jobs will have full logs - jobs that hit the max_job_time (12
hours in sepia) will have full ceph logs and coredumps collected as
usual - this should help quite a bit with stabilizing pacific
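
To illustrate why a single dispatcher avoids starvation, here is a
minimal Python sketch. It is not the actual teuthology code: the Job
class, the dispatch() function, and the "lower number runs first"
priority convention are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Hypothetical job record; only priority and seq are used for ordering.
    priority: int                        # assumption: lower value runs first
    seq: int                             # submission order, breaks ties
    name: str = field(compare=False)
    nodes_needed: int = field(compare=False)


def dispatch(jobs, free_nodes):
    """Start jobs in strict priority order and return the names started."""
    queue = list(jobs)
    heapq.heapify(queue)
    started = []
    while queue:
        head = queue[0]
        if head.nodes_needed > free_nodes:
            # Wait for nodes to free up. Lower-priority jobs do not jump
            # ahead, so the head job cannot be starved by smaller ones.
            break
        heapq.heappop(queue)
        free_nodes -= head.nodes_needed
        started.append(head.name)
    return started


jobs = [
    Job(100, 0, "small-a", 1),
    Job(50, 1, "big", 4),
    Job(100, 2, "small-b", 1),
]
print(dispatch(jobs, free_nodes=3))  # [] - "big" is first and must wait
print(dispatch(jobs, free_nodes=5))  # ['big', 'small-a']
```

With competing workers, the two small jobs could keep grabbing nodes
while the 4-node job never sees enough free at once. With one
dispatcher per queue, nothing behind the head of the queue gets nodes
until the head job has locked its own.
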
For more details, check out the PR [1].
This is now running all the queues in the sepia lab - let us know if
you run into any bugs!
And thanks to Shraddha for her hard work on this!
Josh
[0] https://ceph.io/gsoc-2020/#teuthology-scheduling%20Improvements
[1] https://github.com/ceph/teuthology/pull/1546