title: CI usability issues
Here, we collect CI usability issues.
This effort started with tails#16959, to better understand how our current setup feels to its users.
This data will help us define our strategy for the future of our CI (e.g. switching to GitLab CI, switching to Jenkins pipelines, or merely removing some UX stumbling blocks without changing the big picture of our setup).
- Cumbersome navigation
- Misleading output
- Missing information
- Suboptimal jobs prioritizing
- Very long feedback loop
- Robustness problems
Cumbersome navigation
Finding the Jenkins jobs corresponding to a given branch or MR is cumbersome, because:
Misleading output
- The Revision: $COMMIT information displayed on the page of a test_Tails_ISO_* job run may be incorrect. To determine which commit the test suite was run from, look for git reset --hard in the console output of the job run.
- The success or failure of the keep_node_busy_during_cleanup job does not matter.
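Looking for git reset --hard in the console output can be automated. The raw log of a Jenkins build can usually be fetched from the build's consoleText URL; the sketch below only handles the parsing step. It assumes (hypothetically) that the commit ID appears as a hexadecimal string on the same line as the literal git reset --hard command; the function name commits_from_console and the sample log are illustrative, not part of our CI.

```python
import re

# Assumption: the commit ID follows the literal command "git reset --hard"
# on the same line, as an abbreviated or full hexadecimal commit ID.
# Branch or tag names passed to git reset are NOT matched by this pattern.
RESET_RE = re.compile(r"git reset --hard\s+([0-9a-f]{7,40})\b")


def commits_from_console(console_text: str) -> list[str]:
    """Return the commit IDs passed to `git reset --hard`, in log order."""
    return RESET_RE.findall(console_text)


if __name__ == "__main__":
    # Made-up log excerpt, for illustration only.
    sample = (
        "+ git reset --hard 0a1b2c3d4e\n"
        "HEAD is now at 0a1b2c3 Example commit\n"
    )
    print(commits_from_console(sample))  # ['0a1b2c3d4e']
```

If a job resets more than once, all matches are returned, so a human still has to decide which one the test suite actually ran from.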
Missing information
- When a Jenkins build aborts due to a timeout, no summary of the scenarios that did run is generated (tails#17678).
Suboptimal jobs prioritizing
- The fact that all jobs are treated equally causes trouble during our release process: the RM sometimes has to wait a long time for the builds they care about to run, while our CI resources are kept busy by other builds that arguably could wait a bit longer (tails/sysadmin/-/issues/9760).
Very long feedback loop
As of August 2020, a full CI pipeline takes almost 7 hours to run.
Here we focus on feelings and human perception.
Several CI jobs are on the critical path of our release process, which forces the RM to wait. In an already stressful situation, this can test their patience.
Some developers have adapted their workflow around this constraint. They feel that halving the CI feedback loop would not make a significant difference to them: they would come back to it the next day anyway.
Another developer, who routinely uses a replica of our CI setup that is twice as fast, instead feels that this allows them to iterate faster, complete work on a given task earlier, and limit context-switching. With a 3.5-hour feedback loop, it becomes possible in a single day to:
- do some work
- wait for CI results
- fix bugs found by this first iteration
- send the second iteration to CI
- check the CI results for the second iteration the next day
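The difference between a 7-hour and a 3.5-hour loop can be made concrete with a toy model. The sketch below counts how many CI runs a developer can submit within one working day; the 8-hour workday and the 1 hour of work before each submission are hypothetical figures chosen for illustration, not measurements.

```python
def submissions_in_workday(workday_h: float, ci_h: float, work_h: float) -> int:
    """Count CI runs a developer can submit within one working day.

    Hypothetical model: each iteration is work_h hours of work, then a
    submission, then ci_h hours of waiting for results before fixing.
    """
    submissions = 0
    t = work_h  # time of the first submission
    while t <= workday_h:
        submissions += 1
        t += ci_h + work_h  # wait for results, then fix and resubmit
    return submissions


if __name__ == "__main__":
    for ci_hours in (7, 3.5):
        print(ci_hours, submissions_in_workday(8, ci_hours, 1))
    # prints "7 1" then "3.5 2"
```

Under these assumptions, halving the loop turns one submission per day into two, which matches the workflow described above; with a longer workday or faster fixes the numbers shift, but the step change comes from fitting a second submission into the day.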
Robustness problems
Note: this is not about robustness problems inherent to our test suite, which do not depend much on how we run it.
- When GitLab is down, or connectivity between lizard and GitLab is poor, many Jenkins jobs, if not all, fail (tails/sysadmin/-/issues/17715).