Auto-deploy / self-healing mechanism for whitby (b/109 AI)
#110
Opened by tazjin at
For auto-deploy, I propose something like:
- we add something like //ops/machine-state.nix, a file that maps machines to depot commit hashes
- we add a timer on whitby that automatically builds & deploys whitby at the specified commit hash
- deploying whitby becomes a two-step process: land a commit that introduces the desired change, land a second one that moves whitby forward
- the timer script runs in two phases: a first phase which executes nix-diff and notifies IRC of a pending change with a link to the output, and a second phase which deploys whitby and also notifies IRC
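A rough sketch of how the two phases could fit together, in Python for illustration only. Everything named here is an assumption rather than part of the proposal: `MACHINE_STATE` stands in for whatever //ops/machine-state.nix ends up containing, the commit hash is a placeholder, and `DeployHooks`, `announce_pending` and `deploy_pending` are hypothetical names. The build, nix-diff, deploy and IRC steps are passed in as functions, so the control flow can be tried out without touching a real machine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-in for //ops/machine-state.nix:
# machine name -> depot commit hash (placeholder value).
MACHINE_STATE: Dict[str, str] = {"whitby": "0123abc"}


@dataclass
class DeployHooks:
    """Side effects are injected; none of these are real depot APIs."""
    build: Callable[[str], str]          # commit hash -> built system path
    diff_url: Callable[[str, str], str]  # (deployed, target) -> link to nix-diff output
    deploy: Callable[[str], None]        # activates the built system
    notify: Callable[[str], None]        # posts a message to IRC


def announce_pending(machine: str, deployed: str, hooks: DeployHooks,
                     state: Dict[str, str] = MACHINE_STATE) -> Optional[str]:
    """Phase one: announce a pending change, or return None if up to date."""
    target = state[machine]
    if target == deployed:
        return None
    link = hooks.diff_url(deployed, target)
    hooks.notify(f"{machine}: pending deploy {deployed} -> {target}, diff: {link}")
    return target


def deploy_pending(machine: str, target: str, hooks: DeployHooks) -> None:
    """Phase two: build and deploy the target commit, then notify IRC."""
    system = hooks.build(target)
    hooks.deploy(system)
    hooks.notify(f"{machine}: deployed {target}")
```

Splitting the phases into separate functions leaves room for a delay between the announcement and the deploy, which is when someone could stop the timer.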
For emergencies / development workflows:
- users can stop the timer unit to prevent automatic deploys
- stopping the timer unit notifies IRC (somehow?), as does restarting it
- after some time the timer starts again
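The pause behaviour above could be modelled roughly like this. It is only a sketch of the logic, not of the systemd timer unit itself: `DeployPause` is a hypothetical name, the four-hour default is an assumption (the proposal only says "after some time"), and `notify` stands in for whatever ends up posting to IRC.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeployPause:
    notify: Callable[[str], None]        # assumed IRC notifier
    max_pause_seconds: float = 4 * 3600  # assumed value, not from the proposal
    paused_at: Optional[float] = None    # timestamp of the pause, if any

    def pause(self, user: str, now: float) -> None:
        """Stop automatic deploys and tell IRC who did it."""
        self.paused_at = now
        self.notify(f"auto-deploy paused by {user}")

    def resume(self, user: str, now: float) -> None:
        """Manually re-enable automatic deploys."""
        self.paused_at = None
        self.notify(f"auto-deploy resumed by {user}")

    def deploys_allowed(self, now: float) -> bool:
        """Return True if a deploy may run; expire stale pauses on the way."""
        if self.paused_at is None:
            return True
        if now - self.paused_at >= self.max_pause_seconds:
            self.paused_at = None
            self.notify("auto-deploy resumed automatically after the pause expired")
            return True
        return False
```

Passing `now` in explicitly keeps the expiry logic easy to test without waiting for real time to pass.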
- tazjin updated the body of this issue at 2021-04-09T19·02+00
I think we should also have some sort of post-deploy smoke tests that trigger an automatic rollback. They could start out with something as simple as checking for failed systemd units and go from there.
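A minimal sketch of what the simplest version might look like, assuming the check runs on the machine right after activation. The function names are hypothetical, and `nixos-rebuild switch --rollback` is just one possible rollback mechanism, not something decided in this issue.

```python
import subprocess
from typing import Callable, List


def failed_units(systemctl_output: str) -> List[str]:
    """Extract unit names from
    `systemctl list-units --state=failed --plain --no-legend` output."""
    return [line.split()[0]
            for line in systemctl_output.splitlines() if line.strip()]


def list_failed_output() -> str:
    """Ask systemd for failed units; the output is empty if there are none."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def default_rollback() -> None:
    """Assumed rollback mechanism: switch back to the previous generation."""
    subprocess.run(["nixos-rebuild", "switch", "--rollback"], check=True)


def smoke_test(get_output: Callable[[], str] = list_failed_output,
               rollback: Callable[[], None] = default_rollback,
               notify: Callable[[str], None] = print) -> bool:
    """Return True if the deploy looks healthy; otherwise notify and roll back."""
    failed = failed_units(get_output())
    if failed:
        notify("smoke test failed, rolling back: " + ", ".join(failed))
        rollback()
        return False
    return True
```

One caveat with this check: units that were already failed before the deploy would also trigger a rollback, so a real version would probably compare against the pre-deploy state.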
aspen at 2021-05-22T18·05+00
I'm going to start working on this
aspen at 2021-05-23T15·29+00
Start of a script is up at cl/3145, pausing there for review
aspen at 2021-05-23T16·58+00