Implement dynamic builder adding for Buildkite

#432
Opened by tazjin at 2025-01-03T16·22+00

My idea is roughly this:

  1. Configure our internal tailnet to have a builder tag, allowed to be added by a separate tvl-builders user.
  2. Create a separate cache signing key for builders, and add a builder module that contains the Buildkite setup logic and runs a builder cache listening on Tailscale.
  3. Create pre-shared tailscale and buildkite keys, with which new builders can bootstrap. Prepare a base image for builders that can do this (initially for Yandex Cloud, but we can port it anywhere).
  4. Write a cache proxy that simultaneously checks all remote caches (based on the builder tag in Tailscale) for the given output, and redirects as appropriate.
  5. Ensure through Buildkite tags that the anchoring step always runs on whitby.

There are going to be a few hiccups for sure, but in general this should allow us to add/remove builders as required.
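
As a rough illustration of step 4, here is a minimal Python sketch of the lookup logic: probe every known builder cache concurrently for a narinfo, and redirect to the first one that has it. Everything in it is an assumption made for illustration: the function names (`http_probe`, `find_cache`, `respond`), the use of HEAD requests, the 302 status, and the idea that the list of cache URLs is handed in from outside. Discovering the caches from the builder tag in Tailscale is not shown.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional, Tuple
import urllib.error
import urllib.request


def http_probe(cache_url: str, store_hash: str, timeout: float = 2.0) -> bool:
    """Return True if the cache answers 200 for <store_hash>.narinfo."""
    req = urllib.request.Request(f"{cache_url}/{store_hash}.narinfo", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Unreachable caches, timeouts and 404s all count as a miss.
        return False


def find_cache(
    caches: Iterable[str],
    store_hash: str,
    probe: Callable[[str, str], bool] = http_probe,
) -> Optional[str]:
    """Probe all caches at once; return the first that reports a hit."""
    caches = list(caches)
    if not caches:
        # No dynamic builders registered: nothing to ask.
        return None
    with ThreadPoolExecutor(max_workers=len(caches)) as pool:
        futures = {pool.submit(probe, cache, store_hash): cache for cache in caches}
        for future in as_completed(futures):
            try:
                if future.result():
                    # Note: leaving the `with` block still waits for the
                    # remaining probes; a real proxy would cancel them.
                    return futures[future]
            except Exception:
                # A failing probe is treated like a miss.
                continue
    return None


def respond(
    caches: Iterable[str],
    store_hash: str,
    probe: Callable[[str, str], bool] = http_probe,
) -> Tuple[int, Optional[str]]:
    """Map the lookup result to (HTTP status, redirect target)."""
    hit = find_cache(caches, store_hash, probe)
    if hit is None:
        return 404, None
    return 302, f"{hit}/{store_hash}.narinfo"
```

The `probe` parameter exists only so that the lookup can be exercised without a network; with an empty cache list the sketch answers 404 immediately.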

  1. tazjin updated the body of this issue at 2025-01-03T16·23+00
  2. tazjin updated the body of this issue at 2025-01-03T16·24+00
    1. cl/12948
    2. cl/12949

    The cache proxy is probably independently useful: it can speed up setups in which several caches are configured on one machine, since the narinfos are otherwise looked up in each cache separately. We could extend it to support priorities etc., but that is not needed for now.

    Next steps are to figure out what the setup should look like on each builder. All TVL machines trust the whitby cache signing key, and we can add whitby statically to the configuration on all dynamic builders.

    Whitby itself should have the cache proxy delegating to the builders. We'll add one small additional builder for starters to test this setup, though it should degrade gracefully if there are no additional builders (the proxy will just immediately 404).

    We also need a module for the cache proxy.

    Steps 2, 3, and 5 should already be doable now, fwiw.

    tazjin at 2025-01-04T11·24+00