nix: build user fix leads to occasional deadlock of some kind
#185
Opened by tazjin at
In our backported build user fix (i.e. wait for build users instead of hard-failing), we see occasional deadlocks when the daemon receives many simultaneous builds from different clients.
This happens in CI, for example on channel bumps.
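A minimal sketch of the "wait instead of hard-failing" idea, assuming build users are modeled as an in-process pool guarded by a condition variable. This does not reflect how the daemon actually implements it; `BuildUserPool`, `acquire` and `release` are hypothetical names. The optional timeout on `acquire` is one way to keep a waiter from blocking forever if a build user is never handed back.

```python
import threading


class BuildUserPool:
    """Hypothetical pool of build users (e.g. nixbld1..nixbldN).

    acquire() blocks until a user is free instead of failing immediately;
    the optional timeout stops a waiter from blocking indefinitely.
    """

    def __init__(self, users):
        self._free = list(users)
        self._cond = threading.Condition()

    def acquire(self, timeout=None):
        with self._cond:
            # wait_for re-checks the predicate after every wakeup and returns
            # its last value, i.e. an empty (falsy) list if we timed out.
            if not self._cond.wait_for(lambda: self._free, timeout=timeout):
                raise TimeoutError("no build user became free in time")
            return self._free.pop()

    def release(self, user):
        with self._cond:
            self._free.append(user)
            # Wake one waiter; it will re-check that a user is actually free.
            self._cond.notify()
```

Callers should release in a `finally` block: a holder that never calls `release` leaves every other waiter blocked until its timeout expires, or forever if no timeout was given.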
Next steps:
- backporting https://github.com/NixOS/nix/pull/3577
- default build timeout (overridable and generous), e.g. 1hr
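The proposed default timeout could look roughly like this, assuming each CI target is run as a subprocess and per-target overrides live in a plain dict. `DEFAULT_TIMEOUT_SECS`, `run_target` and the override mapping are hypothetical; the 1hr value is the example from the list above.

```python
import subprocess
import sys

# 1hr default, per the proposal in this issue (hypothetical constant name).
DEFAULT_TIMEOUT_SECS = 60 * 60


def run_target(target, cmd, overrides=None):
    """Run a (hypothetical) CI target, killing it once its timeout expires.

    `overrides` maps target names to custom timeouts in seconds;
    targets without an entry get DEFAULT_TIMEOUT_SECS.
    """
    timeout = (overrides or {}).get(target, DEFAULT_TIMEOUT_SECS)
    try:
        # On timeout, subprocess.run kills the child, waits for it,
        # and then raises TimeoutExpired.
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timed out"
    except subprocess.CalledProcessError:
        return "failed"
    return "ok"


if __name__ == "__main__":
    # A deliberately slow "build" with a 1-second override.
    slow = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_target("//slow:target", slow, overrides={"//slow:target": 1}))
```

Keeping the default in one place and the overrides in a separate mapping means a stuck build fails loudly after a bounded time, while known-slow targets can opt into a longer limit.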
[1:24:56 pm] <sterni> tazjin: like left better i think
[1:27:04 pm] <sterni> tazjin: btw worrying this build got stuck for ever https://buildkite.com/tvl/depot/builds/14234#01818804-e8d5-40a7-9d1f-a14b357e0906
[1:27:10 pm] <sterni> not sure if nix or some inner build
[1:27:13 pm] <sterni> we'll find out I guess
[1:34:50 pm] <tazjin> sterni: that's a bug in the waiting for build users thing
[1:35:02 pm] <tazjin> sterni: some internal mutex ends up poisoned somehow
[1:35:15 pm] <sterni> 🤠👍
[1:35:17 pm] <tazjin> it's actually the same thing that happens in Nix >=2.4, so we backported that :p
[1:35:43 pm] <sterni> is it fixed in some higher nix version?
[1:35:46 pm] <tazjin> sterni: I was thinking we should probably have a default timeout for all our targets that can be overridden