3p/josh: unknown Rust issue is causing josh crashes
After the channel bump in cl/8855, executions of josh-filter in depot push tasks (i.e. mirroring of repository parts to GitHub) started failing with the following error:
Filtering depot through :workspace=views/tvix
memory allocation of 94709859315041 bytes failed
/nix/store/rxy8nsmlqh0pp09fkz5rpjccqp35nxnn--workspace=views-tvix-push: line 6: 3736962 Aborted (core dumped) josh-filter ':workspace=views/tvix'
An example of a failing build: https://buildkite.com/tvl/depot/builds/25562#0189166c-6b2a-4c3a-ab39-53d71f30d8bc
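The abort happens inside josh while it calls into gix, the Rust Git implementation. In the backtrace below, the failed allocation is std converting a path into a C string inside std::fs::metadata (frames #10 to #12), called from gix-discover's check of whether a directory is a Git repository: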
Thread 1 "josh-filter" received signal SIGABRT, Aborted.
0x00007ffff7c20a8c in __pthread_kill_implementation () from /nix/store/wpgrc564ys39vbyv0m50qxmq8dvhi7cc-glibc-2.37-8/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7c20a8c in __pthread_kill_implementation () from /nix/store/wpgrc564ys39vbyv0m50qxmq8dvhi7cc-glibc-2.37-8/lib/libc.so.6
#1  0x00007ffff7bd1c86 in raise () from /nix/store/wpgrc564ys39vbyv0m50qxmq8dvhi7cc-glibc-2.37-8/lib/libc.so.6
#2  0x00007ffff7bbb8ba in abort () from /nix/store/wpgrc564ys39vbyv0m50qxmq8dvhi7cc-glibc-2.37-8/lib/libc.so.6
#3  0x0000555555a69db7 in std::sys::unix::abort_internal::hf730997e9ccd1e2e ()
#4  0x00005555555e0b16 in std::process::abort::h5e94f7436e771820 ()
#5  0x0000555555a53f7b in std::alloc::rust_oom::h3e180efaa38ee02c ()
#6  0x0000555555a53f86 in __rg_oom ()
#7  0x000055555568eab6 in __rust_alloc_error_handler ()
#8  0x00005555556b3096 in alloc::alloc::handle_alloc_error::rt_error::h08f5902a73aba60c ()
#9  0x00005555555b24d6 in alloc::alloc::handle_alloc_error::h5b1e66e89806f984 ()
#10 0x00005555556b2082 in <&str as alloc::ffi::c_str::CString::new::SpecNewImpl>::spec_new_impl::ha745a009218ad9d4 ()
#11 0x00005555555df58b in std::sys::common::small_c_string::run_with_cstr_allocating::hb741e351232c3b55 ()
#12 0x00005555555e9ca4 in std::fs::metadata<&std::path::Path> (path=...) at /build/rustc-1.70.0-src/library/std/src/fs.rs:1847
#13 std::path::Path::metadata (self=...) at /build/rustc-1.70.0-src/library/std/src/path.rs:2727
#14 gix_discover::is::git<&std::path::PathBuf> (git_dir=<optimized out>) at /sources/gix-discover-0.18.1/src/is.rs:43
#15 0x00005555555eb53e in gix::types::ThreadSafeRepository::open_opts<&std::path::Path> (path=..., options=...) at /sources/gix-0.44.1/src/open/repository.rs:63
#16 gix::types::ThreadSafeRepository::open<&std::path::Path> (path=...) at /sources/gix-0.44.1/src/open/repository.rs:46
#17 0x0000555555685871 in josh_filter::run_filter (args=...) at josh-filter/src/bin/josh-filter.rs:155
#18 0x000055555568d165 in josh_filter::main () at josh-filter/src/bin/josh-filter.rs:445
The channel bump moved Rust from 1.69 to 1.70. In an experimental CL (cl/8917) we saw that moving back to Rust 1.69 makes the issue disappear. The cause may be a problem in Rust itself, or in one of the dependencies in the build closure of either Rust or Cargo.
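A hypothetical way to check whether the Rust 1.70 standard library alone reproduces the abort, independently of gix and josh, is to exercise the same std call the backtrace ends in and build the probe with both 1.69 and 1.70. This is a minimal sketch that was not part of the investigation above; the function name `probe_metadata` and the probed paths (including the 1000-byte name) are assumptions chosen for illustration.

```rust
use std::io;
use std::path::Path;

/// Hypothetical probe (not part of josh or gix): calls `Path::metadata`, the
/// std entry point at frames #12/#13 of the backtrace. Before calling stat(2),
/// std converts the path into a NUL-terminated C string; frames #10/#11 show
/// the heap-allocating variant of that conversion aborting.
fn probe_metadata(path: &Path) -> io::Result<bool> {
    let meta = path.metadata()?;
    Ok(meta.is_dir())
}

fn main() {
    // Assumption: a path far longer than std's small stack buffer, so the
    // conversion should take the heap-allocating branch (an std
    // implementation detail that may differ between Rust versions).
    let long_path = "a".repeat(1000);

    // Assumption: run from the root of a git checkout so that `.git` exists.
    for candidate in [".git", "does-not-exist", long_path.as_str()] {
        // Shorten the label so the 1000-byte path does not flood the output.
        let label: String = candidate.chars().take(32).collect();
        match probe_metadata(Path::new(candidate)) {
            Ok(is_dir) => println!("{label} ({} bytes): exists, is_dir = {is_dir}", candidate.len()),
            Err(e) => println!("{label} ({} bytes): error: {e}", candidate.len()),
        }
    }
}
```

Every path should produce either metadata or an ordinary error. A clean run under 1.70 would not rule out a toolchain problem, since josh's build closure differs from this standalone binary, but an abort here would point at Rust itself.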
Other relevant changes:
- cl/8916 - adding debug information to release builds of josh (we will keep this on, it aids with future debugging)
- cl/8909 - attempt at building against an older libgit2 (did not address the problem)
The next step is to try building with Rust 1.69 from rust-overlay, and to move that workaround into canon for now while debugging this further.
The issue can be reproduced on whitby by running josh-filter as a buildkite-agent user, for example:
sudo -u buildkite-agent-whitby-2 bash -c 'cd /var/lib/buildkite-agent-whitby-2/builds/whitby-2/tvl/depot && /nix/store/fjxq2lcg1qsydz4dfk3kz2fkz79bqlls-rust-workspace-unknown/bin/josh-filter ":/nix/nix-1p"'
We might want to bisect nixpkgs against this to find which change introduced the crash.
cl/8917 has been submitted and works around the issue by pinning the build to Rust 1.69.0, but it does not address the root cause yet.
tazjin at 2023-07-02T16·40+00
cl/9590 bumped the rustc version, and we have not seen these josh crashes since. This can be closed.
flokli at 2023-11-15T22·31+00
- flokli closed this issue at 2023-11-15T22·31+00