tvix/eval: control flow can exit VM internals with empty call frame
There is a bug somewhere in the trampoline logic that can cause a situation in which the control flow passes to e.g. thunk-forcing (outside of the VM module), with an empty stack frame.
This causes a rare situation in which Tvix can crash because methods on the VM that expect a frame to exist are invoked e.g. in builtins or during thunk forcing.
I'm preparing a CL that makes this easy to cause, as I don't quite understand how it happens yet.
This can be reproduced at patchset 1 of cl/7752 like so:
tazjin@tverskoy /d/t/cli ((11b2bb68…))> env RUST_BACKTRACE=1 cargo run -- -E 'builtins.builtins' thread 'main' panicked at 'attempt to subtract with overflow', eval/src/vm.rs:230:22 stack backtrace: 0: rust_begin_unwind at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5 1: core::panicking::panic_fmt at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14 2: core::panicking::panic at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:115:5 3: tvix_eval::vm::VM::frame at /depot/tvix/eval/src/vm.rs:230:22 4: tvix_eval::vm::VM::chunk at /depot/tvix/eval/src/vm.rs:234:10 5: tvix_eval::vm::VM::current_span at /depot/tvix/eval/src/vm.rs:267:9 6: tvix_eval::vm::VM::emit_warning at /depot/tvix/eval/src/vm.rs:300:19 7: tvix_eval::compiler::prepare_globals::{{closure}}::{{closure}} at /depot/tvix/eval/src/compiler/mod.rs:1261:17 8: tvix_eval::value::thunk::Thunk::force_trampoline_self at /depot/tvix/eval/src/value/thunk.rs:206:37 9: tvix_eval::value::thunk::Thunk::force_trampoline at /depot/tvix/eval/src/value/thunk.rs:149:36 10: tvix_eval::value::thunk::Thunk::force_trampoline_self::{{closure}} at /depot/tvix/eval/src/value/thunk.rs:198:33 11: core::ops::function::FnOnce::call_once{{vtable.shim}} at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5 12: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9 13: tvix_eval::vm::VM::trampoline_loop at /depot/tvix/eval/src/vm.rs:496:38 14: tvix_eval::vm::VM::enter_frame at /depot/tvix/eval/src/vm.rs:449:13 15: tvix_eval::vm::run_lambda at /depot/tvix/eval/src/vm.rs:1200:5 16: tvix_eval::Evaluation::evaluate at /depot/tvix/eval/src/lib.rs:249:25 17: tvix::interpret at ./src/main.rs:60:9 18: tvix::main at ./src/main.rs:95:13 19: core::ops::function::FnOnce::call_once at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
This occurs because the evaluation of
builtins.builtins
usesVM::emit_warning
, which needs the current frame to exist. However, at the point control passes to the thunk forcing logic, the VMs call frame is empty (which is an invalid state).tazjin at 2023-01-04T15·25+00
I think what should be happening is that the trampoline action that enters a frame should replace the current frame with the frame it wants to enter, and whatever is going on with
tryEval
needs to work a little differently. This is just a theory based on my current understanding of what's happening.tazjin at 2023-01-04T15·49+00
It seems like this can be reproduced by (trying to) evaluate the following (valid) Nix code:
x./nix/store/6z1jfnqqgyqr221zgbpm30v91yfj3r45-bash-5.1-p16.drv.
Yes, that exact string. Verbatim.
tazjin at 2023-01-09T16·40+00
This is fixed in cl/8104, but that will take some time to land.
tazjin at 2023-02-26T13·55+00
- tazjin closed this issue at 2023-03-14T09·19+00