tvix/eval: control flow can exit VM internals with empty call frame

#238
Opened by tazjin at 2023-01-04T15·22+00

There is a bug somewhere in the trampoline logic that can cause a situation in which the control flow passes to e.g. thunk-forcing (outside of the VM module), with an empty stack frame.

This causes a rare situation in which Tvix can crash because methods on the VM that expect a frame to exist are invoked e.g. in builtins or during thunk forcing.

I'm preparing a CL that makes this easy to cause, as I don't quite understand how it happens yet.

  1. This can be reproduced at patchset 1 of cl/7752 like so:

    tazjin@tverskoy /d/t/cli ((11b2bb68…))> env RUST_BACKTRACE=1 cargo run -- -E 'builtins.builtins'
    thread 'main' panicked at 'attempt to subtract with overflow', eval/src/vm.rs:230:22
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
       1: core::panicking::panic_fmt
                 at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
       2: core::panicking::panic
                 at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:115:5
       3: tvix_eval::vm::VM::frame
                 at /depot/tvix/eval/src/vm.rs:230:22
       4: tvix_eval::vm::VM::chunk
                 at /depot/tvix/eval/src/vm.rs:234:10
       5: tvix_eval::vm::VM::current_span
                 at /depot/tvix/eval/src/vm.rs:267:9
       6: tvix_eval::vm::VM::emit_warning
                 at /depot/tvix/eval/src/vm.rs:300:19
       7: tvix_eval::compiler::prepare_globals::{{closure}}::{{closure}}
                 at /depot/tvix/eval/src/compiler/mod.rs:1261:17
       8: tvix_eval::value::thunk::Thunk::force_trampoline_self
                 at /depot/tvix/eval/src/value/thunk.rs:206:37
       9: tvix_eval::value::thunk::Thunk::force_trampoline
                 at /depot/tvix/eval/src/value/thunk.rs:149:36
      10: tvix_eval::value::thunk::Thunk::force_trampoline_self::{{closure}}
                 at /depot/tvix/eval/src/value/thunk.rs:198:33
      11: core::ops::function::FnOnce::call_once{{vtable.shim}}
                 at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
      12: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                 at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9
      13: tvix_eval::vm::VM::trampoline_loop
                 at /depot/tvix/eval/src/vm.rs:496:38
      14: tvix_eval::vm::VM::enter_frame
                 at /depot/tvix/eval/src/vm.rs:449:13
      15: tvix_eval::vm::run_lambda
                 at /depot/tvix/eval/src/vm.rs:1200:5
      16: tvix_eval::Evaluation::evaluate
                 at /depot/tvix/eval/src/lib.rs:249:25
      17: tvix::interpret
                 at ./src/main.rs:60:9
      18: tvix::main
                 at ./src/main.rs:95:13
      19: core::ops::function::FnOnce::call_once
                 at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    

    This occurs because the evaluation of builtins.builtins uses VM::emit_warning, which needs the current frame to exist. However, at the point control passes to the thunk forcing logic, the VMs call frame is empty (which is an invalid state).

    tazjin at 2023-01-04T15·25+00

  2. I think what should be happening is that the trampoline action that enters a frame should replace the current frame with the frame it wants to enter, and whatever is going on with tryEval needs to work a little differently. This is just a theory based on my current understanding of what's happening.

    tazjin at 2023-01-04T15·49+00

  3. It seems like this can be reproduced by (trying to) evaluate the following (valid) Nix code:

    x./nix/store/6z1jfnqqgyqr221zgbpm30v91yfj3r45-bash-5.1-p16.drv.
    

    Yes, that exact string. Verbatim.

    tazjin at 2023-01-09T16·40+00

  4. This is fixed in cl/8104, but that will take some time to land.

    tazjin at 2023-02-26T13·55+00

  5. tazjin closed this issue at 2023-03-14T09·19+00