upvalues() on non-suspended thunk

#261
Opened by qyliss at 2023-03-20T11·13+00

depot commit: 49cb2a2b1f964e07d2b6919c36ef0c4fbcd34443
Nixpkgs commit: d3840956451bc76c9e81134b2af21844ddac0ef3

% RUST_BACKTRACE=1 target/debug/tvix -E '(import ~/src/nixpkgs {}).linux' -I nix=../../nix/src/libexpr
thread 'main' panicked at 'upvalues() on non-suspended thunk: Evaluated(String(NixString("6.1.20")))', eval/src/value/thunk.rs:269:22
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: tvix_eval::value::thunk::Thunk::upvalues_mut::{{closure}}
             at ./eval/src/value/thunk.rs:269:22
   3: core::cell::RefMut<T>::map
             at /build/rustc-1.67.1-src/library/core/src/cell.rs:1535:35
   4: tvix_eval::value::thunk::Thunk::upvalues_mut
             at ./eval/src/value/thunk.rs:261:9
   5: tvix_eval::value::thunk::Thunk::finalise
             at ./eval/src/value/thunk.rs:213:9
   6: tvix_eval::vm::VM::execute_bytecode
             at ./eval/src/vm/mod.rs:726:48
   7: tvix_eval::vm::VM::execute
             at ./eval/src/vm/mod.rs:338:27
   8: tvix_eval::vm::run_lambda
             at ./eval/src/vm/mod.rs:1224:5
   9: tvix_eval::Evaluation::evaluate
             at ./eval/src/lib.rs:259:25
  10: tvix::interpret
             at ./cli/src/main.rs:88:9
  11: tvix::main
             at ./cli/src/main.rs:159:13
  12: core::ops::function::FnOnce::call_once
             at /build/rustc-1.67.1-src/library/core/src/ops/function.rs:507:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

  1. qyliss updated the body of this issue at 2023-03-20T11·13+00
  2. qyliss updated the body of this issue at 2023-03-20T11·14+00
  3. I now know of a total of two reproducers which seem to work over wide ranges of nixpkgs revisions. I used 897876e4c484f1e8f92009fd11b7d988a121a4e7 and based my experiments on cl/8658 (patchset 1-5).

    It seems to be caused by some sort of miscompilation of formals, judging by the runtime trace.

    • pkgs.linux.outPath seems to crash in these formals. My guess is that the thunk that causes the crash is lib.versions.pad 3 version, but in theory it could also be the version argument itself.
    • pkgs.pkgsCross.aarch64-multiplatform.stdenv.outPath seems to crash in these formals. The offending thunk is most likely cc != null. Interestingly it does not need to be evaluated at the point where it crashes, as hasCC is provided explicitly according to the runtime trace.

    sterni at 2023-05-29T18·58+00

  4. I found a very small reproducer. I've annotated it with what I think is happening to cause the crash. While it is kind of clear to me now what is going on, how to fix it, remains a bigger question—I would like some input on this.

    let
      # Wraps `x` in a thunk.
      thunk = x: x;
      # Tvix is going to crash while finalising this.
      bomb = thunk true;
      # The function which causes Tvix to emit the OpFinalise responsible for the crash.
      f =
        { # Since the default expr depends on `later`, it needs to be finalised.
            finalise ? later == null
          , later ? null
        }:
        [ finalise later ];
    in
    
    # This expression forces `bomb` using OpForce which is necessary, since the
    # crash doesn't happen if we use `builtins.seq`. We need to thunk the rhs of
    # the `||` to prevent the compiler from optimising.
    assert bomb || thunk true;
    
    # By providing finalise explicitly, OpFinalise will be called on the value of
    # `bomb` instead of the default expr which does need finalising. Because `bomb`
    # is a Thunk we have forced in this crude way, it will miss the following branch
    # in OpFinalise (vm/mod.rs) intended to work around this issue:
    #
    #     …
    #
    #     Value::Thunk(thunk) => thunk.finalise(&self.stack[frame.stack_offset..]),
    #
    #     // In functions with "formals" attributes, it is
    #     // possible for `OpFinalise` to be called on a
    #     // non-capturing value, in which case it is a no-op.
    #     //
    #     // TODO: detect this in some phase and skip the finalise; fail here
    #     _ => { /* TODO: panic here again to catch bugs */ }
    #
    #     …
    #
    # Instead we'll crash in upvalues_mut() because thunk is `Evaluated(Bool(true))`.
    f { finalise = bomb; }
    

    sterni at 2023-06-02T22·49+00

  5. For fixing this there are two philosophies:

    1. Pile more workarounds on top of OpFinalise: The problem would disappear if we'd make thunk.finalise() no-op if the thunk is already finalised. This is undesireable, though, as it hides miscompilations.
    2. Implement the TODO—compile formals in a way that skips the finalise if we don't need to fall back to the default expression.

    Option 2 is preferable, I'd say, but would require conceiving a new mechanism.

    sterni at 2023-06-02T22·59+00

  6. cl/8705

    sterni at 2023-06-20T10·44+00

  7. sterni closed this issue at 2023-06-20T10·44+00