tvix/eval: more compact bytecode representation

#298
Opened by tazjin at 2023-09-08T21·44+00

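The bytecode of tvix-eval is currently much larger than it needs to be, because some OpCode variants directly contain integer arguments. Since a Rust enum is at least as large as its largest variant, every instruction pays for those inline arguments, which makes the bytecode a LOT less cache-efficient than it could be.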
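To make the size problem concrete, here is a small sketch comparing a payload-carrying enum with a fieldless #[repr(u8)] one. Both enums are hypothetical stand-ins, not the real tvix-eval OpCode. On a typical 64-bit target, a single variant holding a usize pushes every opcode to 16 bytes (payload plus discriminant, padded to alignment), while the fieldless version is exactly one byte.

```rust
use std::mem::size_of;

// Hypothetical opcode enum in the current style: some variants carry
// their integer argument inline, so every value reserves room for it.
#[allow(dead_code)]
enum FatOp {
    Add,
    Pop,
    Constant(usize), // index into a constant table
    Jump(usize),     // jump offset
}

// Hypothetical compact version: fieldless, so it fits in one byte and
// arguments would instead follow the opcode in a byte buffer.
#[allow(dead_code)]
#[repr(u8)]
enum CompactOp {
    Add,
    Pop,
    Constant,
    Jump,
}

/// Returns (size of FatOp, size of CompactOp) in bytes.
pub fn opcode_sizes() -> (usize, usize) {
    (size_of::<FatOp>(), size_of::<CompactOp>())
}
```

A cache line of 64 bytes holds only four FatOp values but sixty-four CompactOp bytes, which is the cache-efficiency argument in a nutshell.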

I made an initial attempt to fix this in cl/7973 and its children, in which I added an Op type:

pub union Op {
    op: OpCode,
    data: u8,
}

and removed the fields from the variants of OpCode. However, reading a field of a union always requires an unsafe block in Rust, and the resulting code is not very nice.

Yesterday, during the questions after my talk, somebody pointed out the obvious: there's no need for this union at all. Simply store a Vec<u8>, use #[repr(u8)] on OpCode, and implement a (memory-safe) conversion from u8 to OpCode. All other logic then changes basically the same way as in my CL chain above, but without introducing memory unsafety.
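A minimal sketch of that suggestion, assuming a hypothetical three-opcode subset and a one-byte operand for Constant (the real OpCode set and operand widths are not specified here). Instructions are appended to a flat Vec<u8>, and decoding goes through a checked TryFrom<u8> conversion, so an unknown byte becomes an error instead of an invalid enum value. The names emit, decode and operand_len are illustrative only.

```rust
use std::convert::TryFrom;

// Hypothetical subset of opcodes; the real tvix-eval OpCode has many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum OpCode {
    Constant = 0, // followed by one operand byte (constant index)
    Add = 1,
    Return = 2,
}

impl OpCode {
    /// Number of inline operand bytes that follow this opcode.
    pub fn operand_len(self) -> usize {
        match self {
            OpCode::Constant => 1,
            OpCode::Add | OpCode::Return => 0,
        }
    }
}

impl TryFrom<u8> for OpCode {
    type Error = u8;

    // Memory-safe conversion: unknown bytes are returned as an error
    // rather than being transmuted into an invalid OpCode.
    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0 => Ok(OpCode::Constant),
            1 => Ok(OpCode::Add),
            2 => Ok(OpCode::Return),
            other => Err(other),
        }
    }
}

/// Appends one instruction (opcode byte plus operands) to the code buffer.
pub fn emit(code: &mut Vec<u8>, op: OpCode, operands: &[u8]) {
    assert_eq!(operands.len(), op.operand_len());
    code.push(op as u8); // #[repr(u8)] guarantees the discriminant is one byte
    code.extend_from_slice(operands);
}

/// Decodes a code buffer back into (opcode, operand bytes) pairs.
pub fn decode(code: &[u8]) -> Result<Vec<(OpCode, &[u8])>, String> {
    let mut out = Vec::new();
    let mut ip = 0;
    while ip < code.len() {
        let op = OpCode::try_from(code[ip])
            .map_err(|b| format!("invalid opcode {} at offset {}", b, ip))?;
        let end = ip + 1 + op.operand_len();
        let operands = code.get(ip + 1..end).ok_or("truncated operand")?;
        out.push((op, operands));
        ip = end;
    }
    Ok(out)
}
```

For example, emitting Constant 0, Constant 1, Add, Return produces the six bytes [0, 0, 0, 1, 1, 2]. The only match that has to be kept in sync by hand is the one in try_from; everything else is ordinary safe Rust.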

I think right now this is probably not worth doing, but we'll want to do it at some point.

  1. Efficient bytecode usage can go even further: if our bytecode really is _byte_code, the instruction pointer can hold direct address offsets, and bytecode files can be memory-mapped (mmap) directly into tvix's address space. This would significantly speed up file operations, because page tables and caching are managed by the OS.

    totikom at 2023-09-13T06·05+00
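The comment's idea can be sketched with a toy interpreter that only ever borrows a &[u8]: the instruction pointer is a plain byte offset, and a jump simply overwrites it with its operand. The opcode bytes below (PUSH, ADD, JUMP, HALT) are hypothetical and exist only for this sketch. The file is loaded with std::fs::read, which copies it into a heap buffer; with memory mapping (for example via a third-party crate such as memmap2, not used here) the same exec function could run directly on the mapped bytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical opcode bytes for this sketch only.
const PUSH: u8 = 0; // operand: 1-byte literal
const ADD: u8 = 1;
const JUMP: u8 = 2; // operand: 1-byte absolute offset into the code
const HALT: u8 = 3;

/// Executes bytecode borrowed as a plain byte slice. Nothing here needs the
/// bytes to be owned or deserialized first. Returns None on malformed code.
pub fn exec(code: &[u8]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip: usize = 0; // instruction pointer: a byte offset into `code`
    loop {
        let op = *code.get(ip)?;
        ip += 1;
        match op {
            PUSH => {
                stack.push(*code.get(ip)? as i64);
                ip += 1;
            }
            ADD => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            // A jump is just an assignment to the instruction pointer.
            JUMP => ip = *code.get(ip)? as usize,
            HALT => return stack.pop(),
            _ => return None, // unknown opcode
        }
    }
}

/// Loads a bytecode file and runs it. fs::read copies the file into memory;
/// a memory-mapped file would instead be paged in on demand by the OS.
pub fn run_file(path: &Path) -> io::Result<Option<i64>> {
    let bytes = fs::read(path)?;
    Ok(exec(&bytes))
}
```

For instance, the bytes [0, 2, 2, 6, 0, 100, 0, 3, 1, 3] push 2, jump over the PUSH 100 at offset 4, push 3, add and halt with 5.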