//tvix/eval: encoding behaviour?
This issue is to note a major (but maybe not very significant) difference between Tvix and C++ Nix at the moment, so we can discuss it and establish some sort of understanding on how to deal with it.
Currently we use Rust's `String` (or equivalent) to represent Nix strings, which (philosophically) is a sequence of Unicode scalar values (stored as UTF-8), requiring the original on-disk representation to use some kind of Unicode encoding. I think this choice is partially forced upon us, since `rnix-parser` seems to make the same assumption that all its input is Unicode-encoded. In C++ Nix, however, strings are C strings, i.e. they are byte sequences that forbid the use of the `NUL` byte. The discrepancy is twofold: on the one hand, Tvix will not accept some valid Nix programs (e.g. https://sterni.lv/tmp/ord-data.nix); on the other hand, behaviour will differ for programs accepted by both implementations, e.g. indexing into (many) strings behaves differently depending on whether you treat them as Unicode code point sequences or byte sequences.
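Both halves of the discrepancy can be illustrated with plain `std`. This is only a sketch: the helper names `fits_in_rust_string`, `nth_byte` and `nth_char` are hypothetical and are not part of Tvix or `rnix-parser`.

```rust
/// Hypothetical helper: can these raw source bytes be held in a Rust `String`?
/// A C string only forbids NUL, so it may contain bytes that are not valid UTF-8.
fn fits_in_rust_string(raw: &[u8]) -> bool {
    std::str::from_utf8(raw).is_ok()
}

/// Element at index `i` when the string is treated as a byte sequence.
fn nth_byte(s: &str, i: usize) -> Option<u8> {
    s.as_bytes().get(i).copied()
}

/// Element at index `i` when the string is treated as a code point sequence.
fn nth_char(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}
```

For the string `"h\u{e4}h"` (h, ä, h), index 1 is the byte `0xC3` in the byte view but the character `ä` in the code point view, and index 3 exists only in the byte view. A literal containing the byte `0xFF` is a perfectly fine C string but is rejected by `fits_in_rust_string`.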
Paths are another topic we need to be mindful of. We are set up to handle this well with Rust's `OsString` abstraction, but we currently require literals to be UTF-8 and there are many occasions where a path becomes a string and vice versa (`toString` on paths, the attribute set keys resulting from `builtins.readDir`, …). POSIX paths are arbitrary byte sequences (without `NUL` bytes IIRC), and we should probably also keep the Windows case in mind, since someone will surely want to port Tvix in the long term.
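To make the path-to-string problem concrete, here is a minimal Unix-only sketch of what happens when a path that is not valid UTF-8 has to become a string. The helpers `path_to_utf8` and `path_to_lossy` are hypothetical names for illustration; how Tvix should actually handle this case is exactly the open question.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: lets us build an OsStr from raw bytes

/// Strict conversion: `None` if the path bytes are not valid UTF-8.
fn path_to_utf8(raw: &[u8]) -> Option<String> {
    OsStr::from_bytes(raw).to_str().map(str::to_owned)
}

/// Lossy conversion: invalid sequences are replaced by U+FFFD,
/// so the original bytes cannot be recovered from the result.
fn path_to_lossy(raw: &[u8]) -> String {
    OsStr::from_bytes(raw).to_string_lossy().into_owned()
}
```

A file named with the single byte `0xFF` is legal on POSIX systems. The strict conversion refuses it, while the lossy one silently changes the name, so neither is a drop-in answer for something like the attribute names produced by `builtins.readDir`.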
Maybe interesting: https://blog.burntsushi.net/bstr/
sterni at 2022-09-14T14·18+00
Add

    tvix-repl> builtins.substring 0 1 "👩🏽❤️💋👨🏽"
    thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside '👩' (bytes 0..4) of `👩🏽❤️💋👨🏽`', library/core/src/str/mod.rs:127:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

to the list of problems.
sterni at 2022-09-21T19·42+00
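The panic above comes from slicing a `&str` with byte offsets that do not fall on a character boundary. As a sketch of how the crash itself could be avoided (not a statement about what the builtin should return), `str::get` performs the same slice but returns `None` instead of panicking. The function name `checked_byte_slice` is hypothetical.

```rust
/// Byte-indexed slice of `s` that returns `None` instead of panicking when the
/// range is out of bounds or cuts through a multi-byte character.
fn checked_byte_slice(s: &str, start: usize, len: usize) -> Option<&str> {
    // Guard against overflow when computing the end offset.
    let end = start.checked_add(len)?;
    s.get(start..end)
}
```

With `s = "\u{1F469}x"` (the 4-byte `👩` followed by `x`), `checked_byte_slice(s, 0, 1)` is `None` because byte 1 is inside `👩`, while `checked_byte_slice(s, 0, 4)` yields the emoji.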
For the last one, we can probably use something like https://docs.rs/substring/1.4.5/substring/index.html for reasonable behaviour.
However, evaluating the same expression in C++ Nix yields ... nothing? It's a bit unclear.
tazjin at 2022-09-23T00·28+00
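The two candidate semantics for `builtins.substring` can be compared side by side. This is a sketch under assumptions: `substring_chars` counts code points, which is roughly the behaviour a crate like the one linked above would be expected to provide (check its documentation), and `substring_bytes` counts bytes, which is what C++ Nix would do if it slices its C strings directly. Both function names are hypothetical.

```rust
/// Code-point-based substring: `start` and `len` count `char`s.
/// Always yields valid UTF-8, but disagrees with byte offsets on non-ASCII input.
fn substring_chars(s: &str, start: usize, len: usize) -> String {
    s.chars().skip(start).take(len).collect()
}

/// Byte-based substring: `start` and `len` count bytes, clamped to the input.
/// The result is returned as raw bytes because it may not be valid UTF-8.
fn substring_bytes(s: &str, start: usize, len: usize) -> Vec<u8> {
    let bytes = s.as_bytes();
    let start = start.min(bytes.len());
    let end = start.saturating_add(len).min(bytes.len());
    bytes[start..end].to_vec()
}
```

For a string starting with `👩`, `substring_chars(s, 0, 1)` returns the whole emoji, whereas `substring_bytes(s, 0, 1)` returns the lone lead byte `0xF0`. A terminal typically cannot render that byte, which may be why the result looks like "nothing", assuming C++ Nix does slice bytes here.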