//tvix/eval: encoding behaviour?

#189
Opened by sterni at 2022-09-14T09·55+00

This issue notes a fundamental (though perhaps not practically significant) difference between Tvix and C++ Nix at the moment, so that we can discuss it and agree on how to deal with it.

  1. Maybe interesting: https://blog.burntsushi.net/bstr/

    sterni at 2022-09-14T14·18+00

  2. Add

    tvix-repl> builtins.substring 0 1 "👩🏽‍❤️‍💋‍👨🏽"
    thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside '👩' (bytes 0..4) of `👩🏽‍❤️‍💋‍👨🏽`', library/core/src/str/mod.rs:127:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

    to the list of problems.

    sterni at 2022-09-21T19·42+00

  3. For the last one, we can probably use something like https://docs.rs/substring/1.4.5/substring/index.html for reasonable behaviour.

    However, doing that in Nix yields ... nothing? It's a bit unclear.

    tazjin at 2022-09-23T00·28+00

  4. It yields a string of length 1 containing only the first byte of that emoji in UTF-8 encoding (depending on the locale of course). It is probably non-printable.

    sterni at 2023-05-30T21·58+00
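
    A minimal Rust sketch of the byte-oriented behaviour described in comment 4, for comparison with the panic in comment 2. `byte_substring` is a hypothetical helper written for illustration, not the Tvix or C++ Nix implementation. It takes unsigned offsets and clamps them, so it does not model how `builtins.substring` handles negative lengths. The input is a shortened string that starts with the same code point as the example above (U+1F469, four bytes in UTF-8).

    ```rust
    /// Hypothetical helper (an assumption for illustration, not Tvix code):
    /// take `len` bytes starting at byte offset `start`, clamped to the input.
    fn byte_substring(s: &[u8], start: usize, len: usize) -> &[u8] {
        let start = start.min(s.len());
        let end = start.saturating_add(len).min(s.len());
        &s[start..end]
    }

    fn main() {
        // Starts with U+1F469 (WOMAN), encoded as F0 9F 91 A9 in UTF-8.
        let emoji = "\u{1F469}\u{1F3FD}";

        // Byte 1 lies inside the first code point, so `&emoji[0..1]` would
        // panic; the checked variant returns None instead.
        assert!(!emoji.is_char_boundary(1));
        assert!(emoji.get(0..1).is_none());

        // Operating on bytes yields a one-byte result that is not valid UTF-8.
        let first = byte_substring(emoji.as_bytes(), 0, 1);
        assert_eq!(first, &[0xF0u8][..]);
        assert!(std::str::from_utf8(first).is_err());
    }
    ```

    The point of the sketch is only that the result cannot be represented as a Rust `str`, which is why a byte-based string representation comes up later in the thread.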

  5. https://b.tvl.fyi/issues/337: this seems to be load-bearing for evaluating e.g. nixpkgs.hello

    grfn at 2023-12-05T22·07+00

  6. cl/10200 has started work on converting NixString to use byte vectors

    grfn at 2023-12-05T23·03+00
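
    As a rough illustration of what a byte-vector-backed string can look like, here is a sketch. `ByteString` is a hypothetical stand-in and makes no claim about how NixString is actually structured in cl/10200. It shows one possible design: store arbitrary bytes, offer a fallible conversion to `&str`, and display lossily.

    ```rust
    use std::fmt;

    /// Hypothetical byte-backed string type (not the real NixString).
    #[derive(Clone, Debug, PartialEq, Eq)]
    struct ByteString(Vec<u8>);

    impl From<&str> for ByteString {
        fn from(s: &str) -> Self {
            ByteString(s.as_bytes().to_vec())
        }
    }

    impl ByteString {
        /// Borrow the contents as `&str` only if they are valid UTF-8.
        fn to_str(&self) -> Result<&str, std::str::Utf8Error> {
            std::str::from_utf8(&self.0)
        }
    }

    impl fmt::Display for ByteString {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // Invalid sequences are rendered as U+FFFD instead of panicking.
            write!(f, "{}", String::from_utf8_lossy(&self.0))
        }
    }

    fn main() {
        let valid = ByteString::from("hello");
        assert_eq!(valid.to_str(), Ok("hello"));

        // A lone lead byte, like the substring result discussed above.
        let truncated = ByteString(vec![0xF0]);
        assert!(truncated.to_str().is_err());
        assert_eq!(truncated.to_string(), "\u{FFFD}");
    }
    ```

    Under this design, operations such as substring work on bytes and cannot panic on character boundaries; UTF-8 validity only matters at the edges where a `&str` is required.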

  7. aspen closed this issue at 2024-01-31T14·52+00