tvix-store: implement plain blob hashing

#379
Opened by flokli at 2024-02-19T15·16+00

Spun out of a discussion on IRC around implementing fetchurl.nix.

We currently have a CalculateNAR RPC in the tvix-store PathInfoService. It accepts a root node, traverses the tree, renders a NAR internally, and returns the sha256 digest and size of that NAR.

For things fetching plain paths (like fetchurl.nix in non-recursive mode), we need something similar for flat files.

It should receive the blake3 digest of a blob/file already uploaded into the BlobService and a desired hash function, and return the blob's digest under that hash function.

It should not live in BlobService directly, as that's blake3 only.

We can then use it for builtin:fetchurl, custom fetchers, as well as builtins.path.
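
To make the proposed call concrete, here is a minimal sketch in Python rather than an actual tvix API. The `BLOBS` dict and the `hash_blob` function are hypothetical names; the dict stands in for the BlobService, and its keys are opaque bytes where the real service would use the blake3 digest.

```python
import hashlib

# Hypothetical stand-in for the BlobService: maps a blob's key to its bytes.
# In tvix the key would be the blake3 digest; here it is just opaque bytes,
# since blake3 is not available in the Python standard library.
BLOBS = {}


def hash_blob(blob_key: bytes, algo: str) -> bytes:
    """Return the digest of an already-stored blob under the requested hash function."""
    data = BLOBS[blob_key]  # raises KeyError if the blob was never uploaded
    hasher = hashlib.new(algo)  # e.g. "sha256", "sha1", "sha512"
    hasher.update(data)
    return hasher.digest()
```

A caller would upload the blob first, then ask for e.g. `hash_blob(key, "sha256")`. Keeping this outside the blob store itself matches the point above that BlobService only deals in blake3.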

  1. flokli updated the body of this issue at 2024-02-19T15·30+00
  2. This could also introduce a fast path for builtins.hashFile - (or a more specialized variant of it?). For blobs that are in tvix-castore, we can just offload hashing work to this, and might be able to get a cached result.

    flokli at 2024-02-21T14·05+00

  3. https://cl.tvl.fyi/c/depot/+/10976

    aspen at 2024-02-21T14·48+00

  4. That cl is abandoned.

    The landed fetching code incrementally populates the sha256 state with the data it sees as it receives it over HTTP. We could do something similar in builtins.path, if we detect we're in the plain mode.

    flokli at 2024-03-24T21·31+00

  5. That cl is abandoned.

    The landed fetching code incrementally populates the sha256 state with the data it sees as it receives it over HTTP.

    We could do something similar in builtins.path, if we detect we're in the plain mode, at least with https://cl.tvl.fyi/c/depot/+/11236 .

    flokli at 2024-03-24T21·33+00
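
The incremental approach described in the comment above can be sketched roughly as follows. This is an illustration in Python, not the landed tvix code; `ingest_stream` is a hypothetical name, and the chunks are assumed to arrive as an iterable of byte strings (for example from an HTTP response body).

```python
import hashlib


def ingest_stream(chunks):
    """Consume an iterable of byte chunks and return (sha256 digest, total size).

    The hash state is updated as each chunk arrives, so the data never has
    to be read a second time just to compute the flat hash.
    """
    hasher = hashlib.sha256()
    size = 0
    for chunk in chunks:
        hasher.update(chunk)
        size += len(chunk)
        # A real fetcher would also forward the chunk to the blob store here.
    return hasher.digest(), size
```

Since the digest falls out of the same pass that ingests the data, no separate hashing call is needed afterwards.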

  6. I just abandoned cl/11004 with the following message:

    With all of the cleanups that happened around fetchers etc, there's luckily not really a reason to do this anymore, especially not bake it into the protocol.

    The only place where we still need to do this is when ingesting a single file in recursive mode, inside builtins.file, and we know that case upfront, so we can arguably calculate the NAR hash in-memory there, if we're concerned about the roundtrip.

    flokli at 2024-05-02T10·27+00
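
For the single-file case mentioned in the comment above, calculating the NAR hash in memory could look roughly like the sketch below. It is written in Python for illustration and is not tvix code; the function names are hypothetical. It relies on the NAR wire format, where every string is written as a little-endian 64-bit length followed by the bytes, zero-padded to a multiple of 8.

```python
import hashlib
import struct


def nar_str(s: bytes) -> bytes:
    """Encode one NAR string: u64 little-endian length, bytes, zero padding to 8."""
    pad = (8 - len(s) % 8) % 8
    return struct.pack("<Q", len(s)) + s + b"\0" * pad


def nar_single_file(contents: bytes, executable: bool = False) -> bytes:
    """Render the NAR of a single regular file whose contents are held in memory."""
    out = nar_str(b"nix-archive-1") + nar_str(b"(")
    out += nar_str(b"type") + nar_str(b"regular")
    if executable:
        # The executable marker is followed by an empty string.
        out += nar_str(b"executable") + nar_str(b"")
    out += nar_str(b"contents") + nar_str(contents) + nar_str(b")")
    return out


def nar_sha256_and_size(contents: bytes, executable: bool = False):
    """Return (sha256 digest of the NAR, NAR size in bytes) without any RPC."""
    nar = nar_single_file(contents, executable)
    return hashlib.sha256(nar).digest(), len(nar)
```

This returns the same pair of values (sha256 and size) that CalculateNAR reports, but for this one known-upfront case it avoids the roundtrip. For large files, feeding the pieces into the hasher one at a time would avoid building the whole NAR in memory.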

  7. flokli closed this issue at 2024-05-02T10·27+00