tvix: reference scanning for store paths

#376
Opened by flokli at 2024-02-09T19:33+00

Whenever something is built, Nix checks all of its outputs for references to other store paths.

This is not a search for anything that merely looks like a store path. It is a scan for the specific store paths that were present during the build, which means the references found are always a subset of all input and output store paths.
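To illustrate what "scanning for known store paths" means, here is a minimal, naive sketch. It assumes the store directory is /nix/store and, like Nix, searches only for the 32-character hash part of each candidate path. `hash_part` and `scan_references` are hypothetical names, not the API of glue/src/refscan.rs.

```rust
use std::collections::HashSet;

/// Length of the nixbase32-encoded digest at the start of a store path's basename.
const HASH_PART_LEN: usize = 32;

/// Returns the hash part of an absolute store path like "/nix/store/<hash>-<name>".
/// Assumes the store directory is /nix/store.
fn hash_part(store_path: &str) -> Option<&str> {
    let base = store_path.strip_prefix("/nix/store/")?;
    base.get(..HASH_PART_LEN)
}

/// Naive reference scan: which of the `candidates` have their hash part
/// appear somewhere in `contents`? Only candidates can ever be reported,
/// so the result is always a subset of the candidate set.
fn scan_references<'a>(contents: &[u8], candidates: &[&'a str]) -> HashSet<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|path| match hash_part(path) {
            Some(hash) => contents
                .windows(HASH_PART_LEN)
                .any(|w| w == hash.as_bytes()),
            None => false,
        })
        .collect()
}
```

Anything that looks like a store path but isn't among the candidates is ignored, which is exactly the difference from a generic "find store paths" regex.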

We already have an appropriate reference scanner in glue/src/refscan.rs. It was initially written for our "reference scanning instead of string contexts" implementation, but it is still useful for this use case.
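Blob contents will likely be read in chunks rather than all at once, so a scanner fed from a stream has to catch hashes that straddle two chunks. A hypothetical sketch (`StreamingScanner` is made up for illustration and is not the interface of glue/src/refscan.rs) keeps the last `needle_len - 1` bytes of each chunk around for the next call:

```rust
use std::collections::HashSet;

/// Hypothetical streaming scanner: chunks are fed in order, and the last
/// `needle_len - 1` bytes are carried over so matches spanning two chunks are found.
struct StreamingScanner<'a> {
    needles: Vec<&'a [u8]>, // e.g. store path hash parts, all of equal length
    needle_len: usize,
    tail: Vec<u8>,         // overlap carried over from the previous chunk
    found: HashSet<usize>, // indices into `needles` that matched
}

impl<'a> StreamingScanner<'a> {
    /// Panics if `needles` is empty or the needles differ in length.
    fn new(needles: Vec<&'a [u8]>) -> Self {
        let needle_len = needles.first().map_or(0, |n| n.len());
        assert!(needle_len > 0 && needles.iter().all(|n| n.len() == needle_len));
        Self { needles, needle_len, tail: Vec::new(), found: HashSet::new() }
    }

    fn feed(&mut self, chunk: &[u8]) {
        // Search the carried-over tail followed by the new chunk.
        let mut buf = std::mem::take(&mut self.tail);
        buf.extend_from_slice(chunk);
        for window in buf.windows(self.needle_len) {
            for (i, needle) in self.needles.iter().enumerate() {
                if window == *needle {
                    self.found.insert(i);
                }
            }
        }
        // A full match can't fit in needle_len - 1 bytes, so keeping only
        // that much never reports the same match twice.
        let keep = buf.len().min(self.needle_len - 1);
        self.tail = buf[buf.len() - keep..].to_vec();
    }

    fn finish(self) -> HashSet<usize> {
        self.found
    }
}
```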

We need a function similar to this:

```rust
use std::collections::HashSet;
use std::io;

use nix_compat::store_path::{StorePath, StorePathRef};
use tvix_castore::blobservice::BlobService;
use tvix_castore::directoryservice::DirectoryService;
use tvix_castore::proto as castorepb;

/// Returns the subset of `known_paths` that is referenced anywhere below `root_node`.
async fn calculate_references<'a, BS, DS, KP>(
    blob_service: BS,
    directory_service: DS,
    root_node: &castorepb::node::Node,
    known_paths: KP,
) -> io::Result<HashSet<StorePath>>
where
    KP: IntoIterator<Item = StorePathRef<'a>>,
    BS: AsRef<dyn BlobService> + Send + Sync + Clone + 'static,
    DS: AsRef<dyn DirectoryService> + Send + Sync + Clone + 'static,
{
    // Peek at the root node enum kind:
    // - in case it's a directory, recurse into each child,
    // - in case it's a file, refscan the contents of its blob,
    // - in case it's a symlink, refscan its target.
    todo!()
}
```
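The three cases in the comment can be sketched as a plain recursive walk. To keep it self-contained and runnable, this uses hypothetical in-memory stand-ins (`Node`, `Stores`) instead of the real castorepb nodes and the async BlobService/DirectoryService traits. It reports matches as indices into a list of needles (e.g. store path hash parts).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified stand-ins for castore's node kinds.
/// Directory and file nodes point at their content by digest.
enum Node {
    Directory { digest: String },
    File { digest: String },
    Symlink { target: Vec<u8> },
}

/// In-memory stand-ins for DirectoryService and BlobService lookups.
struct Stores {
    directories: HashMap<String, Vec<Node>>, // digest -> child nodes
    blobs: HashMap<String, Vec<u8>>,         // digest -> blob contents
}

/// Records which of the `needles` occur in `data`.
fn scan(data: &[u8], needles: &[&[u8]], found: &mut HashSet<usize>) {
    for (i, needle) in needles.iter().enumerate() {
        if !needle.is_empty() && data.windows(needle.len()).any(|w| w == *needle) {
            found.insert(i);
        }
    }
}

/// Walks the tree below `node`, handling directories, files and symlinks.
fn calculate_references(
    stores: &Stores,
    node: &Node,
    needles: &[&[u8]],
    found: &mut HashSet<usize>,
) -> Result<(), String> {
    match node {
        Node::Directory { digest } => {
            let children = stores
                .directories
                .get(digest)
                .ok_or_else(|| format!("directory {digest} not found"))?;
            for child in children {
                calculate_references(stores, child, needles, found)?;
            }
        }
        Node::File { digest } => {
            let contents = stores
                .blobs
                .get(digest)
                .ok_or_else(|| format!("blob {digest} not found"))?;
            scan(contents, needles, found);
        }
        Node::Symlink { target } => scan(target, needles, found),
    }
    Ok(())
}
```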

It could probably live in store/src/ (that's where we also have NAR import and export functions).

Unlike NAR, where everything has to be processed serially, we could refscan concurrently (and union the sets of found references). Or we could just do it linearly to get started.
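A sketch of the concurrent variant, assuming the children's contents have already been fetched: each child is scanned on its own thread, and the per-child sets are unioned. It uses std::thread::scope for brevity. An actual tvix implementation would more likely spawn async tasks, so treat this only as an illustration of the fan-out-and-union shape. `scan_one` and `scan_concurrently` are hypothetical names.

```rust
use std::collections::HashSet;
use std::thread;

/// Returns the indices of the `needles` that occur in `data`.
fn scan_one(data: &[u8], needles: &[&[u8]]) -> HashSet<usize> {
    needles
        .iter()
        .enumerate()
        .filter(|(_, n)| !n.is_empty() && data.windows(n.len()).any(|w| w == **n))
        .map(|(i, _)| i)
        .collect()
}

/// Scans each child's contents on its own scoped thread and unions the results.
fn scan_concurrently(children: &[Vec<u8>], needles: &[&[u8]]) -> HashSet<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = children
            .iter()
            .map(|data| s.spawn(move || scan_one(data, needles)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("scan thread panicked"))
            .collect()
    })
}
```

Since set union is commutative and associative, the result doesn't depend on which scan finishes first. This is what makes the linear and concurrent versions interchangeable.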

Later, after a build, the builder would invoke this for each produced root node (each output), with the list of known paths (but that's probably out of scope for this issue).