migrate from Sourcegraph to livegrep

#290
Opened by tazjin at 2023-07-24T20·47+00

Sourcegraph recently announced that it is now closed source software, and no longer freely available under open-source licensing. We will need to shut down our Sourcegraph eventually. While it still works at the moment, it's only a matter of time before something bitrots (as far as I know, it's already backchanneling ~something via Sourcegraph) or becomes insecure in some way.

There aren't many alternatives to Sourcegraph, and certainly no full-featured ones. There are several things that Sourcegraph does for us which we need to replace, listed here in my personal order of priority:

  1. Code search. Like, actually typing words into a search box and getting results.

  2. Code exploration. Sourcegraph is more intuitive to navigate than cgit for most people, and makes it faster/more interactive to move around.

  3. Code intelligence in the code browser. Clickable symbols, finding references, all that good stuff.

These are also ordered by difficulty. Code search should be reasonably easy to get up and running, using something like livegrep.

Code exploration depends on how much we're willing to patch cgit. Maybe it's time to start that incremental RiiR ...

Interestingly, code intelligence is potentially doable without patching cgit, as we already control its code rendering (through cheddar). If we could teach cheddar code intelligence, we might be able to hack something together.

Anyways, the most important thing is search, and it's what we have to sort out first.


Moving to livegrep

Builds & deployment

Building livegrep in Nix is hard, bordering on impossible. It's a complex, multi-language Bazel project (including Node stuff with dependencies that fetch more dependencies at build time, etc.).

Ironically this is similar to Sourcegraph, which we actually never managed to successfully build in Nix.

I suggest that we just use the official Docker images for now.

Links

There are potentially a lot of links to cs.tvl.fyi spread around various places (tweets, orange website comments, IRC logs, ...).

We should have something that detects and rewrites these links on cs.tvl.fyi.

Some of the types of links we have:

There are probably more, but these we definitely need to map to something reasonable and redirect.
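
As a rough illustration of the detect-and-rewrite step, here is a minimal Python sketch. The link shape it handles (`/depot[@rev]/-/blob/<path>` with an optional `?L<line>` query) is an assumption based on how Sourcegraph file URLs commonly look, not the list of link types from this issue. The target reuses the code.tvl.fyi tree URL layout, and `rewrite_sourcegraph_url` is a hypothetical helper name.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, quote, unquote, urlsplit

CGIT_BASE = "https://code.tvl.fyi"

# Assumed Sourcegraph file link shape: /depot[@rev]/-/(blob|tree)/<path>
_FILE_LINK = re.compile(
    r"^/depot(?:@(?P<rev>[^/]+))?/-/(?:blob|tree)/(?P<path>.+)$"
)
# Assumed line selector in the query string: L10, L10-20 or L10:5
_LINE_KEY = re.compile(r"L(\d+)(?:[-:].*)?")


def rewrite_sourcegraph_url(url: str) -> Optional[str]:
    """Map an old cs.tvl.fyi file link to a cgit URL, or None if unknown."""
    parts = urlsplit(url)
    match = _FILE_LINK.match(parts.path)
    if match is None:
        return None  # not a link shape this sketch knows about

    target = f"{CGIT_BASE}/tree/{quote(unquote(match['path']))}"
    if match["rev"]:
        target += f"?id={quote(match['rev'], safe='')}"

    # The line number arrives as a bare query key such as "L10".
    for key, _value in parse_qsl(parts.query, keep_blank_values=True):
        line = _LINE_KEY.fullmatch(key)
        if line:
            target += f"#n{line.group(1)}"
            break
    return target
```

Anything the function does not recognise returns `None`, so the redirect layer could fall back to a generic landing page for those links.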

cgit integration

It would be nice if cgit had a search bar at the top that would send people to livegrep on enter. Depending on how livegrep URLs are constructed (I haven't looked yet), this might not be too hard to do.
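
To make the idea concrete, here is a small Python sketch of building the URL that such a search bar would send people to. It assumes the livegrep frontend accepts the query as a `q` parameter on a `/search` path, which has not been verified here. `livegrep.example` is a placeholder host and `livegrep_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def livegrep_search_url(base_url: str, query: str) -> str:
    """Build a search link, assuming the frontend reads /search?q=<query>."""
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(livegrep_search_url("https://livegrep.example/", "fn main"))
```

If the assumption holds, the cgit side would only need a plain HTML form that submits a GET request with a `q` field, so no JavaScript would be required.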

  1. I played around with this today. Some notes:

    • livegrep indexes depot in ~2 seconds on my laptop
    • search is fast and doesn't take many resources
    • configuration is not that complex, but there are a few moving parts involved
    • livegrep lets users very simply override the HTML templates used for the page, so we can add some TVL styling (though I'd keep the credits to the livegrep author!)

    I wrote an index.json that looks like this:

    {
        "name": "livegrep",
        "fs_paths": [
            {
                "name": "tvl/depot",
                "path": "/tvl/depot",
                "metadata": {
                    "url_pattern": "https://code.tvl.fyi/tree/{path}?id={version}#n{lno}"
                }
            }
        ],
        "repositories": [
            {
                "name": "depot",
                "path": "/tvl/depot",
                "revisions": [ "HEAD" ],
                "metadata": {
                    "url_pattern": "https://code.tvl.fyi/tree/{path}?id={version}#n{lno}",
                    "remote": "https://cl.tvl.fyi/depot.git"
                }
            }
        ]
    }
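
To show what the `url_pattern` above is meant to produce, here is a small Python sketch that mimics the placeholder substitution. It assumes livegrep fills in `{path}`, `{version}` and `{lno}` by plain string replacement. `expand_url_pattern` is a hypothetical helper for illustration, not part of livegrep.

```python
URL_PATTERN = "https://code.tvl.fyi/tree/{path}?id={version}#n{lno}"


def expand_url_pattern(pattern: str, path: str, version: str, lno: int) -> str:
    """Substitute the three placeholders used in the index.json above."""
    return (
        pattern.replace("{path}", path)
        .replace("{version}", version)
        .replace("{lno}", str(lno))
    )


if __name__ == "__main__":
    # A hit on line 12 of README.md at HEAD should link into cgit.
    print(expand_url_pattern(URL_PATTERN, "README.md", "HEAD", 12))
```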
    

    This can then be invoked using livegrep-fetch-index, and it will clone the repo somewhere and generate a file livegrep.idx.

    This can then be served using the other commands. I added no special configuration to them, just leaving everything on defaults.

    Setup notes:

    • I ran it in Docker containers, because building this in Nix is unreasonably hard and - currently - a waste of time
    • There's a backend server and a frontend server. They're independent but need to speak to each other. Only the frontend needs to be publicly available.
    • We can do the indexing in multiple different ways. We can run livegrep-fetch-index occasionally and let it do the needful, or we can use the gerrit depot replication thing and somehow invoke indexing manually on that (I didn't actually figure out how to do that yet). If we do the latter, we can do push-based updating with systemd path units. For starters just running fetch every few minutes is probably fine, though.
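
The "just run fetch every few minutes" option could be sketched in Python as below. The command to run is passed in by the caller. The example invocation in the comment, including its argument, is an assumption about how the tool is called and has not been checked against livegrep. `reindex_periodically` is a hypothetical helper.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def reindex_periodically(
    command: Sequence[str],
    interval_seconds: float,
    iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` repeatedly and return how many runs failed.

    With iterations=None the loop runs forever, e.g. (assumed invocation):
        reindex_periodically(["livegrep-fetch-index", "index.json"], 300)
    """
    failures = 0
    runs = 0
    while iterations is None or runs < iterations:
        # A failed run is counted but does not stop the loop, so a
        # transient fetch error only delays the next index refresh.
        result = subprocess.run(list(command), check=False)
        if result.returncode != 0:
            failures += 1
        runs += 1
        if iterations is None or runs < iterations:
            sleep(interval_seconds)
    return failures
```

In a real deployment a systemd timer would likely replace this loop. The sketch only illustrates the polling behaviour and its failure handling.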

    tazjin at 2023-10-08T13·58+00

  2. Automated setup of livegrep: https://cl.tvl.fyi/c/depot/+/10936

    I think it's still missing some stuff for reindexing automatically, not entirely clear ...

    tazjin at 2024-02-17T07·03+00