whitby: dbus session sockets are refusing connections / user units failing

#427
Opened by sterni at 2024-11-26T14·48+00

While deploying whitby today, I ran into this issue during rebuild-system, at the point where it switches to the new NixOS system:

setting up /etc...
restarting systemd...
reloading user units for sterni...
Error: Failed to open dbus connection

Caused by:
    Unable to autolaunch a dbus-daemon without a $DISPLAY for X11
reloading user units for flokli...
Error: Failed to open dbus connection

Caused by:
    Failed to connect to socket /run/user/1017/bus: Connection refused
reloading user units for tazjin...
Error: Failed to open dbus connection

Caused by:
    Failed to connect to socket /run/user/1001/bus: Connection refused
reloading user units for qyliss...
Error: Failed to open dbus connection

Caused by:
    Failed to connect to socket /run/user/1008/bus: Connection refused

This also presents as user@.service units failing, e.g.:

× user@1016.service - User Manager for UID 1016
     Loaded: loaded (/etc/systemd/system/user@.service; static)
    Drop-In: /nix/store/mcd6p1l8vbajy097kidwbys6ch7srp3i-system-units/user@.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Tue 2024-11-26 14:39:31 UTC; 3min 51s ago
 Invocation: c1caf8da976e4510aeff1f57082a4579
       Docs: man:user@.service(5)
    Process: 567682 ExecStart=/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/systemd --user (code=exited, status=1/FAILURE)
   Main PID: 567682 (code=exited, status=1/FAILURE)
      Error: 49 (Protocol driver not attached)
         IP: 0B in, 0B out
         IO: 0B read, 0B written
   Mem peak: 3.4M
        CPU: 14ms

Nov 26 14:39:31 whitby systemd[1]: Starting User Manager for UID 1016...
Nov 26 14:39:31 whitby (systemd)[567682]: pam_unix(systemd-user:session): session opened for user sterni(uid=1016) by (uid=0)
Nov 26 14:39:31 whitby (systemd)[567682]: pam_systemd(systemd-user:session): Failed to create session: Invalid session class manager
Nov 26 14:39:31 whitby systemd[567682]: Trying to run as user instance, but $XDG_RUNTIME_DIR is not set.
Nov 26 14:39:31 whitby (sd-pam)[567688]: pam_unix(systemd-user:session): session closed for user sterni
Nov 26 14:39:31 whitby systemd[1]: user@1016.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 14:39:31 whitby systemd[1]: user@1016.service: Failed with result 'exit-code'.
Nov 26 14:39:31 whitby systemd[1]: Failed to start User Manager for UID 1016.
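
To see at a glance which user managers are affected, something like the following could be run on the host. This is only a sketch: it assumes the failing units all match user@*.service, and uid_from_unit is a helper name made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list failed user@UID.service units together with the owning user.

# Hypothetical helper: print the UID embedded in a unit name such as
# "user@1016.service"; fails for anything that does not match that pattern.
uid_from_unit() {
  case "$1" in
    user@*.service)
      _rest="${1#user@}"
      printf '%s\n' "${_rest%.service}"
      ;;
    *)
      return 1
      ;;
  esac
}

if command -v systemctl >/dev/null 2>&1; then
  # --plain and --no-legend keep the output to one unit per line, unit name first.
  systemctl list-units --failed --plain --no-legend 'user@*.service' 2>/dev/null |
    while read -r unit _; do
      uid="$(uid_from_unit "$unit")" || continue
      # Resolve the UID to a login name; fall back to "?" if it is unknown.
      name="$(getent passwd "$uid" | cut -d: -f1)"
      printf '%s\tuid=%s\tuser=%s\n' "$unit" "$uid" "${name:-?}"
    done
fi
```

This ties the UIDs in the unit names back to the user names printed by rebuild-system.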

I'm not sure what actually causes this or what impact it has in practice. I don't think we have any user units, and all other services seem to have come up as normal.

I had the same or a similar issue on //users/sterni/machines:ingeborgSystem, where I tried to fix it by killing all sessions (via loginctl) of the users with failing units, but to no avail. The issue went away after a reboot (confirming tazjin's theory about systemd 🫠), but I never figured out what the cause was.

The dbus error message mentioning DISPLAY is new to me (but maybe I've just forgotten).

  1. I guess the dbus autolaunch errors are normal and just a symptom of the user systemd not coming up. Apparently, dbus uses DISPLAY to make sure only a single instance is started dynamically, but normally we would expect an instance to be running already.

    The XDG_RUNTIME_DIR error usually means that the session the systemd user daemon is using isn't configured properly or pam_systemd.so doesn't work correctly, but it looks normal and there weren't obvious changes upstream.

    One thing we could try is systemctl daemon-reexec, to see whether that gets systemd into a proper state again. PID 1 is probably still systemd 253, so maybe everything has just gotten into a weird state.

    sterni at 2024-11-29T15·00+00
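
Whether PID 1 really predates the new system can be checked before re-executing it. The following is a rough sketch, not something that has been run on whitby: it assumes that the Version property reported by systemctl show reflects the running manager, and major_version and versions_differ are helper names invented here.

```shell
#!/usr/bin/env bash
# Sketch: compare the version of the running PID 1 with the installed systemd.

# Hypothetical helper: extract the leading version number from strings such as
# "systemd 256 (256.7)" or "253.6"; prints e.g. "256" or "253".
major_version() {
  printf '%s\n' "$1" | sed -n '1s/^[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeeds (exit 0) when the two major versions differ.
versions_differ() {
  [ "$(major_version "$1")" != "$(major_version "$2")" ]
}

if command -v systemctl >/dev/null 2>&1; then
  # Version property of the running manager (PID 1), queried over its bus API.
  running="$(systemctl show --property=Version --value 2>/dev/null || true)"
  # First line of the version banner of the systemctl binary found on PATH.
  installed="$(systemctl --version 2>/dev/null | head -n 1 || true)"
  echo "running PID 1: ${running:-unknown}"
  echo "installed:     ${installed:-unknown}"
  if [ -n "$running" ] && [ -n "$installed" ] &&
     versions_differ "$running" "$installed"; then
    echo "PID 1 looks stale; 'systemctl daemon-reexec' (as root) may be worth trying"
  fi
fi
```

If PID 1 is indeed still on 253 while the new system ships 256.7 (the version in the user@1016.service ExecStart path), the helper reports a mismatch.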