whitby: dbus session sockets are refusing connection / user units failing
When deploying whitby today, I ran into this issue during rebuild-system when switching to the new NixOS system:
…
setting up /etc...
restarting systemd...
reloading user units for sterni...
Error: Failed to open dbus connection
Caused by: Unable to autolaunch a dbus-daemon without a $DISPLAY for X11
reloading user units for flokli...
Error: Failed to open dbus connection
Caused by: Failed to connect to socket /run/user/1017/bus: Connection refused
reloading user units for tazjin...
Error: Failed to open dbus connection
Caused by: Failed to connect to socket /run/user/1001/bus: Connection refused
reloading user units for qyliss...
Error: Failed to open dbus connection
Caused by: Failed to connect to socket /run/user/1008/bus: Connection refused
…
This also presents as user@.service units failing, e.g.:
× user@1016.service - User Manager for UID 1016
     Loaded: loaded (/etc/systemd/system/user@.service; static)
    Drop-In: /nix/store/mcd6p1l8vbajy097kidwbys6ch7srp3i-system-units/user@.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Tue 2024-11-26 14:39:31 UTC; 3min 51s ago
 Invocation: c1caf8da976e4510aeff1f57082a4579
       Docs: man:user@.service(5)
    Process: 567682 ExecStart=/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/systemd --user (code=exited, status=1/FAILURE)
   Main PID: 567682 (code=exited, status=1/FAILURE)
      Error: 49 (Protocol driver not attached)
         IP: 0B in, 0B out
         IO: 0B read, 0B written
   Mem peak: 3.4M
        CPU: 14ms

Nov 26 14:39:31 whitby systemd[1]: Starting User Manager for UID 1016...
Nov 26 14:39:31 whitby (systemd)[567682]: pam_unix(systemd-user:session): session opened for user sterni(uid=1016) by (uid=0)
Nov 26 14:39:31 whitby (systemd)[567682]: pam_systemd(systemd-user:session): Failed to create session: Invalid session class manager
Nov 26 14:39:31 whitby systemd[567682]: Trying to run as user instance, but $XDG_RUNTIME_DIR is not set.
Nov 26 14:39:31 whitby (sd-pam)[567688]: pam_unix(systemd-user:session): session closed for user sterni
Nov 26 14:39:31 whitby systemd[1]: user@1016.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 14:39:31 whitby systemd[1]: user@1016.service: Failed with result 'exit-code'.
Nov 26 14:39:31 whitby systemd[1]: Failed to start User Manager for UID 1016.
I'm not sure what actually causes this and what impact it has in practice. I don't think we have any user units. All other services seem to have come up as normal.
I had the same or a similar issue on //users/sterni/machines:ingeborgSystem, where I tried fixing this by killing all sessions via loginctl for users with failing units, but to no avail. The issue ended up going away after a reboot (confirming tazjin's theory about systemd 🫠), but I never figured out what the cause was.
The dbus error message mentioning $DISPLAY is new to me (but maybe I've just forgotten).
I guess the dbus autolaunch errors are normal and just a symptom of the user systemd not coming up. Apparently, dbus uses $DISPLAY to make sure only a single instance is started dynamically, but normally we would expect an instance to be running already.
The XDG_RUNTIME_DIR error usually means that the session the systemd user daemon is using isn't configured properly or pam_systemd.so doesn't work correctly, but it looks normal and there weren't obvious changes upstream.
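To tell the two failure modes in the logs apart for a given user, a quick check of the runtime directory may help. The following is only a sketch: check_user_runtime is a hypothetical helper, not something from the depot, and it assumes the usual /run/user/UID layout seen in the error messages. Note that a socket file that exists can still refuse connections if nothing is listening on it, so a "present" result does not prove the session bus works.

```shell
#!/bin/sh
# Hypothetical helper: report whether a user's runtime directory and
# D-Bus session socket exist.
#   $1: numeric UID
#   $2: base directory (defaults to /run/user; overridable for testing)
# Returns 0 if both exist, 1 if the directory is missing, 2 if only the
# socket is missing.
check_user_runtime() {
    uid="$1"
    base="${2:-/run/user}"
    dir="$base/$uid"

    if [ ! -d "$dir" ]; then
        echo "uid $uid: runtime dir $dir missing"
        return 1
    fi
    if [ ! -S "$dir/bus" ]; then
        echo "uid $uid: no bus socket in $dir"
        return 2
    fi
    echo "uid $uid: runtime dir and bus socket present"
    return 0
}
```

For example, `check_user_runtime 1016` would inspect /run/user/1016 for the user whose user@1016.service failed above.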
One thing we could try is running systemctl daemon-reexec to see whether it gets systemd into a proper state again. PID 1 is probably still systemd 253, so maybe everything has just gotten into a weird state.

sterni at 2024-11-29T15·00+00