Replace the detached tokio::spawn pattern in the peer runtime with a
supervised model built on tokio_util's CancellationToken and TaskTracker.
Long-lived services and child tasks now have an explicit parent, a
cancellation path, and a join point. Tauri can request a clean shutdown
on app exit instead of leaking work into process termination.
Background
~~~~~~~~~~
start_peer() previously returned only a command sender. The four startup
services (QUIC server, mDNS discovery, peer liveness, local library
monitor) and their child tasks (ping workers, handshake jobs, download
workers, announcement fan-outs, connection/stream handlers) were spawned
with raw tokio::spawn and detached. Closing the command channel sent
Goodbye notifications but did not stop those services. The mDNS blocking
worker had no cancellation path at all. Active downloads were stored as
JoinHandle<()> and force-aborted, which could interrupt file writes
mid-chunk.
Supervisor
~~~~~~~~~~
The runtime now owns a CancellationToken and a TaskTracker, threaded
through Ctx and PeerCtx. Each long-lived service is spawned through a
small supervisor (spawn_supervised_service) that wraps the service in
catch_unwind and enforces an explicit SupervisionPolicy:
QuicServer: Required (fatal; cancels the runtime if it dies)
Discovery: Restart(5s) (matches the prior self-restart loop)
Liveness: Restart(5s)
LocalMonitor: BestEffort (logs and exits, no restart)
A Required failure emits a new RuntimeFailed { component, error } event
to the UI and cancels the runtime; the command loop and goodbye
notifications still run to completion. The Tauri layer forwards the
event as "peer-runtime-failed" so a future UI can surface it.
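The policy table above can be read as a small decision function. The sketch below is illustrative only: NextStep, next_step, and run_guarded are hypothetical names, std::panic::catch_unwind stands in for whatever async panic guard the real supervisor uses, and treating every exit during shutdown as a plain stop is an assumption.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::time::Duration;

/// How the supervisor reacts when a service exits with an error or panics.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SupervisionPolicy {
    Required,
    Restart(Duration),
    BestEffort,
}

/// What the supervisor does next (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum NextStep {
    /// Emit RuntimeFailed and cancel the whole runtime.
    FailRuntime,
    /// Wait for the backoff, then start the service again.
    RestartAfter(Duration),
    /// Log and leave the service stopped.
    Stop,
}

/// Maps a policy to the follow-up action after a service has failed.
/// Assumption: once shutdown was requested, nothing is restarted or escalated.
fn next_step(policy: SupervisionPolicy, shutdown_requested: bool) -> NextStep {
    if shutdown_requested {
        return NextStep::Stop;
    }
    match policy {
        SupervisionPolicy::Required => NextStep::FailRuntime,
        SupervisionPolicy::Restart(backoff) => NextStep::RestartAfter(backoff),
        SupervisionPolicy::BestEffort => NextStep::Stop,
    }
}

/// Runs one service attempt and converts a panic into an ordinary error,
/// so the supervisor can apply the same policy to both failure modes.
fn run_guarded<F>(service: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    match catch_unwind(AssertUnwindSafe(service)) {
        Ok(result) => result,
        Err(_) => Err("service panicked".to_string()),
    }
}
```

Keeping the decision separate from the spawning code makes each policy testable without starting a runtime.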
mDNS cancellation
~~~~~~~~~~~~~~~~~
MdnsBrowser previously blocked on receiver.recv() forever. It now
exposes next_service_timeout(Duration) returning an MdnsServicePoll
enum (Service/Timeout/Closed) via recv_timeout(). The discovery worker
polls at 250ms and checks the shutdown flag between ticks, so
cancellation reaches the blocking thread within one poll interval
instead of waiting for the next mDNS event.
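A minimal sketch of the polling pattern, assuming a std::sync::mpsc receiver and an AtomicBool shutdown flag in place of the real mDNS receiver and runtime token. The enum and method names follow the description above; the generic payload and the discovery_worker helper are assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Result of one bounded wait on the mDNS receiver.
#[derive(Debug, PartialEq)]
enum MdnsServicePoll<T> {
    Service(T),
    Timeout,
    Closed,
}

/// Waits at most `timeout` for the next discovered service.
fn next_service_timeout<T>(rx: &Receiver<T>, timeout: Duration) -> MdnsServicePoll<T> {
    match rx.recv_timeout(timeout) {
        Ok(service) => MdnsServicePoll::Service(service),
        Err(RecvTimeoutError::Timeout) => MdnsServicePoll::Timeout,
        Err(RecvTimeoutError::Disconnected) => MdnsServicePoll::Closed,
    }
}

/// Blocking worker loop (hypothetical helper): re-checks the shutdown flag
/// before every poll, so it exits at most one poll interval after shutdown.
fn discovery_worker<T>(rx: &Receiver<T>, shutdown: &AtomicBool, poll: Duration) -> Vec<T> {
    let mut seen = Vec::new();
    while !shutdown.load(Ordering::Relaxed) {
        match next_service_timeout(rx, poll) {
            MdnsServicePoll::Service(service) => seen.push(service),
            MdnsServicePoll::Timeout => continue,
            MdnsServicePoll::Closed => break,
        }
    }
    seen
}
```

With a 250ms poll the worst-case shutdown latency of the blocking thread is roughly one interval, which is the tradeoff noted under Tradeoffs.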
Downloads
~~~~~~~~~
active_downloads is now HashMap<String, CancellationToken>. Each
download gets a child token of the runtime shutdown, checked at chunk
and peer-attempt boundaries (never inside file writes). When all peers
with a game disappear, liveness cancels the token and emits
DownloadGameFilesAllPeersGone; the download exits Ok(()) without
emitting a duplicate Failed event.
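The boundary check can be sketched without tokio. CancelFlag below is a hypothetical std-only stand-in for a parent/child cancellation token, and download_chunks is an illustrative loop, not the real download code; the point is only where the check sits relative to the write.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Std-only stand-in for a cancellation token with parent/child links.
#[derive(Clone, Default)]
struct CancelFlag {
    own: Arc<AtomicBool>,
    parent: Option<Box<CancelFlag>>,
}

impl CancelFlag {
    /// Creates a child that is cancelled when it or any ancestor is cancelled.
    fn child(&self) -> CancelFlag {
        CancelFlag {
            own: Arc::new(AtomicBool::new(false)),
            parent: Some(Box::new(self.clone())),
        }
    }

    fn cancel(&self) {
        self.own.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.own.load(Ordering::SeqCst)
            || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
    }
}

#[derive(Debug, PartialEq)]
enum DownloadOutcome {
    Finished,
    Cancelled { chunks_written: usize },
}

/// Checks for cancellation only between chunks; a chunk that has started
/// writing is always written in full.
fn download_chunks(
    chunks: &[Vec<u8>],
    cancel: &CancelFlag,
    mut write_chunk: impl FnMut(&[u8]),
) -> DownloadOutcome {
    for (index, chunk) in chunks.iter().enumerate() {
        if cancel.is_cancelled() {
            return DownloadOutcome::Cancelled { chunks_written: index };
        }
        write_chunk(chunk.as_slice());
    }
    DownloadOutcome::Finished
}
```

Cancelling the runtime flag stops every child download, while cancelling one download's flag (as liveness does when all peers are gone) leaves the runtime and other downloads untouched.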
DownloadStateGuard (context.rs) is held inside the download task and
clears downloading_games + active_downloads on Drop, covering the happy
path, error returns, cancellation, and task abort. Drop falls back to
spawning the cleanup if write-lock contention prevents try_write.
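A rough sketch of the guard's Drop logic, assuming std::sync::RwLock and std::thread::spawn in place of the async lock and task spawn used by the runtime. Only downloading_games is modelled here; active_downloads would be cleared the same way. The field layout is an assumption.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};
use std::thread;

/// Removes a game from the shared tracking set when dropped, whichever
/// way the owning task ends (return, error, cancellation, or abort).
struct DownloadStateGuard {
    game_id: String,
    downloading_games: Arc<RwLock<HashSet<String>>>,
}

impl Drop for DownloadStateGuard {
    fn drop(&mut self) {
        match self.downloading_games.try_write() {
            // Fast path: the lock is free, clean up synchronously.
            Ok(mut games) => {
                games.remove(&self.game_id);
            }
            // Lock is contended: Drop must not block, so hand the
            // cleanup to a helper that can wait for the lock.
            Err(_) => {
                let games = Arc::clone(&self.downloading_games);
                let game_id = self.game_id.clone();
                thread::spawn(move || {
                    if let Ok(mut games) = games.write() {
                        games.remove(&game_id);
                    }
                });
            }
        }
    }
}
```

Because the cleanup lives in Drop, early returns via `?` need no extra bookkeeping in the download task.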
Public API and Tauri integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
start_peer() now returns PeerRuntimeHandle exposing:
fn sender(&self) -> UnboundedSender<PeerCommand>
fn shutdown(&self)
async fn wait_stopped(&mut self)
The Tauri layer stores the handle in managed state and switches its
main loop from .run(ctx) to .build(ctx).run(|h, e| ...). On
RunEvent::Exit it calls handle.shutdown() and blocks up to 2s on
wait_stopped(), giving services time to cancel and Goodbye packets time
to flush over a healthy LAN while staying short enough not to delay
process exit noticeably on a dead network.
The command loop distinguishes graceful shutdown from unexpected
channel closure: if recv() returns None and shutdown.is_cancelled()
returns true, the loop returns Ok(()) silently. Only an unexpected close
(no cancellation observed) still emits RuntimeFailed. This avoids a
spurious failure event on every normal app close.
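The close handling can be sketched with a std channel, where a closed channel surfaces as Err from recv() rather than None. run_command_loop and the "command-loop" component label are hypothetical, and the sketch returns the RuntimeFailed value to the caller instead of emitting it as an event.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::Receiver;

/// Mirrors the RuntimeFailed { component, error } event described above.
#[derive(Debug, PartialEq)]
struct RuntimeFailed {
    component: &'static str,
    error: String,
}

/// Drains commands until the channel closes, then decides whether the
/// close was part of a requested shutdown or an unexpected failure.
fn run_command_loop<C>(
    rx: Receiver<C>,
    shutdown: &AtomicBool,
    mut handle: impl FnMut(C),
) -> Result<(), RuntimeFailed> {
    loop {
        match rx.recv() {
            Ok(command) => handle(command),
            // Channel closed after shutdown was requested: normal exit.
            Err(_) if shutdown.load(Ordering::SeqCst) => return Ok(()),
            // Channel closed with no shutdown request: report a failure.
            Err(_) => {
                return Err(RuntimeFailed {
                    component: "command-loop", // hypothetical label
                    error: "command channel closed unexpectedly".to_string(),
                })
            }
        }
    }
}
```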
User-visible behavior changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Closing the app no longer leaks services into process termination;
Goodbye notifications are reliably attempted before exit.
- Downloads cancel cleanly (between chunks) instead of force-aborting
mid-write.
- A new "peer-runtime-failed" Tauri event fires when a Required service
cannot recover. No frontend handler exists yet — that is a follow-up.
Tradeoffs
~~~~~~~~~
- Workspace tokio-util now requires the "rt" feature for TaskTracker.
- The mDNS worker still runs in spawn_blocking and may stay parked
briefly between 250ms polls — acceptable for a desktop app.
- The 2s shutdown timeout on app exit is a deliberate compromise.
Tests
~~~~~
New unit tests:
- DownloadStateGuard clears tracking on completion, cancellation, and
parent-task abort (context.rs).
- Required failure cancels the runtime and emits RuntimeFailed
(startup.rs).
- Restart policy restarts until shutdown is requested (startup.rs).
- PeerRuntimeHandle.shutdown() observable via wait_stopped()
(startup.rs).
- Peers-gone cancellation emits only DownloadGameFilesAllPeersGone, no
duplicate Failed (services/liveness.rs).
Test plan
~~~~~~~~~
cargo test --workspace
cargo clippy --workspace --all-targets
Manual smoke test on two peers on the same LAN:
1. Start a download, verify chunks transfer.
2. Close the receiving app mid-download — verify the sending peer
logs a Goodbye, not a connection-reset error.
3. Stop the sending peer mid-download — verify the receiver emits
DownloadGameFilesAllPeersGone, not Failed.
Follow-ups
~~~~~~~~~~
- Frontend handler for "peer-runtime-failed".
- Consider exposing the runtime handle's stopped watch to the frontend
for a reconnecting indicator on Required failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lanspread-peer
lanspread-peer is the networking runtime that lets Lanspread nodes find each
other on the local network, exchange library metadata, and transfer game files.
It is designed to run headless – other crates (most notably
lanspread-tauri-deno-ts) embed it and drive it through a channel-based API.
Runtime Overview
- start_peer(game_dir, tx_events, peer_game_db) boots the asynchronous runtime in the background and returns an UnboundedSender<PeerCommand> that the caller uses for control. The initial game directory is installed directly into the peer context, the local library scan is attempted before discovery starts, and the provided PeerGameDB remains shared so the UI layer can observe live peer metadata.
- PeerCommand represents the small control surface exposed to the UI layer: ListGames, GetGame, DownloadGameFiles, and SetGameDir.
- PeerEvent enumerates everything the peer runtime reports back to the UI: library snapshots, download lifecycle updates, and peer membership changes.
- PeerGameDB collects remote peer metadata. It aggregates discovered peers’ Game definitions, tracks the latest ETI version per title, and keeps the last seen list of GameFileDescription entries for each peer.
Internally the peer runtime owns four long-lived tasks that run for the lifetime of the process:
- Server component (run_server_component) – listens for QUIC connections, advertises via mDNS, and serves Request::ListGames, Request::GetGame, Request::GetGameFileData, and Request::GetGameFileChunk by reading from the local game directory.
- Discovery loop (run_peer_discovery) – uses the lanspread-mdns helper to discover other peers. The blocking mDNS work is executed on a dedicated thread via tokio::task::spawn_blocking so that the Tokio runtime remains responsive.
- Ping service (run_ping_service) – periodically issues QUIC ping requests to keep peer liveness up to date and prunes stale entries from PeerGameDB.
- Local game monitor (run_local_game_monitor) – periodically rescans the configured game directory and announces local library deltas to known peers.
scan_local_library maintains a lightweight on-disk index and produces both a
GameDB and protocol summaries. The resulting database is used to respond to
incoming metadata requests (Request::ListGames / Request::GetGame).
Networking and File Transfer
- Transport is handled by s2n-quic; TLS cert/key material is compiled in from the repository root.
- Protocol messages are JSON-encoded structures defined in lanspread-proto::{Request, Response}.
- File transfers stream raw bytes over dedicated bidirectional QUIC streams. peer::send_game_file_data sends entire files, while peer::send_game_file_chunk services ranged requests.
Download Pipeline
When the UI asks to download a game:
- The UI first issues PeerCommand::GetGame. Each peer that still reports the game is queried via request_game_details_from_peer, and their file manifests are merged inside PeerGameDB.
- Once the UI receives PeerEvent::GotGameFiles, it forwards the selected file list back with PeerCommand::DownloadGameFiles.
- download_game_files prepares the filesystem (creating directories and pre-sizing files where possible), emits PeerEvent::DownloadGameFilesBegin, and builds a per-peer plan (build_peer_plans) that round-robins file chunks across the available peers that advertise the latest version.
- Each plan is executed in its own task (download_from_peer). Chunk requests use per-chunk QUIC streams and write into pre-created files. The chunk writer keeps existing data intact and only truncates when we intentionally fall back to a full file transfer, which prevents corruption when multiple peers fill different regions of the same file.
- Failures are accumulated and retried (up to MAX_RETRY_COUNT) via retry_failed_chunks. If everything succeeds, PeerEvent::DownloadGameFilesFinished is emitted; otherwise the UI receives PeerEvent::DownloadGameFilesFailed.
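The round-robin planning step can be illustrated roughly as follows. This is a sketch under assumptions, not the crate's build_peer_plans: plan_round_robin, ChunkRequest, and the fixed chunk_size parameter are hypothetical, and the peer list is assumed to be already filtered to peers advertising the latest version.

```rust
use std::collections::BTreeMap;

/// One ranged request for part of a file (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct ChunkRequest {
    path: String,
    offset: u64,
    length: u64,
}

/// Splits every file into fixed-size chunks and deals them out to the
/// peers in turn, so each peer ends up with a roughly equal share.
fn plan_round_robin(
    peers: &[String],
    files: &[(String, u64)], // (relative path, size in bytes)
    chunk_size: u64,
) -> BTreeMap<String, Vec<ChunkRequest>> {
    let mut plans: BTreeMap<String, Vec<ChunkRequest>> = BTreeMap::new();
    if peers.is_empty() || chunk_size == 0 {
        return plans;
    }
    let mut next_peer = 0usize;
    for (path, size) in files {
        let mut offset = 0u64;
        while offset < *size {
            // The final chunk of a file may be shorter than chunk_size.
            let length = chunk_size.min(*size - offset);
            let peer = &peers[next_peer % peers.len()];
            plans.entry(peer.clone()).or_default().push(ChunkRequest {
                path: path.clone(),
                offset,
                length,
            });
            next_peer += 1;
            offset += length;
        }
    }
    plans
}
```

Because the rotation ignores how fast or reliable each peer is, a slow peer holds back its whole share; this is the limitation listed under Known Limitations.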
Integration with lanspread-tauri-deno-ts
The Tauri application embeds this crate in
crates/lanspread-tauri-deno-ts/src-tauri/src/lib.rs:
- LanSpreadState holds onto the peer control channel, the latest aggregated GameDB, per-game download state, and the user-selected game directory.
- The Tauri commands (request_games, install_game, update_game, and update_game_directory) translate UI actions into PeerCommands. In particular, update_game_directory validates the filesystem path before storing it, loads the bundled catalog on first use, kicks off the peer runtime on demand, and mirrors the installed/uninstalled state into the UI-facing database.
- A background task consumes PeerEvents and fans them out to the front-end via Tauri publish/subscribe events (games-list-updated, game-download-*, peer-*). Successful downloads trigger an unrar sidecar to unpack ETI archives and clean up the temporary backup folders that are created when updates begin.
- When downloads fail the Tauri layer restores the on-disk backup, keeping the previous installation consistent even after partial transfers.
Security & Operational Notes
- All QUIC connections are TLS encrypted; the shipped certificates are suitable for local-network trust but should be rotated for production deployments.
- Peer discovery is restricted to the local link via mDNS.
- Long-running blocking mDNS calls are isolated on dedicated threads which keeps the async runtime responsive even when discovery takes a long time.
- File writes are chunk-safe: partial chunk downloads now open files without truncating existing data, avoiding the corruption that occurred previously when multiple peers collectively filled a file.
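As an illustration of the chunk-safe write described in the last point, a minimal std-only sketch might look like this. write_chunk_at is a hypothetical helper using blocking std::fs calls; the crate's own writer is asynchronous and may differ.

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

/// Writes `data` at `offset` without truncating the file, so regions
/// already filled by other peers are left untouched.
fn write_chunk_at(path: &Path, offset: u64, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false) // keep whatever other chunks have already written
        .open(path)?;
    file.seek(SeekFrom::Start(offset))?;
    file.write_all(data)?;
    Ok(())
}
```

Writing a later region first and an earlier one afterwards yields the complete file, which is what lets several peers fill different parts of the same file in any order.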
Known Limitations
- PeerGameDB currently models the latest metadata that other peers advertise. If the UI needs to surface titles that only exist locally, additional merging with the locally scanned GameDB will be required.
- The download planner uses a simple round-robin and does not yet take per-peer throughput or failures into account when distributing work.
Refer to the source (particularly src/lib.rs) for the exact message shapes and
state machines.