ce51d92df0
Follow-up hardening for 348a02c, where `listen_addr` was added to Hello and HelloAck as `Option<SocketAddr>`. Code review surfaced three concrete problems that the previous commit left open:

1. Cold-start asymmetry. Discovery and the QUIC/mDNS advertiser are spawned concurrently. If discovery saw a cached peer advertisement before our own advertiser had written `ctx.local_peer_addr`, our outbound Hello carried `listen_addr: None`. The receiver's `peer_record_addr` then returned `None` and silently dropped the Hello while we still recorded their HelloAck, so peer A learned about peer B but B never learned about A until a later handshake happened to win the race.

2. Duplicate game-list pipeline. The previous commit added `refresh_peer_games`, which post-handshake issued a `ListGames` to fetch `peer.games`. The library-sync path (`LibrarySnapshot`) already populates the same field. Both could race on first contact and overwrite each other. Worse, `refresh_peer_games` was misnamed: a `peer_game_count > 0` guard turned it into a fetch-once-then-no-op helper, while `handle_library_summary` independently re-triggered a full handshake when `previous_count == 0` was observed, producing a redundant ping-pong on every first contact.

3. Argument explosion. `perform_handshake_with_peer`, `spawn_library_resync`, and `after_peer_library_recorded` had grown to 6-8 individual parameters and acquired `#[allow(clippy::too_many_arguments)]` opt-outs. Every caller was destructuring the same fields out of `Ctx`/`PeerCtx`.

Changes (all in one commit because they jointly enforce the same invariant: "a peer is only ever recorded by its listener address, and the local listener address must exist before we participate in the protocol"):

- `Hello.listen_addr` and `HelloAck.listen_addr` are now `SocketAddr`, not `Option<SocketAddr>`. Wire-incompatible, but PROTOCOL_VERSION already moved to 3 in 348a02c, so no additional version bump is needed.
- `required_listen_addr` reads `ctx.local_peer_addr` and returns an `eyre::Result`; `build_hello_from_state` and `build_hello_ack` both call it, so an outbound or inbound Hello can no longer be constructed before the local QUIC listener is bound. The inbound path maps this into a `Response::InternalPeerError` so the remote peer fails cleanly instead of seeing a malformed HelloAck.
- `run_peer_discovery` blocks on `wait_for_local_peer_addr` (25 ms poll, shutdown-aware) before subscribing to the mDNS browser. This closes the cold-start race for outbound handshakes at the source.
- `refresh_peer_games`, `request_game_list_from_peer`, and the `previous_count == 0` re-handshake trigger are removed. The post-handshake flow now relies solely on `LibrarySummary`/`LibrarySnapshot`/`LibraryDelta` for peer-library state; `ListGames` survives only for the `request_game_details_*` paths that fetch per-game file descriptions on demand.
- New `HandshakeCtx` (with `from_ctx` and `from_peer_ctx` constructors) replaces the long argument lists. All `too_many_arguments` allow-attrs in `handshake.rs` are gone, and call sites in `handlers.rs`, `discovery.rs`, and `stream.rs` collapse to a single clone.
- `handle_library_delta` no longer acquires a read lock on the apply path: the `peer_addr` lookup moved into the `else` resync branch where it is actually needed.
- `accept_inbound_hello`'s `remote_addr` parameter is renamed to `transport_addr`. It is now used only for warn-log formatting, and the new name signals that this is the ephemeral QUIC source port, never the authoritative listener address that gets recorded.

User-visible effect: on cold start, peers can no longer end up with an asymmetric view of each other ("A sees B but B never sees A"). First-contact library sync now does one handshake plus one snapshot/delta exchange instead of the previous handshake + ListGames + redundant follow-up handshake.

The direct-connect CLI path (`handle_connect_peer_command`) now fails fast with "local peer listener address is not ready" if invoked before the QUIC server has bound; this is intentional - the previous behaviour would have sent a Hello that the receiver had to silently discard.

Test Plan:

- just fmt
- just clippy
- just test (80 peer + 3 cli + 5 tauri tests pass)
- just build
- Manual: bring up `just peer-cli-alpha`/`bravo`/`charlie`, confirm symmetric peer discovery and that games show up on every side after one library digest cycle, with no duplicated ListGames traffic in trace logs.

Refs: Review feedback on commit 348a02c (listener-address handshake fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lanspread-peer proposed protocol and architecture
This document proposes a tighter, more fault-tolerant protocol while keeping the current idea: mDNS discovery, QUIC transport, on-demand metadata, and chunked file transfers.
Goals (unchanged)
- Local LAN discovery via mDNS.
- QUIC + JSON messages for control, raw streams for file data.
- UI drives operations through `PeerCommand`; peers remain headless.
- Peers can appear/disappear at any time without data loss.
Peer lifecycle and message flow
1) Startup and advertise
- Start QUIC server.
- Advertise via mDNS with TXT records:
  - `peer_id` (stable ID, not tied to IP)
  - `proto_ver`
  - `library_rev` (monotonic local library revision)
  - optional `hostname`
2) Discovery and handshake
When a peer is discovered:
- Connect and send `Hello { peer_id, proto_ver, listen_addr, library_rev, library_digest, features }`. `listen_addr` is mandatory; the QUIC source port is only a temporary transport port and must not be recorded as the peer's listener.
- Receive `HelloAck { peer_id, proto_ver, listen_addr, library_rev, library_digest, features }`.
- If the remote `peer_id` is already known but the address changed, update it.
- If protocol versions are incompatible, drop the peer (and keep mDNS watching).
- If library digests match, do nothing else.
- If digests differ:
  - If we have a known `library_rev` for that peer, request `LibraryDelta`.
  - Otherwise request `LibrarySnapshot`.
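
The decision rules above can be sketched as a small pure function. This is a minimal illustration, not the project's implementation: `HandshakeInfo`, `NextStep`, and `next_step` are hypothetical names, "incompatible" is assumed to mean unequal `proto_ver`, and "digests match" is assumed to compare the digest cached for that peer against the one it just advertised.

```rust
use std::net::SocketAddr;

/// Hypothetical subset of the fields carried by `Hello` / `HelloAck`.
#[derive(Debug, Clone, PartialEq)]
pub struct HandshakeInfo {
    pub peer_id: String,
    pub proto_ver: u32,
    /// Mandatory listener address; never the ephemeral QUIC source port.
    pub listen_addr: SocketAddr,
    pub library_rev: u64,
    pub library_digest: String,
}

/// What to do after a handshake completed (illustrative only).
#[derive(Debug, PartialEq)]
pub enum NextStep {
    DropPeer,
    InSync,
    RequestDelta { from_rev: u64 },
    RequestSnapshot,
}

/// `known` is the (library_rev, library_digest) pair we last recorded for
/// this peer, if any. Assumption: versions are compatible only when equal.
pub fn next_step(
    local_proto_ver: u32,
    known: Option<(u64, &str)>,
    remote: &HandshakeInfo,
) -> NextStep {
    if remote.proto_ver != local_proto_ver {
        return NextStep::DropPeer;
    }
    match known {
        // Digests match: nothing else to do.
        Some((_, digest)) if digest == remote.library_digest.as_str() => NextStep::InSync,
        // Digests differ and we know a revision: ask for a delta from it.
        Some((rev, _)) => NextStep::RequestDelta { from_rev: rev },
        // Nothing recorded for this peer yet: ask for a full snapshot.
        None => NextStep::RequestSnapshot,
    }
}
```

Keeping the decision free of I/O makes each branch easy to unit-test; the caller would still need to update the peer's recorded address before acting on the result.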
3) Steady state
- Any message updates `last_seen`.
- Pings run only when idle (or on a longer interval), not every 5 seconds.
- Library updates are pushed as deltas, debounced and coalesced.
4) Shutdown
- Optional `Goodbye { peer_id }` lets others remove the peer quickly.
- If a peer vanishes without goodbye, stale timeout + ping removal handle it.
- Goodbye is a hint, never required for correctness.
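
The steady-state and shutdown rules can be pictured as a per-peer liveness tracker. The sketch below is illustrative only: `Liveness`, `LivenessAction`, and the two thresholds are hypothetical, and the document does not specify concrete timeout values.

```rust
use std::time::{Duration, Instant};

/// What the liveness loop should do for one peer (illustrative only).
#[derive(Debug, PartialEq)]
pub enum LivenessAction {
    Nothing,
    SendPing,
    RemovePeer,
}

/// Hypothetical per-peer liveness state.
pub struct Liveness {
    last_seen: Instant,
    /// Idle time after which a ping is worth sending (assumed parameter).
    idle_before_ping: Duration,
    /// Idle time after which the peer is considered gone (assumed parameter).
    stale_after: Duration,
}

impl Liveness {
    pub fn new(now: Instant, idle_before_ping: Duration, stale_after: Duration) -> Self {
        Liveness { last_seen: now, idle_before_ping, stale_after }
    }

    /// Any message from the peer refreshes `last_seen`, not only pongs.
    pub fn on_message(&mut self, now: Instant) {
        self.last_seen = now;
    }

    /// Pings are only suggested when the peer has been idle; the stale
    /// timeout is authoritative even if no `Goodbye` was ever received.
    pub fn check(&self, now: Instant) -> LivenessAction {
        let idle = now.saturating_duration_since(self.last_seen);
        if idle >= self.stale_after {
            LivenessAction::RemovePeer
        } else if idle >= self.idle_before_ping {
            LivenessAction::SendPing
        } else {
            LivenessAction::Nothing
        }
    }
}
```

Passing `now` in explicitly keeps the tracker deterministic in tests; a received `Goodbye` would simply remove the peer without waiting for `check`.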
Library sync protocol
Summary and snapshot
- `LibrarySummary { peer_id, summary: { library_rev, library_digest, game_count } }`
- `LibrarySnapshot { peer_id, snapshot: { library_rev, games: Vec<GameSummary> } }`
Delta updates
- `LibraryDelta { peer_id, delta: { from_rev, to_rev, added, updated, removed } }`
- `removed` is a list of game IDs.
- Deltas are idempotent; ignore if `to_rev` <= known rev.
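
A minimal sketch of idempotent delta application follows. The type and method names (`PeerLibrary`, `DeltaOutcome`, `apply_delta`) are hypothetical and `GameSummary` is reduced to two fields. Treating a `from_rev` that does not equal the known revision as "needs snapshot" is an assumption based on the rule that any mismatch falls back to `LibrarySnapshot`.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for the real `GameSummary`.
#[derive(Debug, Clone, PartialEq)]
pub struct GameSummary {
    pub id: String,
    pub name: String,
}

pub struct LibraryDelta {
    pub from_rev: u64,
    pub to_rev: u64,
    pub added: Vec<GameSummary>,
    pub updated: Vec<GameSummary>,
    /// Game IDs to drop.
    pub removed: Vec<String>,
}

/// Hypothetical view of one remote peer's library.
#[derive(Default)]
pub struct PeerLibrary {
    pub rev: u64,
    pub games: BTreeMap<String, GameSummary>,
}

#[derive(Debug, PartialEq)]
pub enum DeltaOutcome {
    Applied,
    IgnoredStale,
    NeedsSnapshot,
}

impl PeerLibrary {
    pub fn apply_delta(&mut self, delta: &LibraryDelta) -> DeltaOutcome {
        // Idempotence: a delta we have already passed is a no-op.
        if delta.to_rev <= self.rev {
            return DeltaOutcome::IgnoredStale;
        }
        // Gap or mismatch: the delta does not start where we are.
        if delta.from_rev != self.rev {
            return DeltaOutcome::NeedsSnapshot;
        }
        for game in delta.added.iter().chain(delta.updated.iter()) {
            self.games.insert(game.id.clone(), game.clone());
        }
        for id in &delta.removed {
            self.games.remove(id);
        }
        self.rev = delta.to_rev;
        DeltaOutcome::Applied
    }
}
```

Re-sending the same delta after a reconnect is then harmless: the second application returns `IgnoredStale` and leaves the library untouched.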
GameSummary (concept)
- `id`, `name`, `eti_version`, `size`, `downloaded`, `installed`
- `manifest_hash` (hash of file list + sizes)
- `availability` (e.g., `ready`, `downloading`, `local_only`)
When peers broadcast their game list
- Only on changes, not on a timer.
- Filesystem events are gated per game ID instead of time-debounced:
- an active operation lock drops events for that game;
- a rescan already running for the ID sets a rescan-pending flag;
- the running rescan loops once more when that flag was set.
- Send `LibraryDelta` to known peers; send `LibrarySummary` on new connections.
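
The per-game gating of filesystem events can be sketched as a small state machine. This is an illustration under assumptions, not the actual watcher code: `RescanGate` and `EventDecision` are hypothetical names, and the real implementation presumably runs rescans asynchronously rather than through explicit `on_rescan_finished` calls.

```rust
use std::collections::{HashMap, HashSet};

/// Result of offering one filesystem event to the gate (illustrative only).
#[derive(Debug, PartialEq)]
pub enum EventDecision {
    /// An operation holds the game; the event is dropped.
    Dropped,
    /// A rescan is already running; it will loop once more.
    MarkedPending,
    /// No rescan is running; the caller should start one.
    StartRescan,
}

#[derive(Default)]
pub struct RescanGate {
    active_ops: HashSet<String>,
    /// Game ID -> rescan-pending flag, present only while a rescan runs.
    running: HashMap<String, bool>,
}

impl RescanGate {
    pub fn begin_operation(&mut self, id: &str) {
        self.active_ops.insert(id.to_string());
    }

    pub fn end_operation(&mut self, id: &str) {
        self.active_ops.remove(id);
    }

    pub fn on_fs_event(&mut self, id: &str) -> EventDecision {
        if self.active_ops.contains(id) {
            return EventDecision::Dropped;
        }
        if self.running.contains_key(id) {
            self.running.insert(id.to_string(), true);
            return EventDecision::MarkedPending;
        }
        self.running.insert(id.to_string(), false);
        EventDecision::StartRescan
    }

    /// Called when one rescan pass ends. Returns true when the pass must be
    /// repeated because events arrived while it was running.
    pub fn on_rescan_finished(&mut self, id: &str) -> bool {
        let pending = self.running.get(id).copied().unwrap_or(false);
        if pending {
            self.running.insert(id.to_string(), false);
            true
        } else {
            self.running.remove(id);
            false
        }
    }
}
```

Any number of events during one pass collapse into a single extra pass, which bounds the work without relying on a timer.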
Local game scanning: fast and low cost
Strategy
- Maintain a persistent on-disk index (per game):
  - `manifest_hash`, total size, file list (optional), and a fingerprint (root-level `version.ini` mtime, root-level `.eti` mtime/size, and `local/` directory presence).
- Use filesystem watchers to update only changed games.
- Keep a 300-second fallback scan to recover from missed events.
Fast-path scanning
- On startup, list only top-level game directories.
- For each game, read a cheap fingerprint:
  - root-level `.eti` file names, sizes, and mtimes
  - root-level `version.ini` mtime
  - presence of `local/` as a directory
- If fingerprint unchanged, reuse cached size and manifest hash.
- Only run a recursive scan for new or changed games.
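
As a rough sketch of the fast path, the fingerprint can be read with a single non-recursive directory listing and compared against the cached value. `Fingerprint`, `read_fingerprint`, and `needs_full_scan` are hypothetical names, and the exact fields stored by the real index are an assumption based on the list above.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Cheap per-game fingerprint (illustrative field layout).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    /// Root-level `.eti` files as (name, size, mtime), sorted by name.
    pub eti_files: Vec<(String, u64, Option<SystemTime>)>,
    pub version_ini_mtime: Option<SystemTime>,
    pub has_local_dir: bool,
}

/// Reads only the top level of `game_root`; never recurses.
pub fn read_fingerprint(game_root: &Path) -> io::Result<Fingerprint> {
    let mut eti_files = Vec::new();
    for entry in fs::read_dir(game_root)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        let meta = entry.metadata()?;
        if meta.is_file() && name.ends_with(".eti") {
            eti_files.push((name, meta.len(), meta.modified().ok()));
        }
    }
    // Directory iteration order is unspecified, so sort for stable equality.
    eti_files.sort();
    let version_ini_mtime = fs::metadata(game_root.join("version.ini"))
        .and_then(|meta| meta.modified())
        .ok();
    let has_local_dir = game_root.join("local").is_dir();
    Ok(Fingerprint { eti_files, version_ini_mtime, has_local_dir })
}

/// A recursive scan is needed only for new games or changed fingerprints.
pub fn needs_full_scan(cached: Option<&Fingerprint>, current: &Fingerprint) -> bool {
    cached != Some(current)
}
```

When `needs_full_scan` returns false, the cached size and `manifest_hash` can be reused as-is.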
Local State and Recovery
Downloaded and installed are independent predicates:
- `downloaded` is true only when `<game_root>/version.ini` exists as a regular file. The sentinel is written last through `.version.ini.tmp` and atomic rename. An interrupted replacement leaves no restored old sentinel because archive bytes may already have changed.
- `installed` is true when `<game_root>/local/` is a directory. The contents of `local/` are user-owned and are skipped by manifests, fingerprints, and file serving.
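
The two predicates and the write-last sentinel can be sketched with the standard library alone. The function names are hypothetical. Note one simplification: `Path::is_file` follows symlinks, so a stricter "regular file" check may be wanted in practice.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// `downloaded`: the root-level sentinel exists as a file.
pub fn is_downloaded(game_root: &Path) -> bool {
    game_root.join("version.ini").is_file()
}

/// `installed`: the user-owned `local/` directory exists.
pub fn is_installed(game_root: &Path) -> bool {
    game_root.join("local").is_dir()
}

/// Writes the sentinel as the last step of a download: first to the
/// scratch file, then renamed into place so readers never see a
/// half-written `version.ini`.
pub fn write_sentinel(game_root: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = game_root.join(".version.ini.tmp");
    let target = game_root.join("version.ini");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents)?;
    // Flush file data to disk before the rename makes it visible.
    file.sync_all()?;
    fs::rename(&tmp, &target)
}
```

Because the two predicates look at different paths, a game can be downloaded but not installed, installed but not downloaded, both, or neither.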
Reserved per-game paths:
- `.version.ini.tmp` and `.version.ini.discarded` are download transaction scratch files and are swept during startup recovery.
- `.local.installing/` is extraction staging.
- `.local.backup/` holds the previous install while an update or uninstall is in flight.
- `.lanspread.json` is the atomic per-game intent log.
- `.lanspread_owned` inside `.local.*` directories proves Lanspread ownership when the current intent is `None`.
Recovery reads `.lanspread.json` and combines the recorded intent with the
observed `local/`, `.local.installing/`, and `.local.backup/` state. Intent
states `Installing`, `Updating`, and `Uninstalling` prove ownership of the
corresponding reserved directories even if the marker was not flushed before a
crash. With intent `None`, markerless `.local.*` directories are left untouched.
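
The ownership rule can be condensed into one decision function. This is a deliberately simplified sketch: `Intent`, `Ownership`, and `classify_reserved_dir` are hypothetical names, and the mapping from each intent to its "corresponding" reserved directory is left out because the text above does not spell it out.

```rust
/// Recorded intent from `.lanspread.json` (variant names from the text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    None,
    Installing,
    Updating,
    Uninstalling,
}

/// Why recovery may, or may not, touch a `.local.*` directory.
#[derive(Debug, PartialEq, Eq)]
pub enum Ownership {
    /// An in-flight intent proves ownership even without a marker.
    OwnedByIntent,
    /// No intent, but the `.lanspread_owned` marker is present.
    OwnedByMarker,
    /// No intent and no marker: leave the directory untouched.
    NotOurs,
}

/// Assumes the caller only asks about a directory that corresponds to
/// the given intent.
pub fn classify_reserved_dir(intent: Intent, has_owned_marker: bool) -> Ownership {
    match intent {
        Intent::Installing | Intent::Updating | Intent::Uninstalling => Ownership::OwnedByIntent,
        Intent::None if has_owned_marker => Ownership::OwnedByMarker,
        Intent::None => Ownership::NotOurs,
    }
}
```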
Result
Most scans become O(number of game dirs), with full recursion only when needed.
File manifests and downloads
- Keep `GetGame`/manifest requests, but keyed by `manifest_hash` so repeated calls can be skipped when unchanged.
- Downloads remain chunked QUIC streams with the existing integrity checks.
- A game is transferable only when its ID is in the catalog, no operation is active for that ID, and the root-level `version.ini` sentinel exists.
- `local/` paths are never served, even if a stale or malicious manifest request asks for them.
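
The transfer guards above might look roughly like the following. Both functions are hypothetical sketches. Rejecting `..`, absolute, and empty paths goes slightly beyond what the text states and is included as a defensive assumption.

```rust
use std::collections::HashSet;
use std::ffi::OsStr;
use std::path::{Component, Path};

/// A game may be served only if it is catalogued, idle, and fully
/// downloaded (root-level `version.ini` present).
pub fn is_transferable(
    id: &str,
    catalog: &HashSet<String>,
    active_ops: &HashSet<String>,
    game_root: &Path,
) -> bool {
    catalog.contains(id)
        && !active_ops.contains(id)
        && game_root.join("version.ini").is_file()
}

/// Checks a requested path relative to the game root. Anything under the
/// root-level `local/` directory is refused, as is any path that is
/// empty, absolute, or contains `.` / `..` components.
pub fn is_servable(relative: &Path) -> bool {
    let parts: Vec<Component> = relative.components().collect();
    let all_normal = !parts.is_empty()
        && parts.iter().all(|part| matches!(part, Component::Normal(_)));
    let under_local = matches!(
        parts.first(),
        Some(Component::Normal(name)) if *name == OsStr::new("local")
    );
    all_normal && !under_local
}
```

Checking the path on the serving side, rather than trusting the manifest, is what keeps a stale or malicious request from reaching `local/`.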
Fault tolerance rules
- Every peer is keyed by `peer_id`, not by IP address.
- Peer addresses are listener addresses from mDNS or `Hello`/`HelloAck`, never ephemeral QUIC source ports.
- `library_rev` is monotonic and guards against out-of-order updates.
- Any mismatch or missing delta falls back to `LibrarySnapshot`.
- Loss of goodbye is harmless; stale timeout is authoritative.
Roadmap from current design to this one
- Protocol updates in `lanspread-proto`:
  - Define `Hello`, `HelloAck`, `LibrarySummary`, `LibrarySnapshot`, `LibraryDelta`, and optional `Goodbye` messages.
  - Thread `peer_id`, `library_rev`, and `manifest_hash` through all library and manifest-bearing types.
  - Make `Hello` and `HelloAck` carry the sender's `listen_addr`, `library_rev`, and `library_digest` so both sides can record stable listener addresses and immediately select `LibraryDelta` vs `LibrarySnapshot`.
- Peer identity:
  - Persist a stable `peer_id` (UUID) in the peer config and inject it into `PeerInfo` and `PeerGameDB` at startup.
  - Track `peer_id -> SocketAddr` in the discovery table and update the address on any incoming handshake or mDNS refresh.
- Discovery handshake:
  - Publish `peer_id` and `library_rev` in mDNS TXT records to avoid immediate TCP/QUIC roundtrips when nothing changed.
  - Add a lightweight handshake in `run_peer_discovery` that exchanges `Hello`/`HelloAck` before any library sync.
  - Ignore peers that do not advertise the current protocol version.
- Library revisioning:
  - Store a monotonic `library_rev` locally and increment only after a successful index refresh completes.
  - Apply `LibraryDelta` when `library_rev` matches; reject stale or future revisions and request `LibrarySnapshot` instead.
  - Cache the last accepted `manifest_hash` per peer to short-circuit manifest requests when unchanged.
- Local index + scan optimizations:
  - Introduce a cached index file (e.g., `.lanspread/index.json`) that stores per-root fingerprints and computed manifests.
  - Use filesystem watchers with a debounce window to collect changes and incrementally update the cache.
  - Schedule a low-frequency full scan to reconcile missed watcher events.
- Announce updates:
  - Broadcast `LibraryDelta` updates keyed by `library_rev`.
  - Send `LibrarySummary` on new connections to seed the delta flow.
- File manifest caching:
  - Store per-game `manifest_hash` and only fetch details when changed.
- Liveness:
  - Reduce ping frequency; update `last_seen` on any message.
  - Add optional `Goodbye` on shutdown paths.
- Tests:
  - Delta apply/merge, rev ordering, manifest hashing, and scan cache behavior.