ce51d92df0
Follow-up hardening for 348a02c, where `listen_addr` was added to Hello and HelloAck as `Option<SocketAddr>`. Code review surfaced three concrete problems that the previous commit left open:

1. Cold-start asymmetry. Discovery and the QUIC/mDNS advertiser are spawned concurrently. If discovery saw a cached peer advertisement before our own advertiser had written `ctx.local_peer_addr`, our outbound Hello carried `listen_addr: None`. The receiver's `peer_record_addr` then returned `None` and silently dropped the Hello while we still recorded their HelloAck, so peer A learned about peer B but B never learned about A until a later handshake happened to win the race.

2. Duplicate game-list pipeline. The previous commit added `refresh_peer_games`, which post-handshake issued a `ListGames` to fetch `peer.games`. The library-sync path (`LibrarySnapshot`) already populates the same field. Both could race on first contact and overwrite each other. Worse, `refresh_peer_games` was misnamed: a `peer_game_count > 0` guard turned it into a fetch-once-then-no-op helper, while `handle_library_summary` independently re-triggered a full handshake when `previous_count == 0` was observed, producing a redundant ping-pong on every first contact.

3. Argument explosion. `perform_handshake_with_peer`, `spawn_library_resync`, and `after_peer_library_recorded` had grown to 6-8 individual parameters and acquired `#[allow(clippy::too_many_arguments)]` opt-outs. Every caller was destructuring the same fields out of `Ctx`/`PeerCtx`.

Changes (all in one commit because they jointly enforce the same invariant: "a peer is only ever recorded by its listener address, and the local listener address must exist before we participate in the protocol"):

- `Hello.listen_addr` and `HelloAck.listen_addr` are now `SocketAddr`, not `Option<SocketAddr>`. Wire-incompatible, but PROTOCOL_VERSION already moved to 3 in 348a02c so no additional version bump is needed.
- `required_listen_addr` reads `ctx.local_peer_addr` and returns an `eyre::Result`; `build_hello_from_state` and `build_hello_ack` both call it, so an outbound or inbound Hello can no longer be constructed before the local QUIC listener is bound. The inbound path maps this into a `Response::InternalPeerError` so the remote peer fails cleanly instead of seeing a malformed HelloAck.
- `run_peer_discovery` blocks on `wait_for_local_peer_addr` (25 ms poll, shutdown-aware) before subscribing to the mDNS browser. This closes the cold-start race for outbound handshakes at the source.
- `refresh_peer_games`, `request_game_list_from_peer`, and the `previous_count == 0` re-handshake trigger are removed. The post-handshake flow now relies solely on `LibrarySummary`/`LibrarySnapshot`/`LibraryDelta` for peer-library state; `ListGames` survives only for the `request_game_details_*` paths that fetch per-game file descriptions on demand.
- New `HandshakeCtx` (with `from_ctx` and `from_peer_ctx` constructors) replaces the long argument lists. All `too_many_arguments` allow-attrs in `handshake.rs` are gone, and call sites in `handlers.rs`, `discovery.rs`, and `stream.rs` collapse to a single clone.
- `handle_library_delta` no longer acquires a read lock on the apply path: the `peer_addr` lookup moved into the `else` resync branch where it is actually needed.
- `accept_inbound_hello`'s `remote_addr` parameter is renamed to `transport_addr`. It is now used only for warn-log formatting, and the new name signals that this is the ephemeral QUIC source port, never the authoritative listener address that gets recorded.

User-visible effect: on cold start, peers can no longer end up with an asymmetric view of each other ("A sees B but B never sees A"). First-contact library sync now does one handshake plus one snapshot/delta exchange instead of the previous handshake + ListGames + redundant follow-up handshake. The direct-connect CLI path (`handle_connect_peer_command`) now fails fast with "local peer listener address is not ready" if invoked before the QUIC server has bound; this is intentional - the previous behaviour would have sent a Hello that the receiver had to silently discard.

Test Plan:

- just fmt
- just clippy
- just test (80 peer + 3 cli + 5 tauri tests pass)
- just build
- Manual: bring up `just peer-cli-alpha`/`bravo`/`charlie`, confirm symmetric peer discovery and that games show up on every side after one library digest cycle, with no duplicated ListGames traffic in trace logs.

Refs: Review feedback on commit 348a02c (listener-address handshake fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# lanspread-peer proposed protocol and architecture

This document proposes a tighter, more fault-tolerant protocol while keeping
the current idea: mDNS discovery, QUIC transport, on-demand metadata, and
chunked file transfers.

## Goals (unchanged)

- Local LAN discovery via mDNS.
- QUIC + JSON messages for control, raw streams for file data.
- UI drives operations through `PeerCommand`, peers remain headless.
- Peers can appear/disappear at any time without data loss.

## Peer lifecycle and message flow

### 1) Startup and advertise

- Start QUIC server.
- Advertise via mDNS with TXT records:
  - `peer_id` (stable ID, not tied to IP)
  - `proto_ver`
  - `library_rev` (monotonic local library revision)
  - optional `hostname`

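As a minimal sketch of the advertisement payload, the TXT records above can be assembled as plain key/value pairs before handing them to whichever mDNS library is in use. The function name `txt_records` and the integer types are assumptions for illustration; the document does not specify them.

```rust
use std::collections::BTreeMap;

/// Build the TXT key/value pairs listed above.
/// Hypothetical helper: the name and the field types are assumptions.
pub fn txt_records(
    peer_id: &str,
    proto_ver: u32,
    library_rev: u64,
    hostname: Option<&str>,
) -> BTreeMap<String, String> {
    let mut txt = BTreeMap::new();
    txt.insert("peer_id".to_string(), peer_id.to_string());
    txt.insert("proto_ver".to_string(), proto_ver.to_string());
    txt.insert("library_rev".to_string(), library_rev.to_string());
    // `hostname` is optional, so the key is omitted when it is unknown.
    if let Some(name) = hostname {
        txt.insert("hostname".to_string(), name.to_string());
    }
    txt
}
```

Keeping `library_rev` in the TXT record lets a browsing peer compare it with its cached revision before opening a connection.
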
### 2) Discovery and handshake

When a peer is discovered:

1. Connect and send `Hello { peer_id, proto_ver, listen_addr, library_rev,
   library_digest, features }`. `listen_addr` is mandatory; the QUIC source port
   is only a temporary transport port and must not be recorded as the peer's
   listener.
2. Receive `HelloAck { peer_id, proto_ver, listen_addr, library_rev,
   library_digest, features }`.
3. If the remote `peer_id` is already known but the address changed, update it.
4. If protocol versions are incompatible, drop the peer (and keep mDNS watching).
5. If library digests match, do nothing else.
6. If digests differ:
   - If we have a known `library_rev` for that peer, request `LibraryDelta`.
   - Otherwise request `LibrarySnapshot`.

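Steps 4 to 6 amount to a small decision function. The sketch below is illustrative only: `RemoteState`, `NextStep`, and `next_step` are hypothetical names. It assumes that "compatible" means equal protocol versions and that "digests match" compares the digest we last recorded for the peer against the one it just reported.

```rust
/// Hypothetical subset of the fields carried by `Hello`/`HelloAck`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RemoteState {
    pub proto_ver: u32,
    pub library_rev: u64,
    pub library_digest: String,
}

/// What to do once the handshake reply has arrived.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NextStep {
    /// Step 4: incompatible version; drop the peer, keep watching mDNS.
    DropPeer,
    /// Step 5: digests match; nothing else to do.
    InSync,
    /// Step 6: a revision is already known for this peer; ask for a delta.
    RequestDelta { from_rev: u64 },
    /// Step 6: nothing known yet; ask for the full snapshot.
    RequestSnapshot,
}

pub fn next_step(
    local_proto_ver: u32,
    known_rev: Option<u64>,
    known_digest: Option<&str>,
    remote: &RemoteState,
) -> NextStep {
    // Assumption: versions are compatible only when they are equal.
    if remote.proto_ver != local_proto_ver {
        return NextStep::DropPeer;
    }
    if known_digest == Some(remote.library_digest.as_str()) {
        return NextStep::InSync;
    }
    match known_rev {
        Some(from_rev) => NextStep::RequestDelta { from_rev },
        None => NextStep::RequestSnapshot,
    }
}
```

Keeping the decision free of I/O makes each branch easy to unit test without a QUIC connection.
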
### 3) Steady state

- Any message updates `last_seen`.
- Pings run only when idle (or on a longer interval), not every 5 seconds.
- Library updates are pushed as deltas, debounced and coalesced.

### 4) Shutdown

- Optional `Goodbye { peer_id }` lets others remove the peer quickly.
- If a peer vanishes without goodbye, stale timeout + ping removal handle it.
- Goodbye is a hint, never required for correctness.

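The liveness rules from the steady-state and shutdown sections can be sketched as a per-peer tracker. `Liveness` and its two thresholds are hypothetical; the document does not fix concrete intervals, so they are passed in by the caller.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-peer liveness tracker.
pub struct Liveness {
    last_seen: Instant,
    idle_before_ping: Duration,
    stale_after: Duration,
}

impl Liveness {
    pub fn new(now: Instant, idle_before_ping: Duration, stale_after: Duration) -> Self {
        Self { last_seen: now, idle_before_ping, stale_after }
    }

    /// Any message from the peer counts as proof of life.
    pub fn on_message(&mut self, now: Instant) {
        self.last_seen = now;
    }

    /// Ping only when the peer has been silent for the idle threshold.
    pub fn should_ping(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_seen) >= self.idle_before_ping
    }

    /// The stale timeout is authoritative; `Goodbye` only speeds removal up.
    pub fn is_stale(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_seen) >= self.stale_after
    }
}
```

Passing `now` explicitly keeps the tracker deterministic in tests; a real peer loop would call it with `Instant::now()`.
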
## Library sync protocol

### Summary and snapshot

- `LibrarySummary { peer_id, summary: { library_rev, library_digest, game_count } }`
- `LibrarySnapshot { peer_id, snapshot: { library_rev, games: Vec<GameSummary> } }`

### Delta updates

- `LibraryDelta { peer_id, delta: { from_rev, to_rev, added, updated, removed } }`
- `removed` is a list of game IDs.
- Deltas are idempotent; ignore if `to_rev` <= known rev.

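A minimal sketch of the apply rule, assuming games are reduced to `(id, name)` pairs for brevity. `PeerLibrary`, `DeltaOutcome`, and `apply_delta` are hypothetical names. The `NeedSnapshot` branch reflects the snapshot fallback described under the fault tolerance rules.

```rust
use std::collections::BTreeMap;

/// Simplified delta: each game is an (id, name) pair for illustration.
pub struct Delta {
    pub from_rev: u64,
    pub to_rev: u64,
    pub added: Vec<(String, String)>,
    pub updated: Vec<(String, String)>,
    pub removed: Vec<String>,
}

/// Hypothetical view of one remote peer's library.
#[derive(Default)]
pub struct PeerLibrary {
    pub rev: u64,
    pub games: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeltaOutcome {
    Applied,
    IgnoredStale,
    NeedSnapshot,
}

pub fn apply_delta(lib: &mut PeerLibrary, delta: &Delta) -> DeltaOutcome {
    // Idempotence: a delta we have already passed is ignored.
    if delta.to_rev <= lib.rev {
        return DeltaOutcome::IgnoredStale;
    }
    // A gap between our revision and the delta's base needs a snapshot.
    if delta.from_rev != lib.rev {
        return DeltaOutcome::NeedSnapshot;
    }
    for (id, name) in delta.added.iter().chain(delta.updated.iter()) {
        lib.games.insert(id.clone(), name.clone());
    }
    for id in &delta.removed {
        lib.games.remove(id);
    }
    lib.rev = delta.to_rev;
    DeltaOutcome::Applied
}
```

Replaying the same delta twice returns `IgnoredStale` the second time and leaves the library untouched.
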
### GameSummary (concept)

- `id`, `name`, `eti_version`, `size`, `downloaded`, `installed`
- `manifest_hash` (hash of file list + sizes)
- `availability` (e.g., `ready`, `downloading`, `local_only`)

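One possible shape for this concept in Rust is sketched below. The field names come from the list above, but every type is an assumption (for example `eti_version` as an optional string and `size` as bytes in a `u64`), and the `Availability` variants cover only the three examples given.

```rust
/// Availability states taken from the examples above; others may exist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Availability {
    Ready,
    Downloading,
    LocalOnly,
}

impl Availability {
    /// Wire-style name, mirroring the spelling used in the list above.
    pub fn as_str(self) -> &'static str {
        match self {
            Availability::Ready => "ready",
            Availability::Downloading => "downloading",
            Availability::LocalOnly => "local_only",
        }
    }
}

/// Sketch of `GameSummary`; all field types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GameSummary {
    pub id: String,
    pub name: String,
    pub eti_version: Option<String>,
    pub size: u64,
    pub downloaded: bool,
    pub installed: bool,
    pub manifest_hash: String,
    pub availability: Availability,
}
```
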
## When peers broadcast their game list

- Only on changes, not on a timer.
- Filesystem events are gated per game ID instead of time-debounced:
  - an active operation lock drops events for that game;
  - a rescan already running for the ID sets a rescan-pending flag;
  - the running rescan loops once more when that flag was set.
- Send `LibraryDelta` to known peers; send `LibrarySummary` on new connections.

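The per-ID gating can be sketched as a small state machine. `EventGate` and `GateDecision` are hypothetical names. The sketch is single-threaded and leaves locking and the actual rescan to the caller.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    /// An operation holds the game; the event is dropped.
    Dropped,
    /// No rescan is running; the caller should start one.
    StartRescan,
    /// A rescan is running; it will loop once more when it finishes.
    MarkedPending,
}

/// Hypothetical gate keyed by game ID.
#[derive(Default)]
pub struct EventGate {
    active_ops: HashSet<String>,
    /// Present while a rescan runs; the value is the rescan-pending flag.
    rescans: HashMap<String, bool>,
}

impl EventGate {
    pub fn begin_operation(&mut self, id: &str) {
        self.active_ops.insert(id.to_string());
    }

    pub fn end_operation(&mut self, id: &str) {
        self.active_ops.remove(id);
    }

    pub fn on_fs_event(&mut self, id: &str) -> GateDecision {
        if self.active_ops.contains(id) {
            return GateDecision::Dropped;
        }
        if let Some(pending) = self.rescans.get_mut(id) {
            *pending = true;
            return GateDecision::MarkedPending;
        }
        self.rescans.insert(id.to_string(), false);
        GateDecision::StartRescan
    }

    /// Returns true when the rescan must run once more.
    pub fn on_rescan_finished(&mut self, id: &str) -> bool {
        let pending = self.rescans.get(id).copied().unwrap_or(false);
        if pending {
            // Clear the flag but keep the entry: the rescan is still running.
            self.rescans.insert(id.to_string(), false);
            true
        } else {
            self.rescans.remove(id);
            false
        }
    }
}
```

However many events arrive during a rescan, they collapse into a single extra pass, which bounds the work per burst without a timer.
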
## Local game scanning: fast and low cost

### Strategy

1. Maintain a persistent on-disk index (per game):
   - `manifest_hash`, total size, file list (optional), and a fingerprint
     (root-level `version.ini` mtime, root-level `.eti` mtime/size, and
     `local/` directory presence).
2. Use filesystem watchers to update only changed games.
3. Keep a 300-second fallback scan to recover from missed events.

### Fast-path scanning

- On startup, list only top-level game directories.
- For each game, read a cheap fingerprint:
  - root-level `.eti` file names, sizes, and mtimes
  - root-level `version.ini` mtime
  - presence of `local/` as a directory
- If fingerprint unchanged, reuse cached size and manifest hash.
- Only run a recursive scan for new or changed games.

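The fast path reduces to comparing a cheap fingerprint against the cached one. In the sketch below, `Fingerprint`, `CachedGame`, and `scan_game` are hypothetical. Mtimes are modelled as seconds in a `u64`, and the expensive recursive scan is injected as a closure so the example stays self-contained.

```rust
use std::collections::HashMap;

/// Cheap per-game fingerprint built from the fields listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    /// Root-level `.eti` files as (name, size, mtime), sorted by name.
    pub eti_files: Vec<(String, u64, u64)>,
    pub version_ini_mtime: Option<u64>,
    pub has_local_dir: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CachedGame {
    pub fingerprint: Fingerprint,
    pub size: u64,
    pub manifest_hash: String,
}

/// Returns (size, manifest_hash, rescanned). `full_scan` stands in for the
/// recursive scan and only runs for new or changed games.
pub fn scan_game<F>(
    cache: &mut HashMap<String, CachedGame>,
    id: &str,
    current: Fingerprint,
    full_scan: F,
) -> (u64, String, bool)
where
    F: FnOnce() -> (u64, String),
{
    if let Some(hit) = cache.get(id) {
        if hit.fingerprint == current {
            // Fingerprint unchanged: reuse cached size and manifest hash.
            return (hit.size, hit.manifest_hash.clone(), false);
        }
    }
    let (size, manifest_hash) = full_scan();
    cache.insert(
        id.to_string(),
        CachedGame { fingerprint: current, size, manifest_hash: manifest_hash.clone() },
    );
    (size, manifest_hash, true)
}
```
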
## Local State and Recovery

Downloaded and installed are independent predicates:

- `downloaded` is true only when `<game_root>/version.ini` exists as a regular
  file. The sentinel is written last through `.version.ini.tmp` and atomic
  rename. An interrupted replacement leaves no restored old sentinel because
  archive bytes may already have changed.
- `installed` is true when `<game_root>/local/` is a directory. The contents of
  `local/` are user-owned and are skipped by manifests, fingerprints, and file
  serving.

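A minimal sketch of the two predicates and the write-last sentinel, using only `std::fs`. The function names are hypothetical. `Path::is_file` follows symlinks, so it only approximates "regular file", and a durable implementation would presumably also sync the temporary file before renaming.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// `downloaded`: the root-level sentinel exists as a file.
pub fn is_downloaded(game_root: &Path) -> bool {
    game_root.join("version.ini").is_file()
}

/// `installed`: `local/` exists as a directory.
pub fn is_installed(game_root: &Path) -> bool {
    game_root.join("local").is_dir()
}

/// Write the sentinel last: stage it as `.version.ini.tmp`, then rename.
/// The rename is atomic when both paths are on the same filesystem.
pub fn commit_version_sentinel(game_root: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = game_root.join(".version.ini.tmp");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, game_root.join("version.ini"))
}
```

A crash before the rename leaves only `.version.ini.tmp` behind, which startup recovery sweeps, so the game never appears downloaded with partial content.
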
Reserved per-game paths:

- `.version.ini.tmp` and `.version.ini.discarded` are download transaction
  scratch files and are swept during startup recovery.
- `.local.installing/` is extraction staging.
- `.local.backup/` holds the previous install while an update or uninstall is in
  flight.
- `.lanspread.json` is the atomic per-game intent log.
- `.lanspread_owned` inside `.local.*` directories proves Lanspread ownership
  when the current intent is `None`.

Recovery reads `.lanspread.json` and combines the recorded intent with the
observed `local/`, `.local.installing/`, and `.local.backup/` state. Intent
states `Installing`, `Updating`, and `Uninstalling` prove ownership of the
corresponding reserved directories even if the marker was not flushed before a
crash. With intent `None`, markerless `.local.*` directories are left untouched.

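The ownership rule can be written as a single predicate. This sketch is deliberately simplified: `Intent` and `may_touch_reserved_dir` are hypothetical names, and the mapping from each intent to its corresponding reserved directory is not spelled out in this document, so the caller is assumed to pass only directories that correspond to the recorded intent.

```rust
/// Intent states named in the recovery description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    None,
    Installing,
    Updating,
    Uninstalling,
}

/// Decide whether recovery may modify a reserved `.local.*` directory.
/// Assumption: the directory corresponds to the recorded intent.
pub fn may_touch_reserved_dir(intent: Intent, has_owned_marker: bool) -> bool {
    match intent {
        // An in-flight intent proves ownership even without the marker.
        Intent::Installing | Intent::Updating | Intent::Uninstalling => true,
        // With no intent, only the `.lanspread_owned` marker proves ownership.
        Intent::None => has_owned_marker,
    }
}
```
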
### Result

Most scans become O(number of game dirs), with full recursion only when needed.

## File manifests and downloads

- Keep `GetGame`/manifest requests, but keyed by `manifest_hash` so repeated
  calls can be skipped when unchanged.
- Downloads remain chunked QUIC streams with the existing integrity checks.
- A game is transferable only when its ID is in the catalog, no operation is
  active for that ID, and the root-level `version.ini` sentinel exists.
- `local/` paths are never served, even if a stale or malicious manifest request
  asks for them.

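The serving rules above can be sketched as two checks. `is_transferable` and `is_servable_path` are hypothetical names. Rejecting absolute paths and `..` components goes beyond what this document states and is included as a defensive assumption. The `local` comparison is case-sensitive.

```rust
use std::collections::HashSet;
use std::ffi::OsStr;
use std::path::{Component, Path};

/// A game may be served only when all three conditions hold.
pub fn is_transferable(
    catalog: &HashSet<String>,
    active_ops: &HashSet<String>,
    id: &str,
    has_version_ini: bool,
) -> bool {
    catalog.contains(id) && !active_ops.contains(id) && has_version_ini
}

/// Check a path relative to the game root before serving it.
pub fn is_servable_path(relative: &Path) -> bool {
    let mut components = relative.components();
    match components.next() {
        // Never serve anything under the user-owned `local/` directory.
        Some(Component::Normal(first)) => {
            first != OsStr::new("local")
                // Assumption: also refuse `..`, `.`, and root components.
                && components.all(|c| matches!(c, Component::Normal(_)))
        }
        _ => false,
    }
}
```

Running the path check on the serving side means a stale or hostile manifest cannot widen what is exposed.
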
## Fault tolerance rules

- Every peer is keyed by `peer_id`, not by IP address.
- Peer addresses are listener addresses from mDNS or `Hello`/`HelloAck`, never
  ephemeral QUIC source ports.
- `library_rev` is monotonic and guards against out-of-order updates.
- Any mismatch or missing delta falls back to `LibrarySnapshot`.
- Loss of goodbye is harmless; stale timeout is authoritative.

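The first two rules suggest a peer table keyed by `peer_id` that stores only listener addresses. `PeerTable` and its methods are hypothetical. The transport source address is deliberately not a parameter, so it cannot be recorded by accident.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical peer table: `peer_id` -> advertised listener address.
#[derive(Default)]
pub struct PeerTable {
    listeners: HashMap<String, SocketAddr>,
}

impl PeerTable {
    /// Record the listener address from mDNS or `Hello`/`HelloAck`.
    /// Returns true when the address is new or has changed.
    pub fn record_listener(&mut self, peer_id: &str, listen_addr: SocketAddr) -> bool {
        self.listeners.insert(peer_id.to_string(), listen_addr) != Some(listen_addr)
    }

    pub fn listener(&self, peer_id: &str) -> Option<SocketAddr> {
        self.listeners.get(peer_id).copied()
    }

    pub fn len(&self) -> usize {
        self.listeners.len()
    }
}
```

A peer that returns with a new IP address updates its existing entry instead of appearing as a second peer.
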
## Roadmap from current design to this one

1. Protocol updates in `lanspread-proto`:
   - Define `Hello`, `HelloAck`, `LibrarySummary`, `LibrarySnapshot`,
     `LibraryDelta`, and optional `Goodbye` messages.
   - Thread `peer_id`, `library_rev`, and `manifest_hash` through all
     library and manifest-bearing types.
   - Make `Hello` and `HelloAck` carry the sender's `listen_addr`,
     `library_rev`, and `library_digest` so both sides can record stable
     listener addresses and immediately select `LibraryDelta` vs
     `LibrarySnapshot`.
2. Peer identity:
   - Persist a stable `peer_id` (UUID) in the peer config and inject it into
     `PeerInfo` and `PeerGameDB` at startup.
   - Track `peer_id -> SocketAddr` in the discovery table and update the
     address on any incoming handshake or mDNS refresh.
3. Discovery handshake:
   - Publish `peer_id` and `library_rev` in mDNS TXT records to avoid
     immediate TCP/QUIC roundtrips when nothing changed.
   - Add a lightweight handshake in `run_peer_discovery` that exchanges
     `Hello`/`HelloAck` before any library sync.
   - Ignore peers that do not advertise the current protocol version.
4. Library revisioning:
   - Store a monotonic `library_rev` locally and increment only after a
     successful index refresh completes.
   - Apply `LibraryDelta` when `library_rev` matches; reject stale or future
     revisions and request `LibrarySnapshot` instead.
   - Cache the last accepted `manifest_hash` per peer to short-circuit
     manifest requests when unchanged.
5. Local index + scan optimizations:
   - Introduce a cached index file (e.g., `.lanspread/index.json`) that stores
     per-root fingerprints and computed manifests.
   - Use filesystem watchers with a debounce window to collect changes and
     incrementally update the cache.
   - Schedule a low-frequency full scan to reconcile missed watcher events.
6. Announce updates:
   - Broadcast `LibraryDelta` updates keyed by `library_rev`.
   - Send `LibrarySummary` on new connections to seed the delta flow.
7. File manifest caching:
   - Store per-game `manifest_hash` and only fetch details when changed.
8. Liveness:
   - Reduce ping frequency; update `last_seen` on any message.
   - Add optional `Goodbye` on shutdown paths.
9. Tests:
   - Delta apply/merge, rev ordering, manifest hashing, and scan cache behavior.