c3800461a4
Cancelling an in-flight download via `PeerCommand::CancelDownload` previously
tore down the network transfer and cleared `active_downloads`, but left the
partial `.eti` archive(s) sitting in the game root forever. The next library
scan still picked up the half-written files as a "downloaded" game, and the
only escape was the `Remove files` action. This is the symmetric fix to
`62ceb06 feat(peer): remove downloaded game files safely`: the cancel path
must clean up after itself the same way an explicit remove does.
The fix introduces a dedicated `download/storage.rs` module that owns both the
existing pre-allocation step (`prepare_game_storage`, moved out of
`planning.rs` because pure file I/O has no business sitting next to chunk
planning) and a new `discard_cancelled_download` sweep. The orchestrator
calls the sweep at every cancellation exit point, immediately after
`rollback_version_ini_transaction` so the version sentinel transients are
gone before the bulk deletion runs.
The sweep deliberately preserves a known set of names so a cancelled update
of an installed game does not destroy user-extracted files:
- `local/`: committed install directory
- `.local.installing/`, `.local.backup/`: in-flight install transaction state,
  needed by `install::recover_game_root` on next startup
- `.lanspread.json`: per-game install intent log
- `.softlan_game_installed`: external softlan installer marker
- `.sync/`: external sync tooling
Everything else under the game root (the `.eti` archives, any nested payload
directories, partial chunk files) is removed, and the game root itself is
removed if it ends up empty. The set matches `should_ignore_game_child` in
`services/local_monitor.rs` minus the version.ini transients (which the
rollback step removes itself just before the discard runs).
Tradeoff worth knowing: this does NOT restore the pre-update `version.ini`
sentinel. `begin_version_ini_transaction` parks the existing sentinel as
`.version.ini.discarded`, and `rollback_version_ini_transaction` deletes
that file rather than renaming it back. The user-visible consequence is
that cancelling a mid-flight update of an installed game leaves the local
install playable but no longer flagged as "downloaded" — the documented
"settles as local-only" behaviour now recorded in
`crates/lanspread-peer/ARCHITECTURE.md` and `README.md`. Restoring the
sentinel on cancel was considered, but it would mean a cancelled update
keeps advertising the OLD version as Ready, which is worse than the
current outcome.
Two unrelated correctness issues that surfaced while threading cancellation
through the orchestrator are bundled in here because they belong to the
same user-visible "Cancel button works" story:
1. `download_from_peer` now races `connect_to_peer` against
   `cancel_token.cancelled()` (`download/transport.rs:314-322`). Previously
   a cancel arriving while QUIC was still in its connect handshake had to
   wait for the connect timeout to elapse before the cleanup could run.
2. The download task in `handlers.rs` now calls
   `refresh_local_game_for_ending_operation` on every terminal branch
   (success-without-install, install-handoff-failure, and the `Err(e)` /
   cancel branch) before `end_download_operation` clears
   `active_downloads`. Without this, the UI's settled snapshot on the
   cancel path could lag behind the actual file system state because the
   active-operation snapshot was cleared while the discard was still
   running, leaving a brief window where the card showed the pre-cancel
   state.
What this does NOT fix: a crash (process kill, power loss) during a
download still leaves orphan `.eti` files because `recover_download_transients`
in `install/transaction.rs` only sweeps the version.ini transients. Closing
that gap would mean calling the same discard from startup recovery for any
game root whose install intent is None and whose `version.ini` is absent.
Tracked in `FINDINGS.md` as a follow-up.
Test Plan:
- `just clippy && just test` — 102 unit tests pass, no new warnings.
- Two new storage tests:
  - `discard_cancelled_download_removes_peer_owned_payload` exercises the
    fresh-download cancel (no `local/`, root sweeps clean).
  - `discard_cancelled_download_preserves_local_install_state` exercises
    the update cancel (`local/`, `.lanspread.json`, `.local.backup/`
    survive; `version.ini` and `.eti` go away).
- Manual GUI smoke (operator): start a fresh download of a multi-archive
game from a peer, click Cancel from the detail modal while the progress
bar is between 5% and 95%. Expect the game root to be empty (or absent)
afterwards and no orphan `.eti` files. Repeat against an installed game
by clicking Update, then Cancel mid-download; expect `local/` contents
intact and the card to drop back to Play (or Update if the newer-version
peer is still around).
- `lanspread-peer-cli` has no `cancel` command yet, so the headless
`PEER_CLI_SCENARIOS.md` matrix does not cover this end-to-end. Adding a
CLI cancel command + scenario is the natural follow-up.
Refs: 62ceb06 (feat(peer): remove downloaded game files safely)
Refs: b7df2de (fix(download): emit failure events on early-returns and update UI transition)
# lanspread-peer proposed protocol and architecture

This document proposes a tighter, more fault-tolerant protocol while keeping
the current idea: mDNS discovery, QUIC transport, on-demand metadata, and
chunked file transfers.

## Goals (unchanged)

- Local LAN discovery via mDNS.
- QUIC + JSON messages for control, raw streams for file data.
- UI drives operations through `PeerCommand`, peers remain headless.
- Peers can appear/disappear at any time without data loss.

## Peer lifecycle and message flow

### 1) Startup and advertise

- Start QUIC server.
- Advertise via mDNS with TXT records:
  - `peer_id` (stable ID, not tied to IP)
  - `proto_ver`
  - `library_rev` (monotonic local library revision)
  - optional `hostname`

### 2) Discovery and handshake

When a peer is discovered:

1. Connect and send `Hello { peer_id, proto_ver, listen_addr, library_rev,
   library_digest, features }`. `listen_addr` is mandatory; the QUIC source port
   is only a temporary transport port and must not be recorded as the peer's
   listener.
2. Receive `HelloAck { peer_id, proto_ver, listen_addr, library_rev,
   library_digest, features }`.
3. If the remote `peer_id` is already known but the address changed, update it.
4. If protocol versions are incompatible, drop the peer (and keep mDNS watching).
5. If library digests match, do nothing else.
6. If digests differ:
   - If we have a known `library_rev` for that peer, request `LibraryDelta`.
   - Otherwise request `LibrarySnapshot`.

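The decision made in steps 4 to 6 can be sketched as a small pure function. This is only an illustration: `SyncAction`, `RemoteHello`, `CachedPeerLibrary`, and `decide_sync` are hypothetical names, protocol compatibility is assumed to mean equal `proto_ver`, and "digests match" is assumed to mean that the peer's advertised `library_digest` equals the digest cached for that peer.

```rust
/// What to do once `Hello`/`HelloAck` have been exchanged (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SyncAction {
    /// Incompatible protocol version: drop the peer, keep watching mDNS.
    DropPeer,
    /// Digests match: nothing else to do.
    InSync,
    /// We hold an older revision for this peer: ask for changes since then.
    RequestDelta { from_rev: u64 },
    /// No baseline for this peer: ask for the full library.
    RequestSnapshot,
}

/// Subset of the handshake fields that the decision needs (hypothetical).
pub struct RemoteHello {
    pub proto_ver: u32,
    pub library_digest: String,
}

/// What we last recorded about a peer's library (hypothetical).
pub struct CachedPeerLibrary {
    pub library_rev: u64,
    pub library_digest: String,
}

pub fn decide_sync(
    our_proto_ver: u32,
    remote: &RemoteHello,
    cached: Option<&CachedPeerLibrary>,
) -> SyncAction {
    // Assumption: versions are compatible only when they are equal.
    if remote.proto_ver != our_proto_ver {
        return SyncAction::DropPeer;
    }
    match cached {
        Some(c) if c.library_digest == remote.library_digest => SyncAction::InSync,
        Some(c) => SyncAction::RequestDelta { from_rev: c.library_rev },
        None => SyncAction::RequestSnapshot,
    }
}
```

Keeping the decision free of I/O means it can be unit-tested without a QUIC connection; the caller would translate the returned action into the actual request.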
### 3) Steady state

- Any message updates `last_seen`.
- Pings run only when idle (or on a longer interval), not every 5 seconds.
- Library updates are pushed as deltas, debounced and coalesced.

### 4) Shutdown

- Optional `Goodbye { peer_id }` lets others remove the peer quickly.
- If a peer vanishes without goodbye, stale timeout + ping removal handle it.
- Goodbye is a hint, never required for correctness.

## Library sync protocol

### Summary and snapshot

- `LibrarySummary { peer_id, summary: { library_rev, library_digest, game_count } }`
- `LibrarySnapshot { peer_id, snapshot: { library_rev, games: Vec<GameSummary> } }`

### Delta updates

- `LibraryDelta { peer_id, delta: { from_rev, to_rev, added, updated, removed } }`
- `removed` is a list of game IDs.
- Deltas are idempotent; ignore if `to_rev` <= known rev.

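A minimal sketch of how a receiver might apply a delta is shown here. The types are simplified stand-ins rather than the real `lanspread-proto` definitions, `GameSummary` is trimmed to two fields, and the rule that a `from_rev` gap triggers a snapshot request is an assumption based on the fallback rule stated elsewhere in this document.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct GameSummary {
    pub id: String,
    pub eti_version: String,
}

pub struct Delta {
    pub from_rev: u64,
    pub to_rev: u64,
    pub added: Vec<GameSummary>,
    pub updated: Vec<GameSummary>,
    pub removed: Vec<String>, // game IDs
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeltaOutcome {
    Applied,
    /// `to_rev` <= known rev: a replay, safe to ignore.
    IgnoredStale,
    /// The delta does not start at our known rev: fall back to a snapshot.
    NeedSnapshot,
}

/// Our view of one remote peer's library (hypothetical type).
#[derive(Default)]
pub struct PeerLibrary {
    pub rev: u64,
    pub games: HashMap<String, GameSummary>,
}

impl PeerLibrary {
    pub fn apply_delta(&mut self, delta: Delta) -> DeltaOutcome {
        if delta.to_rev <= self.rev {
            return DeltaOutcome::IgnoredStale;
        }
        if delta.from_rev != self.rev {
            return DeltaOutcome::NeedSnapshot;
        }
        // Added and updated entries are both upserts keyed by game ID.
        for game in delta.added.into_iter().chain(delta.updated) {
            self.games.insert(game.id.clone(), game);
        }
        for id in &delta.removed {
            self.games.remove(id);
        }
        self.rev = delta.to_rev;
        DeltaOutcome::Applied
    }
}
```

Because the revision check runs before any mutation, replaying the same delta leaves the library untouched, which is what makes the update idempotent.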
### GameSummary (concept)

- `id`, `name`, `eti_version`, `size`, `downloaded`, `installed`
- `manifest_hash` (hash of file list + sizes)
- `availability` (e.g., `ready`, `downloading`, `local_only`)

## When peers broadcast their game list

- Only on changes, not on a timer.
- Filesystem events are gated per game ID instead of time-debounced:
  - an active operation lock drops events for that game;
  - a rescan already running for the ID sets a rescan-pending flag;
  - the running rescan loops once more when that flag was set.
- Local library scans emit `LocalLibraryChanged` only for real library changes,
  except that accepted game-directory changes can force a UI snapshot for the
  new path without sending a peer delta.
- Active operation mutations emit `ActiveOperationsChanged` from the mutation
  path instead of riding on local library scans.
- Send `LibraryDelta` to known peers; send `LibrarySummary` on new connections.

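The per-game gating of filesystem events can be pictured as a small state machine. The sketch below is an assumption about the shape of that logic, not the actual monitor code: `RescanGate`, `EventDecision`, and the method names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum EventDecision {
    /// An operation holds the game: the event is dropped.
    Dropped,
    /// No rescan is running: the caller should start one.
    StartRescan,
    /// A rescan is running: remember that another pass is needed.
    Deferred,
}

#[derive(Default)]
pub struct RescanGate {
    active_operations: HashSet<String>,
    /// Game ID -> rescan-pending flag, present only while a rescan runs.
    running: HashMap<String, bool>,
}

impl RescanGate {
    pub fn begin_operation(&mut self, id: &str) {
        self.active_operations.insert(id.to_string());
    }

    pub fn end_operation(&mut self, id: &str) {
        self.active_operations.remove(id);
    }

    pub fn on_fs_event(&mut self, id: &str) -> EventDecision {
        if self.active_operations.contains(id) {
            return EventDecision::Dropped;
        }
        if let Some(pending) = self.running.get_mut(id) {
            *pending = true;
            return EventDecision::Deferred;
        }
        self.running.insert(id.to_string(), false);
        EventDecision::StartRescan
    }

    /// Called when a rescan pass ends. Returns `true` if the rescan
    /// should loop once more because events arrived while it was running.
    pub fn on_rescan_finished(&mut self, id: &str) -> bool {
        let pending = self.running.get(id).copied().unwrap_or(false);
        if pending {
            // Clear the flag and keep the rescan marked as running.
            self.running.insert(id.to_string(), false);
        } else {
            self.running.remove(id);
        }
        pending
    }
}
```

Any number of events during a running rescan collapse into a single extra pass, so a burst of writes costs at most two scans per game without relying on a timer.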
## Local game scanning: fast and low cost

### Strategy

1. Maintain a persistent on-disk index (per game):
   - `manifest_hash`, total size, file list (optional), and a fingerprint
     (root-level `version.ini` mtime, root-level `.eti` mtime/size, and
     `local/` directory presence).
2. Use filesystem watchers to update only changed games.
3. Keep a 300-second fallback scan to recover from missed events.

### Fast-path scanning

- On startup, list only top-level game directories.
- For each game, read a cheap fingerprint:
  - root-level `.eti` file names, sizes, and mtimes
  - root-level `version.ini` mtime
  - presence of `local/` as a directory
- If fingerprint unchanged, reuse cached size and manifest hash.
- Only run a recursive scan for new or changed games.

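As a rough illustration of the fingerprint described above, the following sketch reads only the game root (no recursion) using the standard library. The `Fingerprint` and `EtiEntry` types and both function names are hypothetical; the real index may store these fields differently.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EtiEntry {
    pub name: String,
    pub size: u64,
    pub modified: Option<SystemTime>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    pub eti_files: Vec<EtiEntry>,
    pub version_ini_mtime: Option<SystemTime>,
    pub has_local_dir: bool,
}

/// Reads a cheap fingerprint from the top level of one game root.
pub fn read_fingerprint(game_root: &Path) -> io::Result<Fingerprint> {
    let mut eti_files = Vec::new();
    for entry in fs::read_dir(game_root)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        let meta = entry.metadata()?;
        if meta.is_file() && name.ends_with(".eti") {
            eti_files.push(EtiEntry {
                name,
                size: meta.len(),
                modified: meta.modified().ok(),
            });
        }
    }
    // Directory order is unspecified, so sort for a stable comparison.
    eti_files.sort_by(|a, b| a.name.cmp(&b.name));

    let version_ini_mtime = fs::metadata(game_root.join("version.ini"))
        .ok()
        .filter(|m| m.is_file())
        .and_then(|m| m.modified().ok());

    Ok(Fingerprint {
        eti_files,
        version_ini_mtime,
        has_local_dir: game_root.join("local").is_dir(),
    })
}

/// A recursive scan is needed only for new or changed games.
pub fn needs_full_scan(cached: Option<&Fingerprint>, current: &Fingerprint) -> bool {
    cached != Some(current)
}
```

When `needs_full_scan` returns `false`, the cached size and manifest hash can be reused as they are.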
## Local State and Recovery

Downloaded and installed are independent predicates:

- `downloaded` is true only when `<game_root>/version.ini` exists as a regular
  file. The sentinel is written last through `.version.ini.tmp` and atomic
  rename. An interrupted replacement leaves no restored old sentinel because
  archive bytes may already have changed.
- `installed` is true when `<game_root>/local/` is a directory. The contents of
  `local/` are user-owned and are skipped by manifests, fingerprints, and file
  serving.

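The two predicates and the write-last sentinel can be sketched with the standard library alone. The function names are hypothetical, and the sketch assumes that flushing the scratch file before the rename is acceptable here; the real implementation may handle durability and symlinks differently.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// `downloaded`: the root-level sentinel exists as a file.
pub fn is_downloaded(game_root: &Path) -> bool {
    game_root.join("version.ini").is_file()
}

/// `installed`: `local/` exists as a directory.
pub fn is_installed(game_root: &Path) -> bool {
    game_root.join("local").is_dir()
}

/// Writes the sentinel last: contents go to `.version.ini.tmp` first,
/// then a rename publishes them under the final name in one step.
pub fn commit_version_sentinel(game_root: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = game_root.join(".version.ini.tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents)?;
    file.sync_all()?;
    fs::rename(&tmp, game_root.join("version.ini"))
}
```

A crash before the rename leaves only the scratch file behind, so `is_downloaded` keeps returning `false` for a half-finished download.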
Reserved per-game paths:

- `.version.ini.tmp` and `.version.ini.discarded` are download transaction
  scratch files and are swept during startup recovery.
- `.local.installing/` is extraction staging.
- `.local.backup/` holds the previous install while an update or uninstall is in
  flight.
- `.lanspread.json` is the atomic per-game intent log.
- `.lanspread_owned` inside `.local.*` directories proves Lanspread ownership
  when the current intent is `None`.

Downloaded-file removal is not an uninstall transaction. It removes the whole
game root only for a catalog ID that is a single direct child of the configured
game directory, has a regular root-level `version.ini`, and has no `local/`,
`.local.installing/`, or `.local.backup/` path.

Recovery reads `.lanspread.json` and combines the recorded intent with the
observed `local/`, `.local.installing/`, and `.local.backup/` state. Intent
states `Installing`, `Updating`, and `Uninstalling` prove ownership of the
corresponding reserved directories even if the marker was not flushed before a
crash. With intent `None`, markerless `.local.*` directories are left untouched.

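The ownership rule can be reduced to a single check, sketched below. This is a simplification under stated assumptions: the `Intent` enum and `may_touch_reserved_dir` are hypothetical names, and the sketch treats any in-flight intent as proof of ownership for the directory being inspected, whereas the real recovery code presumably maps each intent to its own specific directories.

```rust
use std::path::Path;

/// Recorded intent from `.lanspread.json` (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    None,
    Installing,
    Updating,
    Uninstalling,
}

/// Decides whether recovery may modify a reserved `.local.*` directory.
pub fn may_touch_reserved_dir(intent: Intent, reserved_dir: &Path) -> bool {
    match intent {
        // An in-flight intent proves ownership even without the marker.
        Intent::Installing | Intent::Updating | Intent::Uninstalling => true,
        // With no intent, only the marker file proves ownership.
        Intent::None => reserved_dir.join(".lanspread_owned").exists(),
    }
}
```

The conservative branch matters for directories that merely look like Lanspread state: without an intent or a marker they are treated as user data and left alone.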
### Result

Most scans become O(number of game dirs), with full recursion only when needed.

## File manifests and downloads

- Keep `GetGame`/manifest requests, but keyed by `manifest_hash` so repeated
  calls can be skipped when unchanged.
- Downloads remain chunked QUIC streams with the existing integrity checks.
- A game is transferable only when its ID is in the catalog, no operation is
  active for that ID, and the root-level `version.ini` sentinel exists.
- `local/` paths are never served, even if a stale or malicious manifest request
  asks for them.
- Cancelling a download discards the peer-owned root download payload and
  scratch sentinel files. `local/` and install transaction metadata are
  preserved, so a cancelled update of an installed game settles as local-only.

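The cancel-time discard in the last bullet can be sketched as a top-level sweep that skips a fixed set of preserved names. This is a sketch, not the shipped sweep: `discard_cancelled_download_sketch` and the `PRESERVED` constant are illustrative, and the preserved set is an assumption taken from the names listed in the cancel-cleanup change description.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Top-level names the sweep leaves alone (assumed set).
const PRESERVED: &[&str] = &[
    "local",
    ".local.installing",
    ".local.backup",
    ".lanspread.json",
    ".softlan_game_installed",
    ".sync",
];

/// Removes everything under `game_root` except the preserved names,
/// then removes the root itself if nothing is left.
pub fn discard_cancelled_download_sketch(game_root: &Path) -> io::Result<()> {
    if !game_root.is_dir() {
        return Ok(());
    }
    for entry in fs::read_dir(game_root)? {
        let entry = entry?;
        let name = entry.file_name();
        let keep = name.to_str().map_or(false, |n| PRESERVED.contains(&n));
        if keep {
            continue;
        }
        // `file_type` does not follow symlinks, so a symlink is unlinked
        // rather than having its target removed.
        if entry.file_type()?.is_dir() {
            fs::remove_dir_all(entry.path())?;
        } else {
            fs::remove_file(entry.path())?;
        }
    }
    if fs::read_dir(game_root)?.next().is_none() {
        fs::remove_dir(game_root)?;
    }
    Ok(())
}
```

For a fresh download the root ends up absent; for a cancelled update of an installed game, `local/` and the intent log survive while `version.ini` and the archives are removed, which is the local-only outcome described above.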
## Fault tolerance rules

- Every peer is keyed by `peer_id`, not by IP address.
- Peer addresses are listener addresses from mDNS or `Hello`/`HelloAck`, never
  ephemeral QUIC source ports.
- `library_rev` is monotonic and guards against out-of-order updates.
- Any mismatch or missing delta falls back to `LibrarySnapshot`.
- Loss of goodbye is harmless; stale timeout is authoritative.

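To make the keying and staleness rules concrete, here is a minimal sketch of a peer table indexed by `peer_id`. `PeerTable`, `PeerEntry`, and the method names are hypothetical, and the timeout is passed in by the caller because this document does not fix a value.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

pub struct PeerEntry {
    /// Listener address from mDNS or `Hello`/`HelloAck`, never a source port.
    pub listen_addr: SocketAddr,
    pub last_seen: Instant,
}

#[derive(Default)]
pub struct PeerTable {
    peers: HashMap<String, PeerEntry>,
}

impl PeerTable {
    /// Records a peer, or updates its address if the `peer_id` is known.
    pub fn observe(&mut self, peer_id: &str, listen_addr: SocketAddr, now: Instant) {
        self.peers
            .insert(peer_id.to_string(), PeerEntry { listen_addr, last_seen: now });
    }

    /// Any message from a known peer refreshes `last_seen`.
    pub fn touch(&mut self, peer_id: &str, now: Instant) {
        if let Some(entry) = self.peers.get_mut(peer_id) {
            entry.last_seen = now;
        }
    }

    pub fn addr_of(&self, peer_id: &str) -> Option<SocketAddr> {
        self.peers.get(peer_id).map(|entry| entry.listen_addr)
    }

    pub fn len(&self) -> usize {
        self.peers.len()
    }

    /// Removes peers not heard from within `timeout`; returns their IDs.
    pub fn evict_stale(&mut self, now: Instant, timeout: Duration) -> Vec<String> {
        let stale: Vec<String> = self
            .peers
            .iter()
            .filter(|(_, e)| now.saturating_duration_since(e.last_seen) > timeout)
            .map(|(id, _)| id.clone())
            .collect();
        for id in &stale {
            self.peers.remove(id);
        }
        stale
    }
}
```

Because the map key is the `peer_id`, a peer that returns with a new IP address overwrites its old entry instead of appearing twice, and a missing `Goodbye` only delays removal until the next eviction pass.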
## Roadmap from current design to this one

1. Protocol updates in `lanspread-proto`:
   - Define `Hello`, `HelloAck`, `LibrarySummary`, `LibrarySnapshot`,
     `LibraryDelta`, and optional `Goodbye` messages.
   - Thread `peer_id`, `library_rev`, and `manifest_hash` through all
     library and manifest-bearing types.
   - Make `Hello` and `HelloAck` carry the sender's `listen_addr`,
     `library_rev`, and `library_digest` so both sides can record stable
     listener addresses and immediately select `LibraryDelta` vs
     `LibrarySnapshot`.
2. Peer identity:
   - Persist a stable `peer_id` (UUID) in the peer config and inject it into
     `PeerInfo` and `PeerGameDB` at startup.
   - Track `peer_id -> SocketAddr` in the discovery table and update the
     address on any incoming handshake or mDNS refresh.
3. Discovery handshake:
   - Publish `peer_id` and `library_rev` in mDNS TXT records to avoid
     immediate TCP/QUIC roundtrips when nothing changed.
   - Add a lightweight handshake in `run_peer_discovery` that exchanges
     `Hello`/`HelloAck` before any library sync.
   - Ignore peers that do not advertise the current protocol version.
4. Library revisioning:
   - Store a monotonic `library_rev` locally and increment only after a
     successful index refresh completes.
   - Apply `LibraryDelta` when `library_rev` matches; reject stale or future
     revisions and request `LibrarySnapshot` instead.
   - Cache the last accepted `manifest_hash` per peer to short-circuit
     manifest requests when unchanged.
5. Local index + scan optimizations:
   - Introduce a cached index file (e.g., `.lanspread/index.json`) that stores
     per-root fingerprints and computed manifests.
   - Use filesystem watchers with a debounce window to collect changes and
     incrementally update the cache.
   - Schedule a low-frequency full scan to reconcile missed watcher events.
6. Announce updates:
   - Broadcast `LibraryDelta` updates keyed by `library_rev`.
   - Send `LibrarySummary` on new connections to seed the delta flow.
7. File manifest caching:
   - Store per-game `manifest_hash` and only fetch details when changed.
8. Liveness:
   - Reduce ping frequency; update `last_seen` on any message.
   - Add optional `Goodbye` on shutdown paths.
9. Tests:
   - Delta apply/merge, rev ordering, manifest hashing, and scan cache behavior.