Cancelling an in-flight download via `PeerCommand::CancelDownload` previously
torn down the network transfer and cleared `active_downloads`, but left the
partial `.eti` archive(s) sitting in the game root forever. The next library
scan still picked up the half-written files as a "downloaded" game, and the
only escape was the `Remove files` action. This is the symmetric fix to
`62ceb06 feat(peer): remove downloaded game files safely`: the cancel path
must clean up after itself the same way an explicit remove does.
The fix introduces a dedicated `download/storage.rs` module that owns both the
existing pre-allocation step (`prepare_game_storage`, moved out of
`planning.rs` because pure file I/O has no business sitting next to chunk
planning) and a new `discard_cancelled_download` sweep. The orchestrator
calls the sweep at every cancellation exit point, immediately after
`rollback_version_ini_transaction` so the version sentinel transients are
gone before the bulk deletion runs.
The sweep deliberately preserves a known set of names so a cancelled update
of an installed game does not destroy user-extracted files:
- `local/` committed install directory
- `.local.installing/`,
`.local.backup/` in-flight install transaction state, needed by
`install::recover_game_root` on next startup
- `.lanspread.json` per-game install intent log
- `.softlan_game_installed` external softlan installer marker
- `.sync/` external sync tooling
Everything else under the game root (the `.eti` archives, any nested payload
directories, partial chunk files) is removed, and the game root itself is
removed if it ends up empty. The set matches `should_ignore_game_child` in
`services/local_monitor.rs` minus the version.ini transients (which the
rollback step removes itself just before the discard runs).
Tradeoff worth knowing: this does NOT restore the pre-update `version.ini`
sentinel. `begin_version_ini_transaction` parks the existing sentinel as
`.version.ini.discarded`, and `rollback_version_ini_transaction` deletes
that file rather than renaming it back. The user-visible consequence is
that cancelling a mid-flight update of an installed game leaves the local
install playable but no longer flagged as "downloaded" — the documented
"settles as local-only" behaviour now recorded in
`crates/lanspread-peer/ARCHITECTURE.md` and `README.md`. Restoring the
sentinel on cancel was considered, but it would mean a cancelled update
keeps advertising the OLD version as Ready, which is worse than the
current outcome.
Two unrelated correctness issues that surfaced while threading cancellation
through the orchestrator are bundled in here because they belong to the
same user-visible "Cancel button works" story:
1. `download_from_peer` now races `connect_to_peer` against
`cancel_token.cancelled()` (`download/transport.rs:314-322`). Previously
a cancel arriving while QUIC was still in its connect handshake had to
wait for the connect timeout to elapse before the cleanup could run.
2. The download task in `handlers.rs` now calls
`refresh_local_game_for_ending_operation` on every terminal branch —
success-without-install, install-handoff-failure, and the `Err(e)` /
cancel branch — before `end_download_operation` clears
`active_downloads`. Without this, the UI's settled snapshot on the
cancel path could lag behind the actual file system state because the
active-operation snapshot was cleared while the discard was still
running, leaving a brief window where the card showed the pre-cancel
state.
What this does NOT fix: a crash (process kill, power loss) during a
download still leaves orphan `.eti` files because `recover_download_transients`
in `install/transaction.rs` only sweeps the version.ini transients. Closing
that gap would mean calling the same discard from startup recovery for any
game root whose install intent is None and whose `version.ini` is absent.
Tracked in `FINDINGS.md` as a follow-up.
Test Plan:
- `just clippy && just test` — 102 unit tests pass, no new warnings.
- Two new storage tests:
- `discard_cancelled_download_removes_peer_owned_payload` exercises the
fresh-download cancel (no `local/`, root sweeps clean).
- `discard_cancelled_download_preserves_local_install_state` exercises
the update cancel (`local/`, `.lanspread.json`, `.local.backup/`
survive; `version.ini` and `.eti` go away).
- Manual GUI smoke (operator): start a fresh download of a multi-archive
game from a peer, click Cancel from the detail modal while the progress
bar is between 5% and 95%. Expect the game root to be empty (or absent)
afterwards and no orphan `.eti` files. Repeat against an installed game
by clicking Update, then Cancel mid-download; expect `local/` contents
intact and the card to drop back to Play (or Update if the newer-version
peer is still around).
- `lanspread-peer-cli` has no `cancel` command yet, so the headless
`PEER_CLI_SCENARIOS.md` matrix does not cover this end-to-end. Adding a
CLI cancel command + scenario is the natural follow-up.
Refs: 62ceb06 (feat(peer): remove downloaded game files safely)
Refs: b7df2de (fix(download): emit failure events on early-returns and update UI transition)
Replace the downloading action button with a dedicated progress component in
both card and detail views. The card now shows percent plus current speed, while
the detail modal shows bytes, speed, ETA, percent, and an inline cancel affordance
using the same backend progress payload.
Expose download cancellation as a peer command that cancels the tracked transfer
token and lets the running operation clear the authoritative active-operation
snapshot. Add a View Files action that resolves the game root safely and opens it
with the platform file viewer through Tauri's shell plugin.
Test Plan:
- just fmt
- just frontend-test
- just test
- just build
- just clippy
- git diff --cached --check
Refs: design reference e308009a08
Previously the action button only said "Downloading…" with no indication of
how far along the transfer was or how fast it was going. With multi-gigabyte
game payloads on a LAN this gave the user no signal whether the download had
stalled, was hitting the wire fast, or was about to finish.
Wire a sampled byte-level progress channel from the download pipeline up to
the action button:
- New `DownloadProgressTracker` in `crates/lanspread-peer/src/download/progress.rs`
holds the total expected bytes plus two atomic counters: `downloaded_bytes`
(deduplicated per `(relative_path, offset)` chunk key, used for the bar) and
`transferred_bytes` (raw cumulative, used for the speed sample). The dedup
prevents a retried chunk from double-counting toward completion while still
letting speed reflect actual wire activity including retry waste, which is
the more useful metric for "is the link doing anything right now?".
- `sample_download_progress` wraps the transfer future, emits an initial 0 B/s
snapshot, then samples on a 500 ms interval (`MissedTickBehavior::Skip` so a
stalled downloader does not generate a thundering herd of catch-up ticks)
and emits one final snapshot when the future resolves, so the UI sees the
closing state before `DownloadGameFilesFinished` arrives.
- New `PeerEvent::DownloadGameFilesProgress(DownloadProgress)` variant carries
`{ id, downloaded_bytes, total_bytes, bytes_per_second }`. The Tauri shell
forwards it as `game-download-progress`; the JSONL harness emits it as
`download-progress`.
- Orchestrator and retry paths refactored to thread a single shared
`Arc<DownloadProgressTracker>` through both the initial transfer and any
retry attempts. New `TransferContext`, `RetryContext`, and `ChunkPlanContext`
structs absorb the parameter-list growth that came with adding the tracker.
Frontend rendering honors the snapshot-is-authoritative decision from commit
`5df82aa` ("fix(ui): derive operation status from snapshots"):
- `Game.download_progress` is an ephemeral overlay carried alongside the card,
not a status field. `mergeGameUpdate` preserves it only while
`install_status === Downloading` and otherwise clears it on the next
snapshot, so the games-list snapshot remains the single authority for when
the bar should disappear.
- The `game-download-progress` listener writes ONLY `download_progress` — it
does not touch `install_status`, `status_message`, or `status_level`. This
preserves the rule that lifecycle events never mutate card status.
- No `game-download-finished` listener; snapshot reconciliation clears the
overlay automatically when status leaves Downloading.
- `ActionButton` renders a percentage fill behind the icon/label via a
`--download-progress` CSS custom property; the existing `.act-busy` spinner
is layered above the fill with `z-index: 1`. `act-downloading` widens the
button to avoid label jitter as the speed number changes (tabular-nums).
- `actionLabel` for the Downloading status now appends a formatted speed
("Downloading… 12.5 MB/s") via the new `formatBytesPerSecond` helper.
Test Plan:
- `just test` — Rust workspace tests including new progress tracker unit tests
(`tracker_counts_only_new_bytes_for_a_retried_chunk`,
`tracker_clamps_reported_bytes_to_total`).
- `just frontend-test` — Deno tests including
`download progress is preserved only while actively downloading` and
`downloading action label includes current speed`.
- `just clippy` — clean.
- Manual: download a multi-GB game from a peer and watch the action button
fill, speed update on the half-second, and reset cleanly on completion.
Refs: download progress visibility, snapshot-authoritative UI architecture
Centralize the bulk-transfer sizing in config.rs and bump the values used
on both ends of a QUIC connection:
- CHUNK_SIZE: 32 MiB -> 128 MiB
- QUIC_CONNECTION_DATA_WINDOW: 64 MiB -> 256 MiB
- QUIC_STREAM_DATA_WINDOW: 32 MiB -> 128 MiB
- QUIC_MAX_SEND_BUFFER_SIZE: 32 MiB -> 128 MiB
- QUIC_INITIAL_CONGESTION_WINDOW: 1 MiB -> 4 MiB
- FILE_TRANSFER_BUFFER_SIZE: 64 KiB -> 1 MiB (new constant)
The previous 32 MiB stream window was already comfortably above the
bandwidth-delay product of a sub-millisecond LAN at 2.5 GbE. The further
bump is deliberately generous: the goal is to push flow control and
per-syscall overhead far enough out of the way that they cannot be the
suspect when isolating the remaining LAN download bottleneck (disk, NIC,
or s2n-quic platform offload on the sending host). Memory pressure from
the larger windows is not observable on a desktop client moving GB-sized
games.
stream_file_bytes previously read the local file in 64 KiB chunks. At
multi-Gbit/s send rates that produced many thousands of disk reads per
second; bumping to 1 MiB keeps the per-file syscall load modest with no
measurable latency cost on streamed bulk transfers. The buffer size lives
in config.rs as FILE_TRANSFER_BUFFER_SIZE so it stays adjustable from one
place.
Also add a started/MiB-per-second log line at info level when a file
finishes streaming. This matches the S37 measurement methodology already
used in the peer-cli harness and makes per-file send throughput visible in
normal operation.
The peer-cli extended-scenarios harness uses CHUNK_SIZE as the tolerance
bound for chunk-boundary variance in its assertions, so its constant is
bumped to match. The multi-chunk planning unit test is rewritten to
reference CHUNK_SIZE symbolically (CHUNK_SIZE * 3 + CHUNK_SIZE / 2)
instead of a hardcoded 120 MiB; the previous literal would silently
degrade into a single-chunk test at the new chunk size and stop
exercising the spread-across-peers code path.
Test Plan:
- just fmt
- just clippy
- just test
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37 \
--build-image
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37
Refs: local LAN download performance investigation on 2026-05-20.
Depends-on: d7f7dc737e (QUIC UDP socket buffer sizing).
Configure the s2n-quic Tokio IO provider on both client and server instead of
using the address-only default provider. The configured provider asks the OS for
4 MiB send and receive buffers on each QUIC UDP socket, which avoids starting
bulk LAN transfers on the tiny default UDP buffer sizes.
I tested a wider version that also raised s2n-quic internal IO queues to 8 MiB,
but that regressed S37 to 710.19 and 736.20 MiB/s in repeat runs. This commit
keeps the narrower socket-buffer request, which measured faster than the prior
flow-control-only tuning while leaving the internal queue defaults intact.
The host used for measurement reports:
- net.core.rmem_max = 16777216
- net.core.wmem_max = 16777216
- net.core.rmem_default = 212992
- net.core.wmem_default = 212992
S37 single-source throughput:
- Step 1: 824.94 MiB/s, 6920.09 Mbit/s, 2.483s
- Step 2 sample A: 848.15 MiB/s, 7114.81 Mbit/s, 2.415s
- Step 2 sample B: 874.06 MiB/s, 7332.12 Mbit/s, 2.343s
Test Plan:
- just fmt
- sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default \
net.core.wmem_default
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37 --build-image
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37
Refs: local LAN download performance investigation on 2026-05-20.
Depends-on: cd8bcbfeedfa (QUIC flow-control and BBR tuning).
Raise the s2n-quic connection and stream data windows on both the client and
server, increase the max send buffer, and use BBR with a larger initial
congestion window. The download path was already able to pipeline multiple chunk
streams, but those streams still shared small default connection-level budgets
that limited sustained LAN throughput.
The tuning keeps one current wire protocol and does not add fallback behavior.
It is deliberately centralized in the peer networking module so later transport
changes can use the same limits on both sides of the connection.
S37 single-source throughput:
- Before: 733.22 MiB/s, 6150.72 Mbit/s, 2.793s
- After: 824.94 MiB/s, 6920.09 Mbit/s, 2.483s
- Delta: +91.72 MiB/s, +769.37 Mbit/s, about +12.5%
Test Plan:
- just fmt
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37 --build-image
Refs: local LAN download performance investigation on 2026-05-20.
Depends-on: 14e772c5c71a (peer-cli S37 throughput measurement).
Keep several chunk streams in flight per peer connection so a fast LAN download
is no longer forced through a request, wait, request loop. The transport still
uses the current GetGameFileChunk request on normal QUIC bidirectional streams,
so this improves throughput without adding another wire message or compatibility
path.
The peer planner now assigns chunks to the least-loaded eligible peer by planned
bytes. This keeps shared large files balanced across the latest valid sources,
while still respecting per-file source eligibility. Retries are batched by peer
and use the same pipelined transport instead of opening a new connection for one
failed chunk at a time.
Initial peer connection failures are converted into per-chunk failures so the
existing retry logic can move those chunks to another validated source. The dead
whole-file branch was removed from PeerDownloadPlan because nothing populated it
and retrying those entries as zero-length chunks would be a future data-loss
trap.
Test Plan:
- RUSTC_WRAPPER= just fmt
- RUSTC_WRAPPER= just test
- RUSTC_WRAPPER= just clippy
- RUSTC_WRAPPER= just peer-cli-build
- RUSTC_WRAPPER= just peer-cli-image
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py \
S13 S14 S16 S18 S19 S20 S24 S25 S26 S36
- git diff --cached --check
Refs: PEER_CLI_SCENARIOS.md
Review-Notes: addressed Claude review on whole-file retry cleanup
Install, update, uninstall, and downloaded-file removal used to clear the
active operation before publishing the settled local-library snapshot. That
allowed the UI bridge to emit a snapshot with no active operation but stale
local state, which could briefly make an installing game look not installed.
Refresh the ending game while its operation is still active, but exempt only
that game from the active-operation freeze. Other active games keep the
existing scan-preservation behavior. Lifecycle finished/failed events are now
emitted after the local snapshot and active-operation clear, so the status
snapshot remains the source of truth.
Test Plan:
- git diff --check
- just fmt
- just test
Refs: local install/download status snapshot cleanup
Address backend early-return paths that were silently exiting without emitting a
terminal event to the UI, and align the UI transition to "Downloading" with the
actual start of the chunk transfer.
- Added `DownloadGameFilesFailed` event emissions to `handlers.rs` in the
unhandled early-return branches (when resolved file descriptions are empty or
when no trusted peers are found without a local copy). This prevents the UI
from getting stuck in a checking state.
- Updated the frontend `'game-download-pre'` listener to keep the status in
`CheckingPeers` during peer majority size validation, and let the UI switch
to `Downloading` only upon `'game-download-begin'`.
- Added clarifying comments explaining the safety and semantic roles of both
listeners.
Test Plan:
- Run all unit tests to ensure no regressions: `just test`
- Compile and build the Tauri project: `just build`
Downloading a game could keep showing "Checking peers" while the backend was
already transferring files. The frontend owned a five-second fallback that could
invent a no-peers error during a valid long download, then return the action to
Download until install began.
Remove that frontend timer and make the peer lifecycle authoritative instead.
The UI now treats CheckingPeers as only an optimistic click response, ignores it
if a real operation is already in progress, and switches to Downloading when the
existing game-download-pre bridge reports that peer metadata was found.
A review found one backend path that previously had no terminal event: candidate
peers existed, but every peer detail request failed before GotGameFiles. That
path now emits DownloadGameFilesFailed so the UI can leave CheckingPeers without
falling back to a frontend guess.
Test Plan:
- just fmt
- just clippy
- just test
- just build
- git diff --check
Refs: local review P2
Downloaded but uninstalled games can still occupy significant disk space. Add a
separate removal path for that state instead of overloading uninstall, which is
reserved for deleting only `local/` installs.
The peer runtime now exposes `RemoveDownloadedGame` with matching lifecycle
and active-operation events. The filesystem delete is intentionally strict: the
id must be a catalog game and a single path component, the target must be a
direct child of the configured game directory, the root must not be a symlink,
it must have a regular root-level `version.ini`, and it must not contain
`local/`, `.local.installing/`, or `.local.backup/`. Only then do we recursively
remove the game root.
The Tauri bridge exposes this as `remove_downloaded_game`, the frontend shows a
matching danger action only for downloaded-but-uninstalled games, and a
confirmation dialog warns that re-downloading can take a long time.
Test Plan:
- git diff --check
- just fmt
- RUSTC_WRAPPER= CARGO_BUILD_RUSTC_WRAPPER= just test
- RUSTC_WRAPPER= CARGO_BUILD_RUSTC_WRAPPER= just clippy
- RUSTC_WRAPPER= CARGO_BUILD_RUSTC_WRAPPER= just build
Refs: user redesign nitpick about removing downloaded uninstalled games
Merge the S18-S36 scenario ideas into the official peer-cli scenario matrix
and add a Docker-backed runner that now exercises S1-S36 with concrete file
proofs. The runner creates temporary fixtures under .lanspread-peer-cli, drives
JSONL peer containers, checks transferred roots with diff and SHA-256 manifests,
and covers startup, discovery, transfer, failure, mutation, concurrency, mesh,
lifecycle, and catalog edge cases.
The scenarios exposed a few harness/runtime boundary gaps that would otherwise
make the contract ambiguous. The peer CLI now rejects self-connects, rejects
commands for game IDs outside the receiver catalog, filters unknown remote games
from its command/event surface, and reports duplicate active same-game commands
as operation-in-progress errors. The peer core also refuses non-catalog download
commands before transfer, and PeerGameDB has a unit check that address changes
preserve identity and library state.
S12 and S28 remain unit-level invariants because the CLI cannot stably race raw
serve-gate requests or rebind a live listener without restart. The runner treats
those scenarios as covered by just test and checks the expected unit test names
appear in the output.
Test Plan:
- just fmt
- python3 -m py_compile crates/lanspread-peer-cli/scripts/run_extended_scenarios.py
- RUSTC_WRAPPER= just test
- RUSTC_WRAPPER= just clippy
- RUSTC_WRAPPER= just peer-cli-build
- just peer-cli-image
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py
- git diff --check
Refs: PEER_CLI_SCENARIOS.md S1-S36
Replace the `a9f9845` local-update dedup cache with explicit peer event
semantics. Local scans now emit `LocalLibraryChanged` when the library changes,
while operation mutations emit `ActiveOperationsChanged` from the mutation
path. Tauri keeps joining those facts into the existing `games-list-updated`
payload, so the frontend contract stays stable.
This removes the cache/invalidation coupling between scan emission and
operation state. The remaining forced local snapshot is explicit: accepted game
directory changes can refresh the UI for an equivalent new path without sending
a peer library delta.
Operation guard cleanup and liveness cancellation now publish the same active
operation snapshot as normal command-handler transitions. The peer CLI JSONL
events follow the same split with `local-library-changed` and
`active-operations-changed`.
Test Plan:
- `just fmt`
- `CARGO_BUILD_RUSTC_WRAPPER= just test`
- `CARGO_BUILD_RUSTC_WRAPPER= just clippy`
- `git diff --check`
Refs: CLEAN_CODE_PLAN_1.md
Peer A failed to learn Peer B's games. The handshake only carried
library_rev/library_digest metadata, and the post-handshake sync path
compared those revisions against per-peer revision numbers that were
never advanced via this code path, so the games map for the remote
peer stayed empty and the UI never showed them.
The fix is to put the authoritative library data into the handshake
itself. Hello and HelloAck now carry a LibrarySnapshot directly, and
both perform_handshake_with_peer (outbound) and accept_inbound_hello
(inbound) apply that snapshot to the peer DB before emitting the UI
events. The initial peer-game-list event is now driven by the
handshake rather than by a follow-up LibrarySummary/LibrarySnapshot
roundtrip.
Bumps PROTOCOL_VERSION to 4 because the wire layout of Hello/HelloAck
changed. Per CLAUDE.md's protocol policy there is no compatibility
shim; older peers will fail the version check and be ignored.
Cleanups that fall out of the new design:
- The Hello / HelloAck library_rev and library_digest fields were
duplicated by the embedded LibrarySnapshot (which carries its own
library_rev, and whose digest is recomputed on apply). Collapsed
both messages to just `library: LibrarySnapshot` to remove the
foot-gun where the two could diverge.
- Request::LibrarySummary and Request::LibrarySnapshot are now dead
on the sender side and were removed along with their stream.rs
handlers and the LibrarySummary struct. LibraryDelta stays — it
is still sent from handlers.rs when the local library changes.
- record_remote_library previously called update_peer_library and
then apply_library_snapshot, which immediately overwrote the
rev/digest just written. Added update_peer_features and rewired
the call site so each peer-DB field is written exactly once.
update_peer_library is retained because discovery.rs still uses
it for the mDNS TXT-record path, where no snapshot is available.
- Removed the now-unused LibraryUpdate enum, select_library_update,
send_local_library_summary, send_local_library_update_if_needed,
LocalLibraryState::delta_since, build_library_summary,
send_library_summary, and send_library_snapshot.
Behavior change visible to users: when two peers come up on the LAN
they now see each other's full game lists immediately after the
handshake instead of waiting for a follow-up sync that, in the broken
case, never made the games visible at all.
Test Plan
- just clippy (clean for the touched crates)
- just test (workspace: all suites pass, including the two new
handshake tests: outbound_hello_carries_local_library_snapshot
and inbound_hello_applies_remote_library_snapshot, the latter
asserting PeerDiscovered + PeerCountUpdated + ListGames events
fire with the remote game visible)
- Manual: start `just peer-cli-alpha` and `just peer-cli-bravo` in
separate terminals; confirm each peer's game list shows the
other's library entries after discovery completes, without
requiring any additional command.
Refs
- FINDINGS.md: triage note that Claude's review surfaced only
in-scope cleanups (dead variants, duplicated header fields,
redundant DB writes, stale test fixture), all addressed here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up hardening for 348a02c, where `listen_addr` was added to Hello and
HelloAck as `Option<SocketAddr>`. Code review surfaced three concrete problems
that the previous commit left open:
1. Cold-start asymmetry. Discovery and the QUIC/mDNS advertiser are spawned
concurrently. If discovery saw a cached peer advertisement before our own
advertiser had written `ctx.local_peer_addr`, our outbound Hello carried
`listen_addr: None`. The receiver's `peer_record_addr` then returned `None`
and silently dropped the Hello while we still recorded their HelloAck, so
peer A learned about peer B but B never learned about A until a later
handshake happened to win the race.
2. Duplicate game-list pipeline. The previous commit added
`refresh_peer_games`, which post-handshake issued a `ListGames` to fetch
`peer.games`. The library-sync path (`LibrarySnapshot`) already populates
the same field. Both could race on first contact and overwrite each other.
Worse, `refresh_peer_games` was misnamed: a `peer_game_count > 0` guard
turned it into a fetch-once-then-no-op helper, while
`handle_library_summary` independently re-triggered a full handshake when
`previous_count == 0` was observed, producing a redundant ping-pong on
every first contact.
3. Argument explosion. `perform_handshake_with_peer`, `spawn_library_resync`,
and `after_peer_library_recorded` had grown to 6-8 individual parameters
and acquired `#[allow(clippy::too_many_arguments)]` opt-outs. Every caller
was destructuring the same fields out of `Ctx`/`PeerCtx`.
Changes (all in one commit because they jointly enforce the same invariant:
"a peer is only ever recorded by its listener address, and the local
listener address must exist before we participate in the protocol"):
- `Hello.listen_addr` and `HelloAck.listen_addr` are now `SocketAddr`, not
`Option<SocketAddr>`. Wire-incompatible, but PROTOCOL_VERSION already moved
to 3 in 348a02c so no additional version bump is needed.
- `required_listen_addr` reads `ctx.local_peer_addr` and returns an
`eyre::Result`; `build_hello_from_state` and `build_hello_ack` both call
it, so an outbound or inbound Hello can no longer be constructed before
the local QUIC listener is bound. The inbound path maps this into a
`Response::InternalPeerError` so the remote peer fails cleanly instead of
seeing a malformed HelloAck.
- `run_peer_discovery` blocks on `wait_for_local_peer_addr` (25 ms poll,
shutdown-aware) before subscribing to the mDNS browser. This closes the
cold-start race for outbound handshakes at the source.
- `refresh_peer_games`, `request_game_list_from_peer`, and the
`previous_count == 0` re-handshake trigger are removed. The post-handshake
flow now relies solely on `LibrarySummary`/`LibrarySnapshot`/`LibraryDelta`
for peer-library state; `ListGames` survives only for the
`request_game_details_*` paths that fetch per-game file descriptions on
demand.
- New `HandshakeCtx` (with `from_ctx` and `from_peer_ctx` constructors)
replaces the long argument lists. All `too_many_arguments` allow-attrs in
`handshake.rs` are gone, and call sites in `handlers.rs`, `discovery.rs`,
and `stream.rs` collapse to a single clone.
- `handle_library_delta` no longer acquires a read lock on the apply path:
the `peer_addr` lookup moved into the `else` resync branch where it is
actually needed.
- `accept_inbound_hello`'s `remote_addr` parameter is renamed to
`transport_addr`. It is now used only for warn-log formatting, and the
new name signals that this is the ephemeral QUIC source port, never the
authoritative listener address that gets recorded.
User-visible effect: on cold start, peers can no longer end up with an
asymmetric view of each other ("A sees B but B never sees A"). First-contact
library sync now does one handshake plus one snapshot/delta exchange instead
of the previous handshake + ListGames + redundant follow-up handshake. The
direct-connect CLI path (`handle_connect_peer_command`) now fails fast with
"local peer listener address is not ready" if invoked before the QUIC server
has bound; this is intentional - the previous behaviour would have sent a
Hello that the receiver had to silently discard.
Test Plan:
- just fmt
- just clippy
- just test (80 peer + 3 cli + 5 tauri tests pass)
- just build
- Manual: bring up `just peer-cli-alpha`/`bravo`/`charlie`, confirm symmetric
peer discovery and that games show up on every side after one library
digest cycle, with no duplicated ListGames traffic in trace logs.
Refs: Review feedback on commit 348a02c (listener-address handshake fix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Peers discovered over mDNS could still attribute later library sync traffic to
temporary QUIC source ports. In a real GUI LAN run this made Host B try to
push its library to Host A's outbound port instead of Host A's advertised
listener, so Host A discovered the peer but never saw its games.
Carry the stable listener address in Hello and HelloAck, and key library sync
messages by peer_id instead of inferring identity from the transport source
address. The handshake path now explicitly refreshes an empty peer library from
the known listener address, matching the reliability of the direct-connect CLI
path without overwriting richer snapshot state when it already arrived.
This changes the current wire protocol, so PROTOCOL_VERSION is bumped to 3 and
all peers must be rebuilt together. The architecture note now documents that
listener addresses come from mDNS or Hello/HelloAck, never from ephemeral QUIC
source ports.
Test Plan:
- just fmt
- just test
- just clippy
- just build
- git diff --check
Refs: Local Linux/Win11 GUI LAN test logs from 2026-05-18.
Add deeper peer CLI coverage for file-transfer integrity and multi-peer
chunking. The alpha fixture now carries a real renamed RAR archive larger
than 100 MB for alienswarm, which gives the chunk planner enough work to
split a single game archive across multiple peers.
Expose completed chunk source details as a peer event and have the CLI print
that event as JSONL. This keeps transfer behavior in lanspread-peer while the
CLI remains a harness that reports what the peer runtime did. The Tauri shell
logs the event at debug level so the shared PeerEvent enum stays exhaustive.
Document the new S13/S14 scenarios and record the manual run evidence,
including SHA-256 manifests and the per-peer byte split for the large archive.
Test Plan:
- just fmt
- just test
- just peer-cli-build
- just clippy
- just peer-cli-image
- unrar t -idq crates/lanspread-peer-cli/fixtures/fixture-alpha/alienswarm/alienswarm.eti
- Manual peer CLI: bravo -> deep-small-client bfbc2 download with matching SHA-256 manifests
- Manual peer CLI: alpha -> deep-stage-b alienswarm download with matching SHA-256 manifests
- Manual peer CLI: alpha + deep-stage-b -> deep-stage-c alienswarm download with chunk events from both peers and matching SHA-256 manifests
Refs: PEER_CLI_SCENARIOS.md S13 S14
After renewing the dev certificate, peers could complete handshakes but then
lost each other during liveness checks. Inbound QUIC streams report the client's
ephemeral source port, while the peer database is supposed to track the peer's
advertised listening address. Recording the ephemeral address created unstable
peer entries that could not be pinged later.
Resolve transport source addresses back to the unique known peer on the same IP,
and keep an existing advertised address when an inbound Hello arrives from that
peer. Goodbye events now report the stored peer address as well.
This keeps the core peer behavior in lanspread-peer; the CLI only observes the
resulting peer snapshots.
Test Plan:
- just fmt
- just test
- just clippy
- just peer-cli-build
- just peer-cli-image
- just peer-cli-alpha, just peer-cli-bravo, just peer-cli-charlie
- list-peers after the ping idle window shows advertised peer addresses with
populated game lists instead of ephemeral-port peers disappearing
Refs: PEER_CLI_SCENARIOS.md
The peer CLI could flood LocalGamesUpdated events when run from the Docker
harness. The local monitor rescans game roots, and some bind-mounted filesystems
report those read/close operations back as notify access events. Treating those
non-mutating events as real library changes queued another rescan, making the
headless CLI unusable for manual peer-to-peer testing.
Ignore access events before mapping paths to game IDs. Create, modify, remove,
and rename events still flow through the existing per-game rescan gate, while
fallback scans continue to reconcile missed writes.
Test Plan:
- just fmt
- just test
- just clippy
Refs: manual peer-cli P2P testing
The peer runtime previously accepted an `enable_mdns: bool` flag, plumbed
through `PeerStartOptions`, `spawn_peer_runtime`, `run_peer`, `Ctx`, and
`PeerCtx`. The lanspread-peer-cli harness exposed the toggle as
`--no-mdns` so test scenarios could fall back to explicit `connect`
commands when mDNS could not be relied on, in particular when multiple
peers ran inside `--network host` containers and could not advertise
independently.
That host-networking workaround no longer exists: the previous commit
moves harness containers onto a macvlan network, where each peer is a
real LAN device and mDNS just works between them. There is no scenario
left in the codebase where disabling mDNS is desirable. Per the project's
protocol policy in CLAUDE.md ("there is only one wire version, no
compatibility shims, no fallback paths"), an opt-out path with no current
caller is exactly the kind of dead code we should not carry.
Remove the flag and every plumbing point that exists only to support it:
- `PeerStartOptions::enable_mdns` and the custom `Default` impl that set
it to `true`; the struct now derives `Default` and just carries
`state_dir`.
- The `enable_mdns` parameter on `start_peer_with_options`,
`spawn_peer_runtime`, `run_peer`, and `Ctx::new`.
- The `enable_mdns` fields on `Ctx` and `PeerCtx` and the propagation
through `to_peer_ctx`.
- The `if ctx.enable_mdns` guard in `spawn_startup_services`;
`spawn_peer_discovery_service` is now always spawned.
- The `if ctx.enable_mdns { ... } else { ... }` branch in
`run_server_component`: the mDNS advertiser and event monitor are now
unconditionally started, and the no-mDNS-fallback log line that read
"mDNS disabled; direct peer address is ..." is gone. The
`direct_connect_addr` helper is kept because the mDNS-on branch still
uses it as a fallback when `local_peer_addr` has not yet been
populated.
- The internal test helpers in `handlers.rs`, `services/local_monitor.rs`,
and `services/stream.rs` that passed `true` as the trailing
`enable_mdns` arg to `Ctx::new`.
- In `lanspread-peer-cli`: the `--no-mdns` arg parsing, the
`Args::enable_mdns` field, the `mdns` key on the `cli-started` event
payload, and the `--no-mdns` mention in the help text and the crate
README.
The `Args::name` field is wired to the harness identity but is otherwise
untouched. The macvlan network created by `just peer-cli-net` is the
runtime prerequisite for this change to be observable across containers;
on a single workstation, two harness binaries on `127.0.0.1` discover
each other through mDNS on the loopback interface as before.
Test Plan:
- `just fmt`
- `just clippy`
- `just test`
- `just peer-cli-build`
- Two peers on macvlan: `just peer-cli-run alpha` and
`just peer-cli-run beta`; check that each emits `peer-discovered` and
`peer-connected` events without an explicit `connect` JSONL command.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The follow-up backlog had drifted into three settled peer/runtime issues: the
legacy game-list fallback contradicted the one-wire-version policy, the Tauri
shell still re-derived local install state from disk after peer snapshots, and
`Availability::Downloading` existed even though active operations are already
reported through a separate operation table.
Remove the legacy `AnnounceGames` request and fallback service. Discovery now
ignores peers that do not advertise the current protocol and a peer id, and
library changes are sent through the current delta path only. This keeps the
runtime aligned with the documented current-build-only interoperability model.
Make peer `LocalGamesUpdated` snapshots authoritative for local fields in the
Tauri database. The GUI-side catalog still owns static metadata such as names,
sizes, and descriptions, but downloaded, installed, local version, and
availability now come from the peer runtime instead of a second whole-library
filesystem scan. Snapshot reconciliation also pins the missing-begin and
missing-finish lifecycle cases in tests.
Collapse availability back to the settled `Ready` and `LocalOnly` states.
Aggregation now counts only `Ready` peers as download sources, and the frontend
no longer carries a dead `Downloading` enum value.
The core peer also exposes the small non-GUI hooks needed by scripted callers:
startup options for state and mDNS, a local-ready event, direct connection, peer
snapshots, and an explicit post-download install policy. Those hooks reuse the
same current protocol path and do not add compatibility shims.
Test Plan:
- `git diff --check`
- `just fmt`
- `just clippy`
- `just test`
Refs: BACKLOG.md, FINDINGS.md, IMPL_DECISIONS.md
FINDINGS.md identified three merge blockers in the post-plan install/update
flow.
Updates now use FetchLatestFromPeers so the Tauri update command bypasses
local manifest serving and asks peers that advertise the latest version for
fresh file metadata. PeerGameDB now aggregates and validates file descriptions
from latest-version peers, keeping stale cached metadata for older versions
from poisoning chunk planning when filenames stay the same but sizes change.
Download-to-install handoff now performs explicit async state transitions.
The download task mutates Downloading to Installing or Updating under the
active-operation write lock, clears the cancellation token, and then runs the
install transaction. OperationGuard remains armed only as crash or abort
cleanup and is disarmed after normal explicit cleanup, so final refreshes no
longer race a deferred Drop cleanup.
Local library index writers now serialize the load/mutate/save window with one
async mutex. The index fingerprint also includes the root version.ini contents
so a same-length version rewrite in the same mtime second still updates the
reported local version.
The tradeoff is that local index mutations are serialized in-process instead
of moved into a dedicated actor. That keeps the fix small and scoped to the
merge blockers while preserving the existing scanner API.
Test Plan:
- just fmt
- just test
- just clippy
- just build
- git diff --check
Refs:
- FINDINGS.md
The download pipeline had grown into one large file that mixed sentinel
transaction handling, peer planning, transport, retry, and top-level
orchestration. Split it into a download/ module tree with one file per
concern so future lifecycle changes can be reviewed at the right boundary.
The public crate surface remains download::download_game_files. Helper types
and functions are kept pub(super) or private so the refactor does not widen
the API or encourage new callers to depend on internals. The version.ini
transaction helpers stay local to version_ini.rs; the proposed fs_util
extraction is intentionally left for the later atomic-index work, where a
second caller exists.
There is no intended runtime behavior change.
Test Plan:
- just fmt
- just test
- just clippy
- just build
Refs: none
Game::availability used string labels that were carried through persisted
library data, protocol summaries, and the Tauri-facing game payload. That
allowed invalid states to exist and required legacy summary conversion code to
defensively map strings back into protocol availability values.
Move Availability to lanspread-db and re-export it from lanspread-proto so the
persisted Game type and wire GameSummary type share one serde enum. The JSON
spelling stays Ready, Downloading, or LocalOnly, so the serialized shape does
not change for current library indexes or peer payloads.
Add typed helpers for sentinel-derived download state. Game::set_downloaded
keeps downloaded and Ready/LocalOnly in lockstep and intentionally collapses
non-ready local state, including Downloading, back to LocalOnly. That matches
the current local-summary contract where active operations are suppressed
instead of advertised as Downloading. Game::normalized_availability keeps the
legacy Game-to-summary path from publishing an inconsistent Ready value when
downloaded is false.
Update the follow-up status note so typed availability is no longer listed as
open work.
Test Plan:
- just fmt
- just test
- just clippy
- just build
Refs: none
The local library index used tokio::fs::write directly on the canonical
library_index.json path. That truncates the existing index before writing the
new bytes, so a crash or power loss could leave a zero-length or partial cache.
Write the index through a sibling temp file, sync it, rename it over the
canonical path, and sync the parent directory on Unix. Loading the index also
sweeps a stale temp file before parsing the canonical file. That keeps the
existing cache valid after an interrupted write while still letting a normal
scan rebuild from disk if the canonical index is missing or corrupt.
This follows the existing temp-plus-rename pattern used for version.ini and
install intents. It intentionally does not add locking; local library writes
are already serialized by the peer operation flow.
Test Plan:
just fmt
just test
just clippy
Refs: none
Move the repeated test TempDir implementations into a single peer
test_support module. The shared helper keeps the existing automatic cleanup
behavior and uses an atomic suffix plus timestamp so parallel tests do not
collide on the same path.
This is intentionally limited to test hygiene. It does not change the
availability model, split download.rs, or touch production scan/install
behavior beyond importing the shared helper from test modules.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
Add focused serve-side tests for the gates around peer requests. GetGame now has
coverage for the non-catalog, active-operation, and missing-sentinel cases that
should return GameNotFound instead of exposing local files.
The full-file and chunk handlers both depend on the same transfer gate before
touching the QUIC send stream. Extract that gate into a small helper and test
the same cases there, plus the existing local-path exclusion, so both dispatch
paths stay aligned without adding fake QUIC stream plumbing.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
Add dispatch-level tests for the local game monitor paths called out in
FOLLOW_UP_2.md. The new coverage verifies watcher events are dropped while a
game has an active operation, burst events for one game collapse through the
pending set to at most one extra rescan, fallback scans pick up sideloaded
catalog games, and non-catalog roots stay invisible to the library state.
The non-catalog test exposed that an empty local library initialized with
digest zero, while the computed digest for an empty map is nonzero. That made
the first empty scan produce a meaningless empty LibraryDelta. Initialize the
empty state with the computed empty digest so a non-catalog-only scan leaves no
delta behind.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
Add a focused local-library test for the case where a game starts with only a
committed `local/` install and later gains a root `version.ini` sentinel. The
single-game rescan now has coverage showing it promotes the summary from
LocalOnly to Ready while preserving the installed flag and reading the local
version.
This pins the cached-index transition called out in FOLLOW_UP_2.md without
touching scanner dispatch or broader monitor behavior.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
Add coverage for the uninstall branch where `local` has already been moved to
`.local.backup`, but deleting that backup fails. The Unix-gated test makes a
child directory non-writable before uninstall starts, so recursive deletion of
the renamed backup fails without adding production hooks.
The test verifies rollback restores the previous local install, removes the
backup path, and clears the intent. It is gated to Unix because deletion
permission behavior is platform-specific; Windows coverage would need a
different failure mechanism rather than pretending this setup is portable.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
Add a focused transaction test for the branch where update extraction succeeds
but promoting `.local.installing` to `local` fails. The fake unpacker creates a
non-empty `local/` conflict after extraction, so the commit rename fails without
adding production hooks or brittle platform-specific permission tricks.
The assertion verifies the old install is restored from `.local.backup`, the
conflict and staging directories are removed, the backup is consumed, and the
intent is cleared back to None.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
FOLLOW_UP_2.md called out that recovery only covered one intent-driven update
row. Replace that single-case assertion with a table over the ten recovery
rows documented in PLAN.md, spanning Installing, Updating, and Uninstalling
intents across local, staging, and backup directory states.
The cases intentionally use markerless reserved directories while an intent is
present. That pins the contract that the intent log proves Lanspread ownership
during crash recovery, including the crash windows before ownership markers are
dropped. The test still keeps the existing None-intent markerless case separate
so user-owned reserved names remain protected.
Running the larger table in parallel exposed that this module's TempDir helper
could collide on pid plus timestamp paths. Add a local atomic suffix so these
tests stop deleting each other's directories without doing the broader helper
consolidation reserved for the later hygiene phase.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
Local operation spinners were driven by begin, finish, and failure event
history. If one of those lifecycle events was missed, the Tauri bridge could
keep a stale active operation and the React state would keep showing an
in-progress spinner until restart.
Peer local scan updates now carry an authoritative active-operation snapshot.
The peer still suppresses active game roots from peer-facing library deltas,
but it emits LocalGamesUpdated to the UI even when no library delta changed so
the snapshot can clear stale state after rollback or completion. The Tauri
bridge replaces its active-operation map from that snapshot, emits it with the
games-list payload, and the React merge uses it to restore download, install,
update, and uninstall spinners from current peer state rather than event
history alone.
This also enables the Tauri lib unit-test target so the reconciliation helper
can stay covered by the workspace test recipe.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_2.md
The follow-up review found a few stale lifecycle edges around local game
transactions. Recovery could sweep active roots, post-operation refreshes
still re-ran full startup recovery, and the UI kept inferring local-only state
from downloaded and installed flags instead of the backend availability.
This updates the peer lifecycle so startup recovery skips active operations,
install/update/uninstall refresh only the affected game after the operation
guard is dropped, and path-changing game-directory updates are rejected while
operations are active. It also removes the dead UpdateGame command, drops the
unused manifest_hash write field while preserving old JSON reads, renames the
internal install-finished event, and carries availability through the DB,
peer summaries, Tauri refreshes, and the React model.
The included follow-up documents record the review source, implementation
decisions, and the remaining FOLLOW_UP_2.md work so later commits can stay
small instead of reopening the completed plan items.
Test Plan:
- git diff --check
- just fmt
- just clippy
- just test
Follow-up-Plan: FOLLOW_UP_PLAN.md
Update the peer README and architecture notes to match the landed runtime:
version.ini is the download sentinel, local/ is the install predicate, install
state is recovered through .lanspread.json intents, and watcher rescans are
operation-gated rather than time-debounced.
Add IMPL_DECISIONS.md with the implementation-time choices that were not
already prescribed by PLAN.md, including the just test recipe, the UI event
compatibility bridge, reuse of the existing library index for per-ID rescans,
and the split between active operation state and download cancellation tokens.
Test Plan:
- just fmt
- just clippy
- just test
- just build
Refs: PLAN.md
Implement the peer-owned state model from PLAN.md. A root-level version.ini
is now the download completion sentinel, local/ as a directory is the install
predicate, and exact root-level version.ini detection prevents nested files
from becoming sentinels by accident.
Add the peer operation table that gates downloads, installs, updates, and
uninstalls by game ID. Serving paths now reject non-catalog games, active
operations, missing sentinels, and any request that points under local/.
Remote aggregation treats LocalOnly peers as non-downloadable so they do not
contribute peer counts, candidate source selection, or latest-version checks.
Move install-side filesystem mutation into lanspread-peer::install. The new
module writes atomic .lanspread.json intents, uses .local.installing and
.local.backup with .lanspread_owned markers, and performs startup recovery
from recorded intent plus filesystem state. Downloads now buffer version.ini
chunks in memory and commit the sentinel last through .version.ini.tmp.
Replace the fixed 15-second monitor with notify-backed non-recursive watches,
per-ID rescan gating, and a 300-second fallback scan. The optimized rescan
path updates one cached library-index entry and active operation IDs preserve
their previous summary during scans.
Test Plan:
- just fmt
- just clippy
- just test
- just build
Refs: PLAN.md
Replace the detached tokio::spawn pattern in the peer runtime with a
supervised model built on tokio_util's CancellationToken and TaskTracker.
Long-lived services and child tasks now have an explicit parent, a
cancellation path, and a join point. Tauri can request a clean shutdown
on app exit instead of leaking work into process termination.
Background
~~~~~~~~~~
start_peer() previously returned only a command sender. The four startup
services (QUIC server, mDNS discovery, peer liveness, local library
monitor) and their child tasks (ping workers, handshake jobs, download
workers, announcement fan-outs, connection/stream handlers) were spawned
with raw tokio::spawn and detached. Closing the command channel sent
Goodbye notifications but did not stop those services. The mDNS blocking
worker had no cancellation path at all. Active downloads were stored as
JoinHandle<()> and force-aborted, which could interrupt file writes
mid-chunk.
Supervisor
~~~~~~~~~~
The runtime now owns a CancellationToken and a TaskTracker, threaded
through Ctx and PeerCtx. Each long-lived service is spawned through a
small supervisor (spawn_supervised_service) that wraps the service in
catch_unwind and enforces an explicit SupervisionPolicy:
QuicServer: Required (fatal; cancels the runtime if it dies)
Discovery: Restart(5s) (matches the prior self-restart loop)
Liveness: Restart(5s)
LocalMonitor: BestEffort (logs and exits, no restart)
A Required failure emits a new RuntimeFailed { component, error } event
to the UI and cancels the runtime; the command loop and goodbye
notifications still run to completion. The Tauri layer forwards the
event as "peer-runtime-failed" so a future UI can surface it.
mDNS cancellation
~~~~~~~~~~~~~~~~~
MdnsBrowser previously blocked on receiver.recv() forever. It now
exposes next_service_timeout(Duration) returning an MdnsServicePoll
enum (Service/Timeout/Closed) via recv_timeout(). The discovery worker
polls at 250ms and checks the shutdown flag between ticks, so
cancellation reaches the blocking thread within one poll interval
instead of waiting for the next mDNS event.
Downloads
~~~~~~~~~
active_downloads is now HashMap<String, CancellationToken>. Each
download gets a child token of the runtime shutdown, checked at chunk
and peer-attempt boundaries (never inside file writes). When all peers
with a game disappear, liveness cancels the token and emits
DownloadGameFilesAllPeersGone; the download exits Ok(()) without
emitting a duplicate Failed event.
DownloadStateGuard (context.rs) is held inside the download task and
clears downloading_games + active_downloads on Drop, covering the happy
path, error returns, cancellation, and task abort. Drop falls back to
spawning the cleanup if write-lock contention prevents try_write.
Public API and Tauri integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
start_peer() now returns PeerRuntimeHandle exposing:
fn sender(&self) -> UnboundedSender<PeerCommand>
fn shutdown(&self)
async fn wait_stopped(&mut self)
The Tauri layer stores the handle in managed state and switches its
main loop from .run(ctx) to .build(ctx).run(|h, e| ...). On
RunEvent::Exit it calls handle.shutdown() and blocks up to 2s on
wait_stopped(), giving services time to cancel and Goodbye packets time
to flush over a healthy LAN while staying short enough not to delay
process exit noticeably on a dead network.
The command loop distinguishes graceful shutdown from unexpected
channel closure: if recv() returns None and shutdown.is_cancelled() is
set, the loop returns Ok(()) silently. Only an unexpected close (no
cancellation observed) still emits RuntimeFailed. This avoids a
spurious failure event on every normal app close.
User-visible behavior changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Closing the app no longer leaks services into process termination;
Goodbye notifications are reliably attempted before exit.
- Downloads cancel cleanly (between chunks) instead of force-aborting
mid-write.
- A new "peer-runtime-failed" Tauri event fires when a Required service
cannot recover. No frontend handler exists yet — that is a follow-up.
Tradeoffs
~~~~~~~~~
- Workspace tokio-util now requires the "rt" feature for TaskTracker.
- The mDNS worker still runs in spawn_blocking and may stay parked
briefly between 250ms polls — acceptable for a desktop app.
- The 2s shutdown timeout on app exit is a deliberate compromise.
Tests
~~~~~
New unit tests:
- DownloadStateGuard clears tracking on completion, cancellation, and
parent-task abort (context.rs).
- Required failure cancels the runtime and emits RuntimeFailed
(startup.rs).
- Restart policy restarts until shutdown is requested (startup.rs).
- PeerRuntimeHandle.shutdown() observable via wait_stopped()
(startup.rs).
- Peers-gone cancellation emits only PeersGone, no duplicate Failed
(services/liveness.rs).
Test plan
~~~~~~~~~
cargo test --workspace
cargo clippy --workspace --all-targets
Manual smoke test on two peers on the same LAN:
1. Start a download, verify chunks transfer.
2. Close the receiving app mid-download — verify the sending peer
logs a Goodbye, not a connection-reset error.
3. Stop the sending peer mid-download — verify the receiver emits
DownloadGameFilesAllPeersGone, not Failed.
Follow-ups
~~~~~~~~~~
- Frontend handler for "peer-runtime-failed".
- Consider exposing the runtime handle's stopped watch to the frontend
for a reconnecting indicator on Required failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Peer startup used to bootstrap itself by spawning the runtime and immediately
sending a SetGameDir command back through its own control channel. The Tauri
integration then polled shared state until a directory appeared and waited two
seconds before asking peers for games. That made startup ordering implicit and
left a race-prone sleep in the UI bridge.
Install the initial game directory directly into the peer context instead. The
runtime now attempts the initial local-library scan before starting discovery,
then launches the server, discovery, liveness, and local monitor services from
that initialized context. Later directory changes still use SetGameDir, so the
existing UI command surface stays intact.
Use PathBuf and Path references across peer filesystem boundaries so directory
state is represented as a path rather than an optional string. The Tauri layer
now validates a selected game directory before storing it, loads the bundled
catalog on first use, and starts or updates the peer runtime from one helper.
Peer event fan-out is split into named handlers so the Tauri setup closure only
wires state and starts the event loop.
Shutdown goodbye notifications are still best-effort, but they are now awaited
with a short timeout instead of being spawned and forgotten. The tradeoff is a
small bounded wait during peer runtime shutdown in exchange for clearer task
ownership.
Test Plan:
- cargo test -p lanspread-peer
- cargo clippy
- cargo clippy --benches
- cargo clippy --tests
- cargo +nightly fmt
- git diff --check
Refs: none
The peer runtime used to spawn each long-running service inline inside
run_peer. That made the startup path harder to scan because service names,
clone setup, and task error handling were interleaved with the command loop.
Move the task wrappers into a startup module and leave run_peer as the
lifecycle overview: create shared context, start services, handle commands,
then send shutdown goodbyes. The spawned services and their error handling are
unchanged; only the ownership plumbing moved into named helpers.
Test Plan:
- cargo clippy
- cargo clippy --benches
- cargo clippy --tests
- cargo +nightly fmt
Refs: none