fix(peer): settle current-protocol local state cleanup

The follow-up backlog had drifted into three settled peer/runtime issues: the
legacy game-list fallback contradicted the one-wire-version policy, the Tauri
shell still re-derived local install state from disk after peer snapshots, and
`Availability::Downloading` existed even though active operations are already
reported through a separate operation table.

Remove the legacy `AnnounceGames` request and fallback service. Discovery now
ignores peers that do not advertise the current protocol and a peer id, and
library changes are sent through the current delta path only. This keeps the
runtime aligned with the documented current-build-only interoperability model.

Make peer `LocalGamesUpdated` snapshots authoritative for local fields in the
Tauri database. The GUI-side catalog still owns static metadata such as names,
sizes, and descriptions, but downloaded, installed, local version, and
availability now come from the peer runtime instead of a second whole-library
filesystem scan. Snapshot reconciliation also pins the missing-begin and
missing-finish lifecycle cases in tests.

Collapse availability back to the settled `Ready` and `LocalOnly` states.
Aggregation now counts only `Ready` peers as download sources, and the frontend
no longer carries a dead `Downloading` enum value.

The core peer also exposes the small non-GUI hooks needed by scripted callers:
startup options for state and mDNS, a local-ready event, direct connection, peer
snapshots, and an explicit post-download install policy. Those hooks reuse the
same current protocol path and do not add compatibility shims.

Test Plan:
- `git diff --check`
- `just fmt`
- `just clippy`
- `just test`

Refs: BACKLOG.md, FINDINGS.md, IMPL_DECISIONS.md
This commit is contained in:
2026-05-16 18:32:24 +02:00
parent 6242d64583
commit e711cf3454
23 changed files with 531 additions and 723 deletions
+15 -157
View File
@@ -1,161 +1,19 @@
# Findings — Bugs to Fix Before Merging
# Findings
Three bugs found in the post-PLAN.md implementation. Fix these, then merge.
Everything else lives in `BACKLOG.md` and does not block.
No open pre-merge findings are currently tracked here.
---
The previous three findings have landed in code and tests:
## 1. `update_game` never fetches a fresh manifest from peers
- `update_game` now uses `PeerCommand::FetchLatestFromPeers` to skip local
manifest serving and fetch fresh peer metadata. Covered by
`update_fetch_emits_fresh_manifest_from_latest_peer` and
`update_request_skips_local_manifest_even_when_download_exists`.
- Download-to-install handoff no longer relies on `OperationGuard::Drop` for
ordered state transitions. Covered by
`download_handoff_waits_for_readers_and_auto_installs` and the liveness
cancellation tests.
- Library index reads and writes are serialized by `LIBRARY_INDEX_LOCK`.
Covered by `concurrent_rescans_preserve_both_index_updates`.
PLAN.md:357 calls for `update_game` to send `GetGame`, *fetch fresh remote
archives*, and trigger an auto-install as a transactional update. The
implementation makes that path unreachable.
Trace:
1. Tauri `update_game` sends `PeerCommand::GetGame`
(`crates/lanspread-tauri-deno-ts/src-tauri/src/lib.rs:162`).
2. Peer's `handle_get_game_command` calls `try_serve_local_game` first
(`crates/lanspread-peer/src/handlers.rs:88`).
3. `try_serve_local_game` consults `local_download_available`, which returns
true whenever `version.ini` is present locally and the ID is in the catalog
(`crates/lanspread-peer/src/local_games.rs:48-66`). For any game the user
has already downloaded, this is *always* true.
4. The **local** file descriptions are returned via `GotGameFiles`. Tauri
routes those into `DownloadGameFiles`.
5. `handle_download_game_files_command:204-227` consults `peer_game_db` via
`validate_file_sizes_majority`, so cached remote metadata *is* read. But
the descriptions actually used for chunk planning are the local ones
(`handlers.rs:189-195`). When peers advertise a newer version with
different file sizes, the whitelist is empty and the path falls into
"instant install of local archives." When sizes happen to match, we
plan chunks against local descriptions and request those offsets from
peers — which works only when peer-side files are identical to local.
Either way, peers' current manifests are never read.
Net effect: "update" = re-extract whatever archives are on disk into
`local/`. The flow PLAN.md described — fetch the newer archive from peers,
then auto-install — does not exist.
**Fix candidates:**
- New `PeerCommand::FetchLatestFromPeers { id }` that skips the local-serve
gate and asks one peer for its current manifest.
- `PeerCommand::GetGame { id, force_peer: true }` flag honored by
`try_serve_local_game`.
- `try_serve_local_game` short-circuits only when local `eti_version` is ≥
`peer_db.get_latest_version_for_game(id)`. The aggregation function
already exists in `peer_db.rs:320`; nothing calls it for this purpose.
**Tests to add:** `update_game` actually pulls the newer manifest from a
peer when one exists. Today this can't be tested because the code path
doesn't exist.
---
## 2. `OperationGuard::Drop` is doing ordered state transitions
`crates/lanspread-peer/src/handlers.rs:254-279`:
```rust
ctx.task_tracker.spawn(async move {
let result = {
let _download_state_guard = OperationGuard::download(...);
download_game_files(...).await
}; // guard drops here
match result {
Ok(()) => run_install_operation(&ctx_clone, ..., download_id).await,
...
}
});
```
`OperationGuard::Drop` (`context.rs:156-191`) tries `try_write` first, then
falls back to `tokio::spawn(async { ... .write().await.remove(...) })` if the
lock is contended. The contention happens because `active_operations` is read
on every watcher tick, every `list_games`, every `can_serve_game`, every
liveness sweep, every `update_and_announce_games` snapshot.
This is the wrong shape for the state transition. Drop is fire-and-forget;
the synchronous code after the guard scope keeps running before the deferred
removal lands. Two distinct symptoms of the same root cause:
1. **Install rejected:** `run_install_operation` calls `begin_operation`
(`handlers.rs:336-339`) which does `Entry::Vacant` on the same map. If
`begin_operation` wins the lock before the spawned remove task does, it
sees the leftover `Downloading` entry and rejects the install. User sees
`version.ini` on disk, no `local/`, no `InstallGameBegin`, no
explanation.
2. **Stale snapshot:** Post-finish refresh calls
`active_operation_snapshot` (`handlers.rs:558`) before the deferred
removal runs. UI receives one final snapshot saying the operation is
active even though `InstallGameFinished` was already sent.
**Fix:** Explicit `async end_operation(...)` call before finish/refresh,
under a single write lock. The same write that removes `Downloading`
should insert `Installing`/`Updating` for the auto-install path, making
the handoff atomic. Demote `OperationGuard` to crash-safety: only fires
when the task panics or is aborted, and logs loudly when it does.
**Tests to add:**
- Hold a read lock on `active_operations` while `download_game_files`
returns; assert the auto-install still proceeds.
- Liveness path cancellation while multiple downloads are in flight;
assert no duplicate failure events and no stuck operation-table
entries.
---
## 3. Uncoordinated library-index writes
`scan_local_library` (`local_games.rs:533-615`) and `rescan_local_game`
(`local_games.rs:617-639`) both load `library_index.json`, mutate the
deserialized state, and save. Nothing serializes the two paths.
Call sites:
- `run_fallback_scan` (`local_monitor.rs:289`) → `scan_local_library`.
- `run_gated_rescan` (`local_monitor.rs:261`) → `rescan_local_game`,
spawned on the task tracker (line 253-258).
- `load_local_library` (`handlers.rs:491`) → `scan_local_library`.
- `refresh_local_game` (`handlers.rs:520`) → `rescan_local_game`.
A fallback-scan tick can land between a gated-rescan's load and save (or
vice versa). Last writer wins; intermediate updates are silently dropped.
The piece of state that drifts in a user-visible way is `revision`: both
writers compute `old.saturating_add(1)` and save `old+1`, while the
in-memory `LocalLibraryState.revision` bumps independently in
`update_from_scan`. After a restart, disk-revision can be lower than
peers expect, breaking `LibraryDelta.from_rev` matching — peers will
fall back to snapshots and the delta optimization is undone.
**Fix candidates:** `tokio::Mutex` around index I/O, or move the index
ownership into the same actor that owns `LocalLibraryState` so all
mutations go through one channel.
---
## What's *not* in this file
Everything else found during review is in `BACKLOG.md`. Notable items
include: Tauri-side parallel scanning, legacy peer protocol fallback,
unreachable `Availability::Downloading` variant, stale FOLLOW_UP_2.md.
None of those block merging.
---
## Definition of done for this branch
- Fixes for #1, #2, #3 land.
- Tests listed under each fix land.
- `just test`, `just clippy`, `just build` clean.
- Manual: install a game, then update it while a peer advertises a newer
version, then uninstall it. Verify the version actually changes after
update (covers #1) and that the UI doesn't get stuck on a spinner
after operations complete (covers #2).
Once those are green, this branch is done. Re-reviewing will surface
more smells; don't run another review unless something behaves wrong
when tested manually.
Manual install/update/uninstall smoke testing is still a useful release check,
but there are no known blocking findings left in this file.