Files
lanspread/FINDINGS.md
T
2026-05-16 13:15:34 +02:00

162 lines
6.8 KiB
Markdown

# Findings — Bugs to Fix Before Merging
Three bugs found in the post-PLAN.md implementation. Fix these, then merge.
Everything else lives in `BACKLOG.md` and does not block.
---
## 1. `update_game` never fetches a fresh manifest from peers
PLAN.md:357 calls for `update_game` to send `GetGame`, *fetch fresh remote
archives*, and trigger an auto-install as a transactional update. The
implementation makes that path unreachable.
Trace:
1. Tauri `update_game` sends `PeerCommand::GetGame`
(`crates/lanspread-tauri-deno-ts/src-tauri/src/lib.rs:162`).
2. Peer's `handle_get_game_command` calls `try_serve_local_game` first
(`crates/lanspread-peer/src/handlers.rs:88`).
3. `try_serve_local_game` consults `local_download_available`, which returns
true whenever `version.ini` is present locally and the ID is in the catalog
(`crates/lanspread-peer/src/local_games.rs:48-66`). For any game the user
has already downloaded, this is *always* true.
4. The **local** file descriptions are returned via `GotGameFiles`. Tauri
routes those into `DownloadGameFiles`.
5. `handle_download_game_files_command:204-227` consults `peer_game_db` via
`validate_file_sizes_majority`, so cached remote metadata *is* read. But
the descriptions actually used for chunk planning are the local ones
(`handlers.rs:189-195`). When peers advertise a newer version with
different file sizes, the whitelist is empty and the path falls into
"instant install of local archives." When sizes happen to match, we
plan chunks against local descriptions and request those offsets from
peers — which works only when peer-side files are identical to local.
Either way, peers' current manifests are never read.
Net effect: "update" = re-extract whatever archives are on disk into
`local/`. The flow PLAN.md described — fetch the newer archive from peers,
then auto-install — does not exist.
**Fix candidates:**
- New `PeerCommand::FetchLatestFromPeers { id }` that skips the local-serve
gate and asks one peer for its current manifest.
- `PeerCommand::GetGame { id, force_peer: true }` flag honored by
`try_serve_local_game`.
- `try_serve_local_game` short-circuits only when local `eti_version` is ≥
`peer_db.get_latest_version_for_game(id)`. The aggregation function
already exists in `peer_db.rs:320`; nothing calls it for this purpose.
**Tests to add:** `update_game` actually pulls the newer manifest from a
peer when one exists. Today this can't be tested because the code path
doesn't exist.
---
## 2. `OperationGuard::Drop` is doing ordered state transitions
`crates/lanspread-peer/src/handlers.rs:254-279`:
```rust
ctx.task_tracker.spawn(async move {
let result = {
let _download_state_guard = OperationGuard::download(...);
download_game_files(...).await
}; // guard drops here
match result {
Ok(()) => run_install_operation(&ctx_clone, ..., download_id).await,
...
}
});
```
`OperationGuard::Drop` (`context.rs:156-191`) tries `try_write` first, then
falls back to `tokio::spawn(async { ... .write().await.remove(...) })` if the
lock is contended. The contention happens because `active_operations` is read
on every watcher tick, every `list_games`, every `can_serve_game`, every
liveness sweep, every `update_and_announce_games` snapshot.
This is the wrong shape for the state transition. Drop is fire-and-forget;
the synchronous code after the guard scope keeps running before the deferred
removal lands. Two distinct symptoms of the same root cause:
1. **Install rejected:** `run_install_operation` calls `begin_operation`
(`handlers.rs:336-339`) which does `Entry::Vacant` on the same map. If
`begin_operation` wins the lock before the spawned remove task does, it
sees the leftover `Downloading` entry and rejects the install. User sees
`version.ini` on disk, no `local/`, no `InstallGameBegin`, no
explanation.
2. **Stale snapshot:** Post-finish refresh calls
`active_operation_snapshot` (`handlers.rs:558`) before the deferred
removal runs. UI receives one final snapshot saying the operation is
active even though `InstallGameFinished` was already sent.
**Fix:** Explicit `async end_operation(...)` call before finish/refresh,
under a single write lock. The same write that removes `Downloading`
should insert `Installing`/`Updating` for the auto-install path, making
the handoff atomic. Demote `OperationGuard` to crash-safety: only fires
when the task panics or is aborted, and logs loudly when it does.
**Tests to add:**
- Hold a read lock on `active_operations` while `download_game_files`
returns; assert the auto-install still proceeds.
- Liveness path cancellation while multiple downloads are in flight;
assert no duplicate failure events and no stuck operation-table
entries.
---
## 3. Uncoordinated library-index writes
`scan_local_library` (`local_games.rs:533-615`) and `rescan_local_game`
(`local_games.rs:617-639`) both load `library_index.json`, mutate the
deserialized state, and save. Nothing serializes the two paths.
Call sites:
- `run_fallback_scan` (`local_monitor.rs:289`) → `scan_local_library`.
- `run_gated_rescan` (`local_monitor.rs:261`) → `rescan_local_game`,
spawned on the task tracker (line 253-258).
- `load_local_library` (`handlers.rs:491`) → `scan_local_library`.
- `refresh_local_game` (`handlers.rs:520`) → `rescan_local_game`.
A fallback-scan tick can land between a gated-rescan's load and save (or
vice versa). Last writer wins; intermediate updates are silently dropped.
The piece of state that drifts in a user-visible way is `revision`: both
writers compute `old.saturating_add(1)` and save `old+1`, while the
in-memory `LocalLibraryState.revision` bumps independently in
`update_from_scan`. After a restart, disk-revision can be lower than
peers expect, breaking `LibraryDelta.from_rev` matching — peers will
fall back to snapshots and the delta optimization is undone.
**Fix candidates:** `tokio::Mutex` around index I/O, or move the index
ownership into the same actor that owns `LocalLibraryState` so all
mutations go through one channel.
---
## What's *not* in this file
Everything else found during review is in `BACKLOG.md`. Notable items
include: Tauri-side parallel scanning, legacy peer protocol fallback,
unreachable `Availability::Downloading` variant, stale FOLLOW_UP_2.md.
None of those block merging.
---
## Definition of done for this branch
- Fixes for #1, #2, #3 land.
- Tests listed under each fix land.
- `just test`, `just clippy`, `just build` clean.
- Manual: install a game, then update it while a peer advertises a newer
version, then uninstall it. Verify the version actually changes after
update (covers #1) and that the UI doesn't get stuck on a spinner
after operations complete (covers #2).
Once those are green, this branch is done. Re-reviewing will surface
more smells; don't run another review unless something behaves wrong
when tested manually.