6.8 KiB
Findings — Bugs to Fix Before Merging
Three bugs found in the post-PLAN.md implementation. Fix these, then merge.
Everything else lives in BACKLOG.md and does not block.
1. update_game never fetches a fresh manifest from peers
PLAN.md:357 calls for update_game to send GetGame, fetch fresh remote
archives, and trigger an auto-install as a transactional update. The
implementation makes that path unreachable.
Trace:
- Tauri
update_gamesendsPeerCommand::GetGame(crates/lanspread-tauri-deno-ts/src-tauri/src/lib.rs:162). - Peer's
handle_get_game_commandcallstry_serve_local_gamefirst (crates/lanspread-peer/src/handlers.rs:88). try_serve_local_gameconsultslocal_download_available, which returns true wheneverversion.iniis present locally and the ID is in the catalog (crates/lanspread-peer/src/local_games.rs:48-66). For any game the user has already downloaded, this is always true.- The local file descriptions are returned via
GotGameFiles. Tauri routes those intoDownloadGameFiles. handle_download_game_files_command:204-227consultspeer_game_dbviavalidate_file_sizes_majority, so cached remote metadata is read. But the descriptions actually used for chunk planning are the local ones (handlers.rs:189-195). When peers advertise a newer version with different file sizes, the whitelist is empty and the path falls into "instant install of local archives." When sizes happen to match, we plan chunks against local descriptions and request those offsets from peers — which works only when peer-side files are identical to local. Either way, peers' current manifests are never read.
Net effect: "update" = re-extract whatever archives are on disk into
local/. The flow PLAN.md described — fetch the newer archive from peers,
then auto-install — does not exist.
Fix candidates:
- New
PeerCommand::FetchLatestFromPeers { id }that skips the local-serve gate and asks one peer for its current manifest. PeerCommand::GetGame { id, force_peer: true }flag honored bytry_serve_local_game.try_serve_local_gameshort-circuits only when localeti_versionis ≥peer_db.get_latest_version_for_game(id). The aggregation function already exists inpeer_db.rs:320; nothing calls it for this purpose.
Tests to add: update_game actually pulls the newer manifest from a
peer when one exists. Today this can't be tested because the code path
doesn't exist.
2. OperationGuard::Drop is doing ordered state transitions
crates/lanspread-peer/src/handlers.rs:254-279:
ctx.task_tracker.spawn(async move {
let result = {
let _download_state_guard = OperationGuard::download(...);
download_game_files(...).await
}; // guard drops here
match result {
Ok(()) => run_install_operation(&ctx_clone, ..., download_id).await,
...
}
});
OperationGuard::Drop (context.rs:156-191) tries try_write first, then
falls back to tokio::spawn(async { ... .write().await.remove(...) }) if the
lock is contended. The contention happens because active_operations is read
on every watcher tick, every list_games, every can_serve_game, every
liveness sweep, every update_and_announce_games snapshot.
This is the wrong shape for the state transition. Drop is fire-and-forget; the synchronous code after the guard scope keeps running before the deferred removal lands. Two distinct symptoms of the same root cause:
- Install rejected:
run_install_operationcallsbegin_operation(handlers.rs:336-339) which doesEntry::Vacanton the same map. Ifbegin_operationwins the lock before the spawned remove task does, it sees the leftoverDownloadingentry and rejects the install. User seesversion.inion disk, nolocal/, noInstallGameBegin, no explanation. - Stale snapshot: Post-finish refresh calls
active_operation_snapshot(handlers.rs:558) before the deferred removal runs. UI receives one final snapshot saying the operation is active even thoughInstallGameFinishedwas already sent.
Fix: Explicit async end_operation(...) call before finish/refresh,
under a single write lock. The same write that removes Downloading
should insert Installing/Updating for the auto-install path, making
the handoff atomic. Demote OperationGuard to crash-safety: only fires
when the task panics or is aborted, and logs loudly when it does.
Tests to add:
- Hold a read lock on
active_operationswhiledownload_game_filesreturns; assert the auto-install still proceeds. - Liveness path cancellation while multiple downloads are in flight; assert no duplicate failure events and no stuck operation-table entries.
3. Uncoordinated library-index writes
scan_local_library (local_games.rs:533-615) and rescan_local_game
(local_games.rs:617-639) both load library_index.json, mutate the
deserialized state, and save. Nothing serializes the two paths.
Call sites:
run_fallback_scan(local_monitor.rs:289) →scan_local_library.run_gated_rescan(local_monitor.rs:261) →rescan_local_game, spawned on the task tracker (line 253-258).load_local_library(handlers.rs:491) →scan_local_library.refresh_local_game(handlers.rs:520) →rescan_local_game.
A fallback-scan tick can land between a gated-rescan's load and save (or vice versa). Last writer wins; intermediate updates are silently dropped.
The piece of state that drifts in a user-visible way is revision: both
writers compute old.saturating_add(1) and save old+1, while the
in-memory LocalLibraryState.revision bumps independently in
update_from_scan. After a restart, disk-revision can be lower than
peers expect, breaking LibraryDelta.from_rev matching — peers will
fall back to snapshots and the delta optimization is undone.
Fix candidates: tokio::Mutex around index I/O, or move the index
ownership into the same actor that owns LocalLibraryState so all
mutations go through one channel.
What's not in this file
Everything else found during review is in BACKLOG.md. Notable items
include: Tauri-side parallel scanning, legacy peer protocol fallback,
unreachable Availability::Downloading variant, stale FOLLOW_UP_2.md.
None of those block merging.
Definition of done for this branch
- Fixes for #1, #2, #3 land.
- Tests listed under each fix land.
just test,just clippy,just buildclean.- Manual: install a game, then update it while a peer advertises a newer version, then uninstall it. Verify the version actually changes after update (covers #1) and that the UI doesn't get stuck on a spinner after operations complete (covers #2).
Once those are green, this branch is done. Re-reviewing will surface more smells; don't run another review unless something behaves wrong when tested manually.