lanspread/FINDINGS.md
2026-05-16 13:15:34 +02:00

Findings — Bugs to Fix Before Merging

Three bugs found in the post-PLAN.md implementation. Fix these, then merge. Everything else lives in BACKLOG.md and does not block.


1. update_game never fetches a fresh manifest from peers

PLAN.md:357 calls for update_game to send GetGame, fetch fresh remote archives, and trigger an auto-install as a transactional update. The implementation makes that path unreachable.

Trace:

  1. Tauri update_game sends PeerCommand::GetGame (crates/lanspread-tauri-deno-ts/src-tauri/src/lib.rs:162).
  2. Peer's handle_get_game_command calls try_serve_local_game first (crates/lanspread-peer/src/handlers.rs:88).
  3. try_serve_local_game consults local_download_available, which returns true whenever version.ini is present locally and the ID is in the catalog (crates/lanspread-peer/src/local_games.rs:48-66). For any game the user has already downloaded, this is always true.
  4. The local file descriptions are returned via GotGameFiles. Tauri routes those into DownloadGameFiles.
  5. handle_download_game_files_command:204-227 consults peer_game_db via validate_file_sizes_majority, so cached remote metadata is read. But the descriptions actually used for chunk planning are the local ones (handlers.rs:189-195). When peers advertise a newer version with different file sizes, the whitelist is empty and the path falls into "instant install of local archives." When sizes happen to match, we plan chunks against local descriptions and request those offsets from peers — which works only when peer-side files are identical to local. Either way, peers' current manifests are never read.

Net effect: "update" = re-extract whatever archives are on disk into local/. The flow PLAN.md described — fetch the newer archive from peers, then auto-install — does not exist.

Fix candidates:

  • New PeerCommand::FetchLatestFromPeers { id } that skips the local-serve gate and asks one peer for its current manifest.
  • PeerCommand::GetGame { id, force_peer: true } flag honored by try_serve_local_game.
  • try_serve_local_game short-circuits only when local eti_version is ≥ peer_db.get_latest_version_for_game(id). The aggregation function already exists in peer_db.rs:320; nothing calls it for this purpose.
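
The third candidate can be sketched as a pure decision function. This is a minimal sketch, not the project's API: the name should_serve_locally, its signature, and the use of u64 for eti_version are assumptions made for illustration; the real version type and comparison may differ.

```rust
/// Decide whether a GetGame request may be answered from local files.
///
/// Hypothetical helper: name, signature, and the integer version type are
/// assumptions, not taken from the lanspread code base.
fn should_serve_locally(local_version: Option<u64>, latest_peer_version: Option<u64>) -> bool {
    match (local_version, latest_peer_version) {
        // Nothing usable on disk: there is nothing to serve locally.
        (None, _) => false,
        // No peer advertises this game: local files are the best available.
        (Some(_), None) => true,
        // Short-circuit only when local is at least as new as the newest peer.
        (Some(local), Some(peer)) => local >= peer,
    }
}
```

With this gate, a game that is already downloaded but older than what peers advertise no longer takes the local-serve path, so update_game would reach the peer manifest fetch.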

Tests to add: update_game actually pulls the newer manifest from a peer when one exists. Today this can't be tested because the code path doesn't exist.


2. OperationGuard::Drop is doing ordered state transitions

crates/lanspread-peer/src/handlers.rs:254-279:

ctx.task_tracker.spawn(async move {
    let result = {
        let _download_state_guard = OperationGuard::download(...);
        download_game_files(...).await
    };  // guard drops here
    match result {
        Ok(()) => run_install_operation(&ctx_clone, ..., download_id).await,
        ...
    }
});

OperationGuard::Drop (context.rs:156-191) tries try_write first, then falls back to tokio::spawn(async { ... .write().await.remove(...) }) if the lock is contended. The contention happens because active_operations is read on every watcher tick, every list_games, every can_serve_game, every liveness sweep, every update_and_announce_games snapshot.

This is the wrong shape for a state transition. Drop is fire-and-forget: the code that follows the guard scope keeps running before the deferred removal lands. Two distinct symptoms share this root cause:

  1. Install rejected: run_install_operation calls begin_operation (handlers.rs:336-339) which does Entry::Vacant on the same map. If begin_operation wins the lock before the spawned remove task does, it sees the leftover Downloading entry and rejects the install. User sees version.ini on disk, no local/, no InstallGameBegin, no explanation.
  2. Stale snapshot: Post-finish refresh calls active_operation_snapshot (handlers.rs:558) before the deferred removal runs. UI receives one final snapshot saying the operation is active even though InstallGameFinished was already sent.
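
The first symptom can be reproduced deterministically in a small std-only model. This is a sketch under stated assumptions: it uses std::sync::RwLock instead of the async lock, a plain queue (Deferred) stands in for the tokio::spawn fallback, and all type and function names other than OperationGuard and begin_operation are hypothetical.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OperationKind {
    Downloading,
    Installing,
}

type Operations = Arc<RwLock<HashMap<String, OperationKind>>>;
/// Stand-in for the spawned fallback task: removals Drop could not do inline.
type Deferred = Arc<Mutex<Vec<String>>>;

struct OperationGuard {
    ops: Operations,
    deferred: Deferred,
    id: String,
}

impl Drop for OperationGuard {
    fn drop(&mut self) {
        // Uncontended: the entry is gone before Drop returns.
        if let Ok(mut map) = self.ops.try_write() {
            map.remove(&self.id);
            return;
        }
        // Contended: the removal is only queued, and Drop returns anyway.
        self.deferred.lock().unwrap().push(self.id.clone());
    }
}

/// Models the Entry::Vacant check: refuses to start if an entry exists.
fn begin_operation(ops: &Operations, id: &str, kind: OperationKind) -> bool {
    let mut map = ops.write().unwrap();
    if let Entry::Vacant(slot) = map.entry(id.to_string()) {
        slot.insert(kind);
        return true;
    }
    false
}

/// Replays download -> guard drop -> install; returns whether install started.
fn install_accepted(reader_active: bool) -> bool {
    let ops: Operations = Arc::new(RwLock::new(HashMap::new()));
    let deferred: Deferred = Arc::new(Mutex::new(Vec::new()));

    assert!(begin_operation(&ops, "game", OperationKind::Downloading));
    let guard = OperationGuard {
        ops: Arc::clone(&ops),
        deferred: Arc::clone(&deferred),
        id: "game".to_string(),
    };

    // A reader (watcher tick, list_games, ...) may hold the lock during Drop.
    let reader = if reader_active { Some(ops.read().unwrap()) } else { None };
    drop(guard);
    drop(reader);

    // The install starts before any deferred removal has run.
    let accepted = begin_operation(&ops, "game", OperationKind::Installing);

    // Only now does the deferred removal land.
    let pending: Vec<String> = deferred.lock().unwrap().drain(..).collect();
    for id in pending {
        ops.write().unwrap().remove(&id);
    }
    accepted
}
```

In this model install_accepted(false) starts the install, while install_accepted(true) is rejected because the leftover Downloading entry is still in the map when begin_operation runs.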

Fix: Call an explicit async end_operation(...) before finish/refresh, under a single write lock. The same write that removes Downloading should insert Installing/Updating for the auto-install path, making the handoff atomic. Demote OperationGuard to a crash-safety net that fires only when the task panics or is aborted, and logs loudly when it does.
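
A minimal sketch of the atomic handoff, assuming a std::sync::RwLock in place of the async lock (the real end_operation would be async and await the write lock). transition_operation and the synchronous signatures here are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OperationKind {
    Downloading,
    Installing,
}

type Operations = RwLock<HashMap<String, OperationKind>>;

/// Swap `from` for `to` under one write lock, so no reader or writer can
/// observe a gap between the two states. Returns false and leaves the map
/// untouched when the current entry is not `from`.
fn transition_operation(
    ops: &Operations,
    id: &str,
    from: OperationKind,
    to: OperationKind,
) -> bool {
    let mut map = ops.write().unwrap();
    if let Some(kind) = map.get_mut(id) {
        if *kind == from {
            *kind = to;
            return true;
        }
    }
    false
}

/// Explicit removal, called before finish/refresh events are emitted.
fn end_operation(ops: &Operations, id: &str) -> Option<OperationKind> {
    ops.write().unwrap().remove(id)
}
```

Because the write lock is awaited rather than tried, a concurrent reader can only delay the handoff, never cause it to be skipped. That is the property the first test below is meant to pin down.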

Tests to add:

  • Hold a read lock on active_operations while download_game_files returns; assert the auto-install still proceeds.
  • Liveness path cancellation while multiple downloads are in flight; assert no duplicate failure events and no stuck operation-table entries.

3. Uncoordinated library-index writes

scan_local_library (local_games.rs:533-615) and rescan_local_game (local_games.rs:617-639) both load library_index.json, mutate the deserialized state, and save. Nothing serializes the two paths.

Call sites:

  • run_fallback_scan (local_monitor.rs:289) → scan_local_library.
  • run_gated_rescan (local_monitor.rs:261) → rescan_local_game, spawned on the task tracker (lines 253-258).
  • load_local_library (handlers.rs:491) → scan_local_library.
  • refresh_local_game (handlers.rs:520) → rescan_local_game.

A fallback-scan tick can land between a gated-rescan's load and save (or vice versa). Last writer wins; intermediate updates are silently dropped.

The piece of state that drifts in a user-visible way is revision: both writers compute old.saturating_add(1) from the value they loaded and save old+1, so two interleaved writes advance the on-disk revision by one instead of two, while the in-memory LocalLibraryState.revision bumps independently in update_from_scan. After a restart, the on-disk revision can be lower than peers expect, which breaks LibraryDelta.from_rev matching: peers fall back to full snapshots and the delta optimization is lost.
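
The lost update can be replayed by hand. In this sketch an in-memory value stands in for library_index.json; the LibraryIndex fields and the helper names are hypothetical and only model the load, mutate, save steps described above.

```rust
/// Stand-in for the contents of library_index.json (fields are illustrative).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct LibraryIndex {
    revision: u64,
    games: Vec<String>,
}

/// The mutate step of one writer: record a game and bump the revision.
fn mutate(mut loaded: LibraryIndex, game: &str) -> LibraryIndex {
    loaded.games.push(game.to_string());
    loaded.revision = loaded.revision.saturating_add(1);
    loaded
}

/// The save step: overwrite whatever is on "disk".
fn save(disk: &mut LibraryIndex, updated: LibraryIndex) {
    *disk = updated;
}

/// Two writers that both load before either saves.
fn interleaved_writers() -> LibraryIndex {
    let mut disk = LibraryIndex::default();
    let scan_copy = disk.clone(); // fallback scan loads
    let rescan_copy = disk.clone(); // gated rescan loads
    save(&mut disk, mutate(scan_copy, "game-a")); // fallback scan saves
    save(&mut disk, mutate(rescan_copy, "game-b")); // gated rescan saves last
    disk
}
```

After two writes the stored revision is 1, not 2, and the entry written by the first writer is gone.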

Fix candidates: tokio::Mutex around index I/O, or move the index ownership into the same actor that owns LocalLibraryState so all mutations go through one channel.
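
A sketch of the mutex variant, assuming a std::sync::Mutex and an in-memory stand-in for the file; real code would hold a tokio::Mutex across the async file I/O. IndexStore, update, and run_concurrent_writers are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for library_index.json; only the revision is modelled.
#[derive(Debug, Clone, Default)]
struct LibraryIndex {
    revision: u64,
}

/// Hypothetical single owner of index I/O. The value inside the mutex plays
/// the role of the file on disk.
struct IndexStore {
    disk: Mutex<LibraryIndex>,
}

impl IndexStore {
    /// Run one whole load-mutate-save cycle while holding the lock, so no
    /// other writer can load a copy that is about to become stale.
    fn update(&self, mutate: impl FnOnce(&mut LibraryIndex)) -> u64 {
        let mut guard = self.disk.lock().unwrap();
        let mut loaded = guard.clone(); // load
        mutate(&mut loaded); // mutate
        loaded.revision = loaded.revision.saturating_add(1);
        *guard = loaded; // save
        guard.revision
    }
}

/// Spawn several writers and return the final stored revision.
fn run_concurrent_writers(writers: usize, updates_each: usize) -> u64 {
    let store = Arc::new(IndexStore {
        disk: Mutex::new(LibraryIndex::default()),
    });
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for _ in 0..updates_each {
                    store.update(|_index| {});
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let revision = store.disk.lock().unwrap().revision;
    revision
}
```

With every cycle serialized, each write advances the revision exactly once regardless of how the writers are scheduled. The actor variant gives the same guarantee by funnelling the cycles through one channel instead of one lock.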


What's not in this file

Everything else found during review is in BACKLOG.md. Notable items include: Tauri-side parallel scanning, legacy peer protocol fallback, unreachable Availability::Downloading variant, stale FOLLOW_UP_2.md. None of those block merging.


Definition of done for this branch

  • Fixes for #1, #2, #3 land.
  • Tests listed under each fix land.
  • just test, just clippy, and just build all run clean.
  • Manual: install a game, then update it while a peer advertises a newer version, then uninstall it. Verify the version actually changes after update (covers #1) and that the UI doesn't get stuck on a spinner after operations complete (covers #2).

Once those are green, this branch is done. Re-reviewing will surface more smells; don't run another review unless something behaves wrong when tested manually.