crazy
This commit is contained in:
+161
@@ -0,0 +1,161 @@
|
||||
# Findings — Bugs to Fix Before Merging
|
||||
|
||||
Three bugs found in the post-PLAN.md implementation. Fix these, then merge.
|
||||
Everything else lives in `BACKLOG.md` and does not block.
|
||||
|
||||
---
|
||||
|
||||
## 1. `update_game` never fetches a fresh manifest from peers
|
||||
|
||||
PLAN.md:357 calls for `update_game` to send `GetGame`, *fetch fresh remote
|
||||
archives*, and trigger an auto-install as a transactional update. The
|
||||
implementation makes that path unreachable.
|
||||
|
||||
Trace:
|
||||
|
||||
1. Tauri `update_game` sends `PeerCommand::GetGame`
|
||||
(`crates/lanspread-tauri-deno-ts/src-tauri/src/lib.rs:162`).
|
||||
2. Peer's `handle_get_game_command` calls `try_serve_local_game` first
|
||||
(`crates/lanspread-peer/src/handlers.rs:88`).
|
||||
3. `try_serve_local_game` consults `local_download_available`, which returns
|
||||
true whenever `version.ini` is present locally and the ID is in the catalog
|
||||
(`crates/lanspread-peer/src/local_games.rs:48-66`). For any game the user
|
||||
has already downloaded, this is *always* true.
|
||||
4. The **local** file descriptions are returned via `GotGameFiles`. Tauri
|
||||
routes those into `DownloadGameFiles`.
|
||||
5. `handle_download_game_files_command:204-227` consults `peer_game_db` via
|
||||
`validate_file_sizes_majority`, so cached remote metadata *is* read. But
|
||||
the descriptions actually used for chunk planning are the local ones
|
||||
(`handlers.rs:189-195`). When peers advertise a newer version with
|
||||
different file sizes, the whitelist is empty and the path falls into
|
||||
"instant install of local archives." When sizes happen to match, we
|
||||
plan chunks against local descriptions and request those offsets from
|
||||
peers — which works only when peer-side files are identical to local.
|
||||
Either way, peers' current manifests are never read.
|
||||
|
||||
Net effect: "update" = re-extract whatever archives are on disk into
|
||||
`local/`. The flow PLAN.md described — fetch the newer archive from peers,
|
||||
then auto-install — does not exist.
|
||||
|
||||
**Fix candidates:**
|
||||
|
||||
- New `PeerCommand::FetchLatestFromPeers { id }` that skips the local-serve
|
||||
gate and asks one peer for its current manifest.
|
||||
- `PeerCommand::GetGame { id, force_peer: true }` flag honored by
|
||||
`try_serve_local_game`.
|
||||
- `try_serve_local_game` short-circuits only when local `eti_version` is ≥
|
||||
`peer_db.get_latest_version_for_game(id)`. The aggregation function
|
||||
already exists in `peer_db.rs:320`; nothing calls it for this purpose.
|
||||
|
||||
**Tests to add:** `update_game` actually pulls the newer manifest from a
|
||||
peer when one exists. Today this can't be tested because the code path
|
||||
doesn't exist.
|
||||
|
||||
---
|
||||
|
||||
## 2. `OperationGuard::Drop` is doing ordered state transitions
|
||||
|
||||
`crates/lanspread-peer/src/handlers.rs:254-279`:
|
||||
|
||||
```rust
|
||||
ctx.task_tracker.spawn(async move {
|
||||
let result = {
|
||||
let _download_state_guard = OperationGuard::download(...);
|
||||
download_game_files(...).await
|
||||
}; // guard drops here
|
||||
match result {
|
||||
Ok(()) => run_install_operation(&ctx_clone, ..., download_id).await,
|
||||
...
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
`OperationGuard::Drop` (`context.rs:156-191`) tries `try_write` first, then
|
||||
falls back to `tokio::spawn(async { ... .write().await.remove(...) })` if the
|
||||
lock is contended. The contention happens because `active_operations` is read
|
||||
on every watcher tick, every `list_games`, every `can_serve_game`, every
|
||||
liveness sweep, every `update_and_announce_games` snapshot.
|
||||
|
||||
This is the wrong shape for the state transition. Drop is fire-and-forget;
|
||||
the synchronous code after the guard scope keeps running before the deferred
|
||||
removal lands. Two distinct symptoms of the same root cause:
|
||||
|
||||
1. **Install rejected:** `run_install_operation` calls `begin_operation`
|
||||
(`handlers.rs:336-339`) which does `Entry::Vacant` on the same map. If
|
||||
`begin_operation` wins the lock before the spawned remove task does, it
|
||||
sees the leftover `Downloading` entry and rejects the install. User sees
|
||||
`version.ini` on disk, no `local/`, no `InstallGameBegin`, no
|
||||
explanation.
|
||||
2. **Stale snapshot:** Post-finish refresh calls
|
||||
`active_operation_snapshot` (`handlers.rs:558`) before the deferred
|
||||
removal runs. UI receives one final snapshot saying the operation is
|
||||
active even though `InstallGameFinished` was already sent.
|
||||
|
||||
**Fix:** Explicit `async end_operation(...)` call before finish/refresh,
|
||||
under a single write lock. The same write that removes `Downloading`
|
||||
should insert `Installing`/`Updating` for the auto-install path, making
|
||||
the handoff atomic. Demote `OperationGuard` to crash-safety: only fires
|
||||
when the task panics or is aborted, and logs loudly when it does.
|
||||
|
||||
**Tests to add:**
|
||||
|
||||
- Hold a read lock on `active_operations` while `download_game_files`
|
||||
returns; assert the auto-install still proceeds.
|
||||
- Liveness path cancellation while multiple downloads are in flight;
|
||||
assert no duplicate failure events and no stuck operation-table
|
||||
entries.
|
||||
|
||||
---
|
||||
|
||||
## 3. Uncoordinated library-index writes
|
||||
|
||||
`scan_local_library` (`local_games.rs:533-615`) and `rescan_local_game`
|
||||
(`local_games.rs:617-639`) both load `library_index.json`, mutate the
|
||||
deserialized state, and save. Nothing serializes the two paths.
|
||||
|
||||
Call sites:
|
||||
|
||||
- `run_fallback_scan` (`local_monitor.rs:289`) → `scan_local_library`.
|
||||
- `run_gated_rescan` (`local_monitor.rs:261`) → `rescan_local_game`,
|
||||
spawned on the task tracker (line 253-258).
|
||||
- `load_local_library` (`handlers.rs:491`) → `scan_local_library`.
|
||||
- `refresh_local_game` (`handlers.rs:520`) → `rescan_local_game`.
|
||||
|
||||
A fallback-scan tick can land between a gated-rescan's load and save (or
|
||||
vice versa). Last writer wins; intermediate updates are silently dropped.
|
||||
|
||||
The piece of state that drifts in a user-visible way is `revision`: both
|
||||
writers compute `old.saturating_add(1)` and save `old+1`, while the
|
||||
in-memory `LocalLibraryState.revision` bumps independently in
|
||||
`update_from_scan`. After a restart, disk-revision can be lower than
|
||||
peers expect, breaking `LibraryDelta.from_rev` matching — peers will
|
||||
fall back to snapshots and the delta optimization is undone.
|
||||
|
||||
**Fix candidates:** `tokio::Mutex` around index I/O, or move the index
|
||||
ownership into the same actor that owns `LocalLibraryState` so all
|
||||
mutations go through one channel.
|
||||
|
||||
---
|
||||
|
||||
## What's *not* in this file
|
||||
|
||||
Everything else found during review is in `BACKLOG.md`. Notable items
|
||||
include: Tauri-side parallel scanning, legacy peer protocol fallback,
|
||||
unreachable `Availability::Downloading` variant, stale FOLLOW_UP_2.md.
|
||||
None of those block merging.
|
||||
|
||||
---
|
||||
|
||||
## Definition of done for this branch
|
||||
|
||||
- Fixes for #1, #2, #3 land.
|
||||
- Tests listed under each fix land.
|
||||
- `just test`, `just clippy`, `just build` clean.
|
||||
- Manual: install a game, then update it while a peer advertises a newer
|
||||
version, then uninstall it. Verify the version actually changes after
|
||||
update (covers #1) and that the UI doesn't get stuck on a spinner
|
||||
after operations complete (covers #2).
|
||||
|
||||
Once those are green, this branch is done. Re-reviewing will surface
|
||||
more smells; don't run another review unless something behaves wrong
|
||||
when tested manually.
|
||||
Reference in New Issue
Block a user