The follow-up review found a few stale lifecycle edges around local game transactions. Recovery could sweep active roots, post-operation refreshes still re-ran full startup recovery, and the UI kept inferring local-only state from downloaded and installed flags instead of the backend availability. This updates the peer lifecycle so startup recovery skips active operations, install/update/uninstall refresh only the affected game after the operation guard is dropped, and path-changing game-directory updates are rejected while operations are active. It also removes the dead UpdateGame command, drops the unused manifest_hash write field while preserving old JSON reads, renames the internal install-finished event, and carries availability through the DB, peer summaries, Tauri refreshes, and the React model. The included follow-up documents record the review source, implementation decisions, and the remaining FOLLOW_UP_2.md work so later commits can stay small instead of reopening the completed plan items. Test Plan: - git diff --check - just fmt - just clippy - just test Follow-up-Plan: FOLLOW_UP_PLAN.md
12 KiB
Review Step 4 — Bigger Picture
The user asked to evaluate the larger picture, not what's missing or next. So this is about what is there: how the design holds up, what risks live in it, and what habits to either trust or watch.
Architectural integrity
The two-axis state model holds up under pressure
The core insight in PLAN.md — that downloaded and installed are independent, with version.ini and local/ as their respective ground-truth signals — is the right shape. It collapses four messy ad-hoc predicates the old code carried (eti exists, version.ini exists, local exists, ___TO_BE_DELETE___ exists) into two boolean signals, each with a single observation point. The downstream cleanup falls out almost automatically: availability, scanner gating, manifest serving, watcher decisions, intent recovery — they all consult the same two predicates.
This is the kind of design where the test for "did we get the abstraction right" is whether new edge cases require new predicates or just slot into the existing matrix. Looking at the install-recovery matrix in transaction.rs:148-322, every case decomposes to (intent_state, local_present, installing_present, backup_present). That's 4 × 2 × 2 × 2 = 32 combinations, of which the plan named ~11 and the rest fall to the "trust FS, don't mutate" default. No new predicate needed for any of them. The model is doing real work.
"FS as truth, intent log as ownership proof" is the load-bearing idea
The intent log doesn't replace the filesystem state — it gives recovery permission to mutate. The plan called this out explicitly ("intent says Installing → corresponding .local.installing/ is unconditionally Lanspread-owned regardless of marker"). The implementation honors it: sweep_owned_orphan (transaction.rs:382-395) refuses to touch markerless dirs in the None case; the per-intent recovery handlers freely remove them when intent proves ownership.
This separation matters most for the "user puts a .local.installing/ directory in their game folder for some reason" scenario. Plenty of designs would just delete it. This one logs a warning and walks away. That's the right call for software that operates on user-managed game folders.
Single operation table as the universal gate is elegant
Every gating decision in the runtime (manifest serve, file stream serve, list-games filter, library announcement, watcher event drop, scan summary preservation, Tauri command rejection) consults the same active_operations: HashMap<String, OperationKind>. This collapses what could have been six separate concurrency primitives into one map and a Drop guard.
The OperationGuard (context.rs:122-191) is the right pattern — clears tracking on any exit path, including task::abort. The fact that it handles three different drop contexts (sync drop, async drop with current runtime handle, drop with no runtime) shows the author thought through what happens at shutdown. The test operation_guard_clears_tracking_when_task_is_dropped (context.rs:269) covers the worst of these.
The peer/Tauri boundary is much cleaner now
Pre-PLAN, the Tauri shell owned:
- The unpacker invocation
- The whole-folder backup scheme (
___TO_BE_DELETE___{id}) - The "downloaded means installed" coupling
- The "spawn unpack after download finishes" decision
Post-PLAN, it owns:
- The unrar sidecar plumbing (injected via
Unpackertrait) - The catalog source (loaded from bundled
game.db) - The UI event fan-out
That's a meaningful shift. The peer crate is now actually headless. The Tauri crate has no business-logic decisions about install state — it routes by local FS state in three lines (src-tauri/src/lib.rs:108-116) and otherwise just relays events.
This also means the peer crate is more testable on its own, which is reflected in the test count (24 new tests in lanspread-peer/src/install/ and download.rs alone).
Risk surface
Two sources of truth for "what's in progress"
The peer keeps active_operations: HashMap<String, OperationKind>. The Tauri shell keeps active_operations: HashMap<String, UiOperationKind>. These are kept in sync by handling PeerEvent lifecycle messages.
If a PeerEvent::InstallGameFinished is ever dropped (channel full, app foregrounded mid-event, panic in the handler), the UI side stays stuck thinking an install is in progress. There is no reconciliation path — the user has to restart the app.
This is a pre-existing pattern in the codebase but the new install/uninstall flows expand it. Worth keeping an eye on.
Mitigation idea (not for this PR): the peer could include a "current operations" snapshot in PeerEvent::LocalGamesUpdated so the UI can recompute its in-progress set rather than maintaining it from event history.
Recovery re-runs are O(games_dir) on every state-changing command
load_local_library → install::recover_on_startup walks every top-level dir under game_dir, reads .lanspread.json from each, and does FS stats. This is called on:
- Initial startup (correct per plan)
SetGameDir(probably correct — different dir, different state)- After every install/uninstall completes (questionable — we just finished a transaction; nothing else could have changed)
For 100 games this is ~100 fs reads per operation. For 1000 games, noticeable latency on every install completion. The fix is to either restrict post-operation recovery to the affected game ID, or skip it entirely (the transaction already cleared its own state).
This is not yet a problem at current scale (handful of games) but is the kind of thing that becomes one once someone with a real library tries it.
recover_on_startup ignores the operation table
If load_local_library runs while a download is in-flight (say, the user clicks SetGameDir mid-download), recover_download_transients will delete .version.ini.tmp and .version.ini.discarded while the download is still using them. The download then tries to rename a tmp that has been swept.
This is a narrow race, but it's possible. The fix is for recover_on_startup to skip game roots whose ID is in active_operations. Worth a thought.
Watcher reconciliation is event-driven, not file-creation-driven
reconcile_watch_state is called from inside the watcher select-loop after either a tick or an event (local_monitor.rs:51-67). If a new game root is created and no event ever fires for the game_dir root, the new game root won't get its own watch until the next 300s tick. In practice the parent-dir watch will see a CREATE event for the new root, which triggers reconcile, so this is usually fine. But it relies on notify's behavior on the specific platform — and notify doesn't promise this for all backends.
The fallback scan catches it within 300s, so worst case is a slow first scan. Not a correctness issue.
Availability::Downloading is defined and never emitted
build_game_summary only produces Ready or LocalOnly. The Downloading variant exists in the proto enum and is never written to a GameSummary. Two consequences:
- If a future change tries to emit it, downstream code (
game_is_readyinpeer_db.rs:726) will quietly classify it as not-Ready and exclude it from peer counts. That might actually be correct, but the implication is undocumented. - The wire protocol carries a value that's structurally unreachable. If a peer running an older binary sends
Downloading, the new code handles it correctly (not-Ready). Good.
This is intentional simplification (the operation table already gates serving during a download, so the summary doesn't need to advertise it). But it's worth a comment in proto code so the next person doesn't add a third gate.
InstallIntent::manifest_hash is dead
The plan defined manifest_hash: Option<u64> in the intent JSON. The implementation has the field but never populates it. Future-proof, or accidental? If accidental, it should be removed; if intentional, a one-line comment in intent.rs saying "reserved for future TBD use" would help.
Code organization
download.rs is at the limit of what one file should hold
1162 lines including tests, mixing chunk planning, retry orchestration, version-sentinel transaction primitives, in-memory buffers, and the main download loop. A future split into download/transaction.rs, download/chunks.rs, and download/orchestrator.rs would be a kindness. Not blocking.
install/ module is well-sized
Three files, each focused: intent.rs (atomic write), transaction.rs (state machine), unpack.rs (trait def). Public surface in mod.rs. Matches the plan exactly and reads cleanly.
handlers.rs has awkward RequestedInstallOperation semantics
Auto, Install, and Update are three variants but Auto and Install collapse to the same behavior (handlers.rs:355-363). Effectively two distinct flows: "user said install/auto → infer from FS" vs "user said update → always treat as Updating". The third variant is rhetorical — Install and Auto could merge into a single Inferred variant or just a boolean.
Worth simplifying later.
Test culture
The author wrote tests that exercise real filesystem semantics: actual temp directories, real fs::rename, real tokio::fs calls. No mocked filesystem layers. This is the right choice for code whose entire value proposition is correct atomic-rename behavior — but it does mean the tests are sensitive to platform behavior. They all pass on Linux in this run.
The FakeUnpacker is the only meaningful test double. It models the unpacker as "write a payload.txt on success, error on fail." That's enough to drive the transaction logic without depending on the unrar sidecar. Right level of abstraction.
The TempDir reimplementation in five files is the only consistent test-hygiene smell. Easy follow-up.
What I would not change
- The "intent log proves ownership, marker is belt-and-suspenders" decomposition. It's the clearest way to reason about recovery and it generalizes.
- The single operation table as the universal gate.
- The injected
Unpackerboxed-future trait. Avoidsasync_trait, stays object-safe, doesn't leak into the peer crate. Clean. - The in-memory
VersionIniBufferfor the sentinel. The sentinel is small; this avoids any chance of partial-write contamination. - Keeping
game-unpack-finishedas the event name for this PR (decision recorded). Worth renaming later but not worth shipping ahead of correctness.
What I would watch
- The Tauri-side
active_operationsmap. If event-loss becomes a real issue in testing, push toward a reconciliation-from-snapshot pattern. recover_on_startupcost on real libraries. If users with hundreds of games see startup latency, restrict recovery scope.- The recovery race against in-flight downloads. Even though the gate prevents new operations on a game with an active operation, the recovery sweep runs without consulting the operation table.
- Whether
Availability::Downloadingever gets emitted. If it does, peer-side filters need a recheck.
Overall
This is a substantial refactor that took a stack of brittle, coupled concerns in Tauri-land — backup-the-whole-game-folder, downloaded-implies-installed, ad-hoc unpack-after-download, time-debounced scanning — and produced a coherent state machine grounded in two predicates. The choices visible in the diff are mostly the same choices I'd have made. The deviations from PLAN.md are minor and recorded (mostly) in IMPL_DECISIONS.md. The test breadth is the weakest dimension but the test depth is good — the tests that exist are not theater.
The design will scale to more operations (e.g., a future "verify" or "repair" command) and to more game-state predicates (e.g., a future "quarantined" state for failed-verification games) without needing to revisit the core architecture. That's the test of whether the abstraction was right.