Commit Graph

6 Commits

Author SHA1 Message Date
ddidderr 738095235f feat(peer): coordinate outbound transfers with local game mutations
Updating or removing a local game rewrites its on-disk files. Peers that
were mid-download of that game would keep streaming bytes from files that
are being deleted or replaced, handing them a corrupt or stale copy.
There was also no authoritative notion of which game version a peer
should serve or accept, so a peer could serve whatever happened to be on
disk and downloaders could aggregate files from peers running mismatched
versions.

This introduces a reader-writer coordination scheme between outbound file
transfers (readers) and local mutation operations (writers), and gates
both serving and downloading on an authoritative game catalog version.

Reader-writer coordination:
- Track active outbound transfers per game in a shared `OutboundTransfers`
  map of (id, CancellationToken), threaded through `Ctx`/`PeerCtx` and
  registered by a `TransferGuard` in the stream service. The guard is
  registered *before* the serve-eligibility check to close a TOCTOU window
  where a writer could miss an in-flight reader.
- `stream_file_bytes` now honors a cancellation token at every await point
  (file read, network send, stream close) via `tokio::select!`, so a
  transfer aborts promptly instead of hanging on a stalled receiver.
- `begin_operation` marks a game active first, then cancels its outbound
  transfers and waits for the count to reach zero before any
  Updating/RemovingDownload work touches the filesystem.
- Active games are now hidden from library snapshots entirely while an
  operation is in flight, instead of freezing their last announced state,
  so peers stop discovering a game that is being mutated.

Authoritative version catalog:
- Replace the `HashSet<String>` catalog with `GameCatalog`, mapping each
  game id to its expected version (from the bundled game.db / ETI data).
- Serving requires the local `version.ini` to match the catalog version
  (`local_download_matches_catalog`); peer selection, file aggregation,
  and majority size validation all filter on the expected version
  (`peers_with_expected_version`, `aggregated_game_files`, and friends).

User-visible changes:
- The GUI shows confirmation dialogs before Update and Remove, and
  surfaces a sharing-status indicator on game cards and the detail modal.
- A new `OutboundTransferCountChanged` event lets the UI reflect live
  outbound transfer activity.

Test Plan:
- just test
- just frontend-test
- just clippy
2026-05-30 16:36:58 +02:00
ddidderr 41e9a0efc1 refactor(peer): split local library and operation UI events
Replace the `a9f9845` local-update dedup cache with explicit peer event
semantics. Local scans now emit `LocalLibraryChanged` when the library changes,
while operation mutations emit `ActiveOperationsChanged` from the mutation
path. Tauri keeps joining those facts into the existing `games-list-updated`
payload, so the frontend contract stays stable.

This removes the cache/invalidation coupling between scan emission and
operation state. The remaining forced local snapshot is explicit: accepted game
directory changes can refresh the UI for an equivalent new path without sending
a peer library delta.

Operation guard cleanup and liveness cancellation now publish the same active
operation snapshot as normal command-handler transitions. The peer CLI JSONL
events follow the same split with `local-library-changed` and
`active-operations-changed`.

Test Plan:
- `just fmt`
- `CARGO_BUILD_RUSTC_WRAPPER= just test`
- `CARGO_BUILD_RUSTC_WRAPPER= just clippy`
- `git diff --check`

Refs: CLEAN_CODE_PLAN_1.md
2026-05-18 21:25:20 +02:00
ddidderr 6242d64583 fix(peer): repair update lifecycle regressions
FINDINGS.md identified three merge blockers in the post-plan install/update
flow.

Updates now use FetchLatestFromPeers so the Tauri update command bypasses
local manifest serving and asks peers that advertise the latest version for
fresh file metadata. PeerGameDB now aggregates and validates file descriptions
from latest-version peers, keeping stale cached metadata for older versions
from poisoning chunk planning when filenames stay the same but sizes change.

Download-to-install handoff now performs explicit async state transitions.
The download task mutates Downloading to Installing or Updating under the
active-operation write lock, clears the cancellation token, and then runs the
install transaction. OperationGuard remains armed only as crash or abort
cleanup and is disarmed after normal explicit cleanup, so final refreshes no
longer race a deferred Drop cleanup.

Local library index writers now serialize the load/mutate/save window with one
async mutex. The index fingerprint also includes the root version.ini contents
so a same-length version rewrite in the same mtime second still updates the
reported local version.

The tradeoff is that local index mutations are serialized in-process instead
of moved into a dedicated actor. That keeps the fix small and scoped to the
merge blockers while preserving the existing scanner API.

Test Plan:
- just fmt
- just test
- just clippy
- just build
- git diff --check

Refs:
- FINDINGS.md
2026-05-16 14:19:10 +02:00
ddidderr 6c8a2bb9f0 feat(peer): add transactional local game operations
Implement the peer-owned state model from PLAN.md. A root-level version.ini
is now the download completion sentinel, local/ as a directory is the install
predicate, and exact root-level version.ini detection prevents nested files
from becoming sentinels by accident.

Add the peer operation table that gates downloads, installs, updates, and
uninstalls by game ID. Serving paths now reject non-catalog games, active
operations, missing sentinels, and any request that points under local/.
Remote aggregation treats LocalOnly peers as non-downloadable so they do not
contribute peer counts, candidate source selection, or latest-version checks.

Move install-side filesystem mutation into lanspread-peer::install. The new
module writes atomic .lanspread.json intents, uses .local.installing and
.local.backup with .lanspread_owned markers, and performs startup recovery
from recorded intent plus filesystem state. Downloads now buffer version.ini
chunks in memory and commit the sentinel last through .version.ini.tmp.

Replace the fixed 15-second monitor with notify-backed non-recursive watches,
per-ID rescan gating, and a 300-second fallback scan. The optimized rescan
path updates one cached library-index entry and active operation IDs preserve
their previous summary during scans.

Test Plan:
- just fmt
- just clippy
- just test
- just build

Refs: PLAN.md
2026-05-15 18:18:55 +02:00
ddidderr 2bbd2ac869 refactor(peer): adopt structured concurrency with supervised shutdown
Replace the detached tokio::spawn pattern in the peer runtime with a
supervised model built on tokio_util's CancellationToken and TaskTracker.
Long-lived services and child tasks now have an explicit parent, a
cancellation path, and a join point. Tauri can request a clean shutdown
on app exit instead of leaking work into process termination.

Background
~~~~~~~~~~

start_peer() previously returned only a command sender. The four startup
services (QUIC server, mDNS discovery, peer liveness, local library
monitor) and their child tasks (ping workers, handshake jobs, download
workers, announcement fan-outs, connection/stream handlers) were spawned
with raw tokio::spawn and detached. Closing the command channel sent
Goodbye notifications but did not stop those services. The mDNS blocking
worker had no cancellation path at all. Active downloads were stored as
JoinHandle<()> and force-aborted, which could interrupt file writes
mid-chunk.

Supervisor
~~~~~~~~~~

The runtime now owns a CancellationToken and a TaskTracker, threaded
through Ctx and PeerCtx. Each long-lived service is spawned through a
small supervisor (spawn_supervised_service) that wraps the service in
catch_unwind and enforces an explicit SupervisionPolicy:

  QuicServer:    Required     (fatal; cancels the runtime if it dies)
  Discovery:     Restart(5s)  (matches the prior self-restart loop)
  Liveness:      Restart(5s)
  LocalMonitor:  BestEffort   (logs and exits, no restart)

A Required failure emits a new RuntimeFailed { component, error } event
to the UI and cancels the runtime; the command loop and goodbye
notifications still run to completion. The Tauri layer forwards the
event as "peer-runtime-failed" so a future UI can surface it.

mDNS cancellation
~~~~~~~~~~~~~~~~~

MdnsBrowser previously blocked on receiver.recv() forever. It now
exposes next_service_timeout(Duration) returning an MdnsServicePoll
enum (Service/Timeout/Closed) via recv_timeout(). The discovery worker
polls at 250ms and checks the shutdown flag between ticks, so
cancellation reaches the blocking thread within one poll interval
instead of waiting for the next mDNS event.

Downloads
~~~~~~~~~

active_downloads is now HashMap<String, CancellationToken>. Each
download gets a child token of the runtime shutdown, checked at chunk
and peer-attempt boundaries (never inside file writes). When all peers
with a game disappear, liveness cancels the token and emits
DownloadGameFilesAllPeersGone; the download exits Ok(()) without
emitting a duplicate Failed event.

DownloadStateGuard (context.rs) is held inside the download task and
clears downloading_games + active_downloads on Drop, covering the happy
path, error returns, cancellation, and task abort. Drop falls back to
spawning the cleanup if write-lock contention prevents try_write.

Public API and Tauri integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

start_peer() now returns PeerRuntimeHandle exposing:

  fn sender(&self) -> UnboundedSender<PeerCommand>
  fn shutdown(&self)
  async fn wait_stopped(&mut self)

The Tauri layer stores the handle in managed state and switches its
main loop from .run(ctx) to .build(ctx).run(|h, e| ...). On
RunEvent::Exit it calls handle.shutdown() and blocks up to 2s on
wait_stopped(), giving services time to cancel and Goodbye packets time
to flush over a healthy LAN while staying short enough not to delay
process exit noticeably on a dead network.

The command loop distinguishes graceful shutdown from unexpected
channel closure: if recv() returns None and shutdown.is_cancelled() is
set, the loop returns Ok(()) silently. Only an unexpected close (no
cancellation observed) still emits RuntimeFailed. This avoids a
spurious failure event on every normal app close.

User-visible behavior changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Closing the app no longer leaks services into process termination;
  Goodbye notifications are reliably attempted before exit.
- Downloads cancel cleanly (between chunks) instead of force-aborting
  mid-write.
- A new "peer-runtime-failed" Tauri event fires when a Required service
  cannot recover. No frontend handler exists yet — that is a follow-up.

Tradeoffs
~~~~~~~~~

- Workspace tokio-util now requires the "rt" feature for TaskTracker.
- The mDNS worker still runs in spawn_blocking and may stay parked
  briefly between 250ms polls — acceptable for a desktop app.
- The 2s shutdown timeout on app exit is a deliberate compromise.

Tests
~~~~~

New unit tests:
  - DownloadStateGuard clears tracking on completion, cancellation, and
    parent-task abort (context.rs).
  - Required failure cancels the runtime and emits RuntimeFailed
    (startup.rs).
  - Restart policy restarts until shutdown is requested (startup.rs).
  - PeerRuntimeHandle.shutdown() observable via wait_stopped()
    (startup.rs).
  - Peers-gone cancellation emits only PeersGone, no duplicate Failed
    (services/liveness.rs).

Test plan
~~~~~~~~~

  cargo test --workspace
  cargo clippy --workspace --all-targets

Manual smoke test on two peers on the same LAN:
  1. Start a download, verify chunks transfer.
  2. Close the receiving app mid-download — verify the sending peer
     logs a Goodbye, not a connection-reset error.
  3. Stop the sending peer mid-download — verify the receiver emits
     DownloadGameFilesAllPeersGone, not Failed.

Follow-ups
~~~~~~~~~~

- Frontend handler for "peer-runtime-failed".
- Consider exposing the runtime handle's stopped watch to the frontend
  for a reconnecting indicator on Required failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:53:51 +02:00
ddidderr b4585b663a ChatGPT Codex 5.5 xhigh refactored even more 2026-05-02 15:31:37 +02:00