Commit Graph

22 Commits

Author SHA1 Message Date
ddidderr 373def6d44 feat(peer): prototype streamed installs
Add a streamed-install prototype that can receive archive-derived install bytes
straight into local/ without first storing the peer-owned root archive payload.
This is intended for low-disk clients that want to install a game but opt out of
becoming a downloadable peer source for that game.

The protocol gains a current-version-only StreamInstall request and framed
StreamInstallFrame responses. The peer core owns the generic transport,
transaction, path validation, size checks, CRC32 verification, and lifecycle
state. The archive-specific work is hidden behind StreamInstallProvider so the
prototype can use unrar while the final implementation can swap in a better
provider without rewriting the peer command path.

The receiver writes into .local.installing and only promotes to local/ after the
full stream verifies. It deliberately does not write the root version.ini or
archive files, so the settled local state is installed=true, downloaded=false,
and availability=LocalOnly. That preserves the existing rule that local/ is not
served to peers and makes streamed receivers non-sources by construction.

The CLI is the only caller for now. It exposes stream-install and provides the
prototype unrar implementation with unrar lt for entry metadata and unrar p for
file bytes. This is simple and good enough to prove non-solid archive streaming,
but it is not the production provider shape for solid archives because per-file
unrar p would repeatedly decompress prefixes. The Tauri app explicitly passes
stream_install_provider: None, so the GUI behavior stays unchanged until a real
product path is designed.

Document the production-readiness work in NEXT_STEPS.md. The main follow-up is
to make the provider abstraction final-ish and replace the per-file CLI unrar
provider with a one-pass archive provider, then wire a deliberate GUI low-disk
mode, retry semantics, and broader failure scenarios.

Test Plan:
- just fmt
- RUSTC_WRAPPER= CARGO_BUILD_RUSTC_WRAPPER= just test
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py \
  S39 S40 --build-image
- RUSTC_WRAPPER= CARGO_BUILD_RUSTC_WRAPPER= just clippy
- git diff --check
- git diff --cached --check

Follow-up: NEXT_STEPS.md
2026-06-07 20:32:05 +02:00
ddidderr 06398fe298 fix(peer): reject transfer paths outside requested game
Inbound file-transfer requests carry both a game ID and a relative path. The
serve gate validated whether the requested game was currently servable, but it
did not require the path itself to be rooted under that same game. A
non-conforming peer could therefore register a guard for one game while asking
to read files from another game root.

Require normalized transfer paths to start with the requested game ID before the
file can be dispatched. This keeps the outbound transfer guard, serve policy,
and filesystem path aligned. Absolute, traversal, local-data, missing-sentinel,
active-operation, and wrong-version paths remain rejected by the existing gates.

Test Plan:
- just test
- just clippy
- git diff --check

Refs: Claude review finding #4
2026-05-30 16:36:59 +02:00
ddidderr 738095235f feat(peer): coordinate outbound transfers with local game mutations
Updating or removing a local game rewrites its on-disk files. Peers that
were mid-download of that game would keep streaming bytes from files that
are being deleted or replaced, handing them a corrupt or stale copy.
There was also no authoritative notion of which game version a peer
should serve or accept, so a peer could serve whatever happened to be on
disk and downloaders could aggregate files from peers running mismatched
versions.

This introduces a reader-writer coordination scheme between outbound file
transfers (readers) and local mutation operations (writers), and gates
both serving and downloading on an authoritative game catalog version.

Reader-writer coordination:
- Track active outbound transfers per game in a shared `OutboundTransfers`
  map of (id, CancellationToken), threaded through `Ctx`/`PeerCtx` and
  registered by a `TransferGuard` in the stream service. The guard is
  registered *before* the serve-eligibility check to close a TOCTOU window
  where a writer could miss an in-flight reader.
- `stream_file_bytes` now honors a cancellation token at every await point
  (file read, network send, stream close) via `tokio::select!`, so a
  transfer aborts promptly instead of hanging on a stalled receiver.
- `begin_operation` marks a game active first, then cancels its outbound
  transfers and waits for the count to reach zero before any
  Updating/RemovingDownload work touches the filesystem.
- Active games are now hidden from library snapshots entirely while an
  operation is in flight, instead of freezing their last announced state,
  so peers stop discovering a game that is being mutated.

Authoritative version catalog:
- Replace the `HashSet<String>` catalog with `GameCatalog`, mapping each
  game id to its expected version (from the bundled game.db / ETI data).
- Serving requires the local `version.ini` to match the catalog version
  (`local_download_matches_catalog`); peer selection, file aggregation,
  and majority size validation all filter on the expected version
  (`peers_with_expected_version`, `aggregated_game_files`, and friends).

User-visible changes:
- The GUI shows confirmation dialogs before Update and Remove, and
  surfaces a sharing-status indicator on game cards and the detail modal.
- A new `OutboundTransferCountChanged` event lets the UI reflect live
  outbound transfer activity.

Test Plan:
- just test
- just frontend-test
- just clippy
2026-05-30 16:36:58 +02:00
ddidderr 9835e77e8d feat: store launcher state outside game dirs
Move launcher-owned metadata from game roots into the configured peer state
area. Peer identity, the local library index, install intent logs, and setup
markers now live under app/CLI state instead of being written beside games.
The Tauri shell passes its app data directory into the peer, and the peer CLI
runs the same path through its explicit --state-dir.

Add a dedicated pre-start migration phase for legacy files. It migrates the
old global library index, per-game install intents, and the old first-start
marker into app state, then deletes legacy files only after the replacement
write succeeds. Normal scan, install, recovery, and transfer paths no longer
read legacy state files.

Rename the old first-start meaning to setup_done and only set it after
launching game_setup.cmd. Start/setup scripts keep the shared argument shape,
while server_start.cmd now uses cmd /k and a visible window so server logs stay
open for inspection.

While validating the Docker scenario matrix, make download terminal events
come from the handler after local state refresh and operation cleanup. This
makes download-finished/download-failed safe points for immediate follow-up CLI
commands. Also update the multi-peer chunking scenario to use a sparse archive
large enough to actually span multiple production chunks.

Test Plan:
- just fmt
- just test
- just frontend-test
- just build
- just clippy
- git diff --check
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py

Refs: local app-state migration discussion
2026-05-21 21:32:28 +02:00
ddidderr d7f7dc737e perf(peer): request larger QUIC UDP socket buffers
Configure the s2n-quic Tokio IO provider on both client and server instead of
using the address-only default provider. The configured provider asks the OS for
4 MiB send and receive buffers on each QUIC UDP socket, which avoids starting
bulk LAN transfers on the tiny default UDP buffer sizes.

I tested a wider version that also raised s2n-quic internal IO queues to 8 MiB,
but that regressed S37 to 710.19 and 736.20 MiB/s in repeat runs. This commit
keeps the narrower socket-buffer request, which measured faster than the prior
flow-control-only tuning while leaving the internal queue defaults intact.

The host used for measurement reports:
- net.core.rmem_max = 16777216
- net.core.wmem_max = 16777216
- net.core.rmem_default = 212992
- net.core.wmem_default = 212992

S37 single-source throughput:
- Step 1: 824.94 MiB/s, 6920.09 Mbit/s, 2.483s
- Step 2 sample A: 848.15 MiB/s, 7114.81 Mbit/s, 2.415s
- Step 2 sample B: 874.06 MiB/s, 7332.12 Mbit/s, 2.343s

Test Plan:
- just fmt
- sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default \
  net.core.wmem_default
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37 --build-image
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37

Refs: local LAN download performance investigation on 2026-05-20.
Depends-on: cd8bcbfeedfa (QUIC flow-control and BBR tuning).
2026-05-20 20:26:59 +02:00
ddidderr 5b689ec5f4 perf(peer): tune QUIC flow control for LAN downloads
Raise the s2n-quic connection and stream data windows on both the client and
server, increase the max send buffer, and use BBR with a larger initial
congestion window. The download path was already able to pipeline multiple chunk
streams, but those streams still shared small default connection-level budgets
that limited sustained LAN throughput.

The tuning keeps one current wire protocol and does not add fallback behavior.
It is deliberately centralized in the peer networking module so later transport
changes can use the same limits on both sides of the connection.

S37 single-source throughput:
- Before: 733.22 MiB/s, 6150.72 Mbit/s, 2.793s
- After: 824.94 MiB/s, 6920.09 Mbit/s, 2.483s
- Delta: +91.72 MiB/s, +769.37 Mbit/s, about +12.5%

Test Plan:
- just fmt
- python3 crates/lanspread-peer-cli/scripts/run_extended_scenarios.py S37 --build-image

Refs: local LAN download performance investigation on 2026-05-20.
Depends-on: 14e772c5c71a (peer-cli S37 throughput measurement).
2026-05-20 20:26:59 +02:00
ddidderr 41e9a0efc1 refactor(peer): split local library and operation UI events
Replace the `a9f9845` local-update dedup cache with explicit peer event
semantics. Local scans now emit `LocalLibraryChanged` when the library changes,
while operation mutations emit `ActiveOperationsChanged` from the mutation
path. Tauri keeps joining those facts into the existing `games-list-updated`
payload, so the frontend contract stays stable.

This removes the cache/invalidation coupling between scan emission and
operation state. The remaining forced local snapshot is explicit: accepted game
directory changes can refresh the UI for an equivalent new path without sending
a peer library delta.

Operation guard cleanup and liveness cancellation now publish the same active
operation snapshot as normal command-handler transitions. The peer CLI JSONL
events follow the same split with `local-library-changed` and
`active-operations-changed`.

Test Plan:
- `just fmt`
- `CARGO_BUILD_RUSTC_WRAPPER= just test`
- `CARGO_BUILD_RUSTC_WRAPPER= just clippy`
- `git diff --check`

Refs: CLEAN_CODE_PLAN_1.md
2026-05-18 21:25:20 +02:00
ddidderr be00a7a298 fix(peer): exchange full library snapshots during handshake
Peer A failed to learn Peer B's games. The handshake only carried
library_rev/library_digest metadata, and the post-handshake sync path
compared those revisions against per-peer revision numbers that were
never advanced via this code path, so the games map for the remote
peer stayed empty and the UI never showed them.

The fix is to put the authoritative library data into the handshake
itself. Hello and HelloAck now carry a LibrarySnapshot directly, and
both perform_handshake_with_peer (outbound) and accept_inbound_hello
(inbound) apply that snapshot to the peer DB before emitting the UI
events. The initial peer-game-list event is now driven by the
handshake rather than by a follow-up LibrarySummary/LibrarySnapshot
roundtrip.

Bumps PROTOCOL_VERSION to 4 because the wire layout of Hello/HelloAck
changed. Per CLAUDE.md's protocol policy there is no compatibility
shim; older peers will fail the version check and be ignored.

Cleanups that fall out of the new design:

  - The Hello / HelloAck library_rev and library_digest fields were
    duplicated by the embedded LibrarySnapshot (which carries its own
    library_rev, and whose digest is recomputed on apply). Collapsed
    both messages to just `library: LibrarySnapshot` to remove the
    foot-gun where the two could diverge.
  - Request::LibrarySummary and Request::LibrarySnapshot are now dead
    on the sender side and were removed along with their stream.rs
    handlers and the LibrarySummary struct. LibraryDelta stays — it
    is still sent from handlers.rs when the local library changes.
  - record_remote_library previously called update_peer_library and
    then apply_library_snapshot, which immediately overwrote the
    rev/digest just written. Added update_peer_features and rewired
    the call site so each peer-DB field is written exactly once.
    update_peer_library is retained because discovery.rs still uses
    it for the mDNS TXT-record path, where no snapshot is available.
  - Removed the now-unused LibraryUpdate enum, select_library_update,
    send_local_library_summary, send_local_library_update_if_needed,
    LocalLibraryState::delta_since, build_library_summary,
    send_library_summary, and send_library_snapshot.

Behavior change visible to users: when two peers come up on the LAN
they now see each other's full game lists immediately after the
handshake instead of waiting for a follow-up sync that, in the broken
case, never made the games visible at all.

Test Plan

  - just clippy (clean for the touched crates)
  - just test (workspace: all suites pass, including the two new
    handshake tests: outbound_hello_carries_local_library_snapshot
    and inbound_hello_applies_remote_library_snapshot, the latter
    asserting PeerDiscovered + PeerCountUpdated + ListGames events
    fire with the remote game visible)
  - Manual: start `just peer-cli-alpha` and `just peer-cli-bravo` in
    separate terminals; confirm each peer's game list shows the
    other's library entries after discovery completes, without
    requiring any additional command.

Refs

  - FINDINGS.md: triage note that Claude's review surfaced only
    in-scope cleanups (dead variants, duplicated header fields,
    redundant DB writes, stale test fixture), all addressed here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 19:06:39 +02:00
ddidderr ce51d92df0 refactor(peer): tighten listener-addr handshake invariant
Follow-up hardening for 348a02c, where `listen_addr` was added to Hello and
HelloAck as `Option<SocketAddr>`. Code review surfaced three concrete problems
that the previous commit left open:

1. Cold-start asymmetry. Discovery and the QUIC/mDNS advertiser are spawned
   concurrently. If discovery saw a cached peer advertisement before our own
   advertiser had written `ctx.local_peer_addr`, our outbound Hello carried
   `listen_addr: None`. The receiver's `peer_record_addr` then returned `None`
   and silently dropped the Hello while we still recorded their HelloAck, so
   peer A learned about peer B but B never learned about A until a later
   handshake happened to win the race.

2. Duplicate game-list pipeline. The previous commit added
   `refresh_peer_games`, which post-handshake issued a `ListGames` to fetch
   `peer.games`. The library-sync path (`LibrarySnapshot`) already populates
   the same field. Both could race on first contact and overwrite each other.
   Worse, `refresh_peer_games` was misnamed: a `peer_game_count > 0` guard
   turned it into a fetch-once-then-no-op helper, while
   `handle_library_summary` independently re-triggered a full handshake when
   `previous_count == 0` was observed, producing a redundant ping-pong on
   every first contact.

3. Argument explosion. `perform_handshake_with_peer`, `spawn_library_resync`,
   and `after_peer_library_recorded` had grown to 6-8 individual parameters
   and acquired `#[allow(clippy::too_many_arguments)]` opt-outs. Every caller
   was destructuring the same fields out of `Ctx`/`PeerCtx`.

Changes (all in one commit because they jointly enforce the same invariant:
"a peer is only ever recorded by its listener address, and the local
listener address must exist before we participate in the protocol"):

- `Hello.listen_addr` and `HelloAck.listen_addr` are now `SocketAddr`, not
  `Option<SocketAddr>`. Wire-incompatible, but PROTOCOL_VERSION already moved
  to 3 in 348a02c so no additional version bump is needed.
- `required_listen_addr` reads `ctx.local_peer_addr` and returns an
  `eyre::Result`; `build_hello_from_state` and `build_hello_ack` both call
  it, so an outbound or inbound Hello can no longer be constructed before
  the local QUIC listener is bound. The inbound path maps this into a
  `Response::InternalPeerError` so the remote peer fails cleanly instead of
  seeing a malformed HelloAck.
- `run_peer_discovery` blocks on `wait_for_local_peer_addr` (25 ms poll,
  shutdown-aware) before subscribing to the mDNS browser. This closes the
  cold-start race for outbound handshakes at the source.
- `refresh_peer_games`, `request_game_list_from_peer`, and the
  `previous_count == 0` re-handshake trigger are removed. The post-handshake
  flow now relies solely on `LibrarySummary`/`LibrarySnapshot`/`LibraryDelta`
  for peer-library state; `ListGames` survives only for the
  `request_game_details_*` paths that fetch per-game file descriptions on
  demand.
- New `HandshakeCtx` (with `from_ctx` and `from_peer_ctx` constructors)
  replaces the long argument lists. All `too_many_arguments` allow-attrs in
  `handshake.rs` are gone, and call sites in `handlers.rs`, `discovery.rs`,
  and `stream.rs` collapse to a single clone.
- `handle_library_delta` no longer acquires a read lock on the apply path:
  the `peer_addr` lookup moved into the `else` resync branch where it is
  actually needed.
- `accept_inbound_hello`'s `remote_addr` parameter is renamed to
  `transport_addr`. It is now used only for warn-log formatting, and the
  new name signals that this is the ephemeral QUIC source port, never the
  authoritative listener address that gets recorded.

User-visible effect: on cold start, peers can no longer end up with an
asymmetric view of each other ("A sees B but B never sees A"). First-contact
library sync now does one handshake plus one snapshot/delta exchange instead
of the previous handshake + ListGames + redundant follow-up handshake. The
direct-connect CLI path (`handle_connect_peer_command`) now fails fast with
"local peer listener address is not ready" if invoked before the QUIC server
has bound; this is intentional - the previous behaviour would have sent a
Hello that the receiver had to silently discard.

Test Plan:
- just fmt
- just clippy
- just test (80 peer + 3 cli + 5 tauri tests pass)
- just build
- Manual: bring up `just peer-cli-alpha`/`bravo`/`charlie`, confirm symmetric
  peer discovery and that games show up on every side after one library
  digest cycle, with no duplicated ListGames traffic in trace logs.

Refs: Review feedback on commit 348a02c (listener-address handshake fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:21:19 +02:00
ddidderr 348a02c35f fix(peer): record listener addresses during handshakes
Peers discovered over mDNS could still attribute later library sync traffic to
temporary QUIC source ports. In a real GUI LAN run this made Host B try to
push its library to Host A's outbound port instead of Host A's advertised
listener, so Host A discovered the peer but never saw its games.

Carry the stable listener address in Hello and HelloAck, and key library sync
messages by peer_id instead of inferring identity from the transport source
address. The handshake path now explicitly refreshes an empty peer library from
the known listener address, matching the reliability of the direct-connect CLI
path without overwriting richer snapshot state when it already arrived.

This changes the current wire protocol, so PROTOCOL_VERSION is bumped to 3 and
all peers must be rebuilt together. The architecture note now documents that
listener addresses come from mDNS or Hello/HelloAck, never from ephemeral QUIC
source ports.

Test Plan:
- just fmt
- just test
- just clippy
- just build
- git diff --check

Refs: Local Linux/Win11 GUI LAN test logs from 2026-05-18.
2026-05-18 17:27:15 +02:00
ddidderr 10a1f57183 fix(peer): preserve advertised addresses for QUIC peers
After renewing the dev certificate, peers could complete handshakes but then
lost each other during liveness checks. Inbound QUIC streams report the client's
ephemeral source port, while the peer database is supposed to track the peer's
advertised listening address. Recording the ephemeral address created unstable
peer entries that could not be pinged later.

Resolve transport source addresses back to the unique known peer on the same IP,
and keep an existing advertised address when an inbound Hello arrives from that
peer. Goodbye events now report the stored peer address as well.

This keeps the core peer behavior in lanspread-peer; the CLI only observes the
resulting peer snapshots.

Test Plan:
- just fmt
- just test
- just clippy
- just peer-cli-build
- just peer-cli-image
- just peer-cli-alpha, just peer-cli-bravo, just peer-cli-charlie
- list-peers after the ping idle window shows advertised peer addresses with
  populated game lists instead of ephemeral-port peers disappearing

Refs: PEER_CLI_SCENARIOS.md
2026-05-17 09:34:10 +02:00
ddidderr 3380d137fc fix: ignore local watcher access events
The peer CLI could flood LocalGamesUpdated events when run from the Docker
harness. The local monitor rescans game roots, and some bind-mounted filesystems
report those read/close operations back as notify access events. Treating those
non-mutating events as real library changes queued another rescan, making the
headless CLI unusable for manual peer-to-peer testing.

Ignore access events before mapping paths to game IDs. Create, modify, remove,
and rename events still flow through the existing per-game rescan gate, while
fallback scans continue to reconcile missed writes.

Test Plan:
- just fmt
- just test
- just clippy

Refs: manual peer-cli P2P testing
2026-05-16 19:50:10 +02:00
ddidderr 754afd5621 refactor(peer): drop --no-mdns toggle, mDNS is always on
The peer runtime previously accepted an `enable_mdns: bool` flag, plumbed
through `PeerStartOptions`, `spawn_peer_runtime`, `run_peer`, `Ctx`, and
`PeerCtx`. The lanspread-peer-cli harness exposed the toggle as
`--no-mdns` so test scenarios could fall back to explicit `connect`
commands when mDNS could not be relied on, in particular when multiple
peers ran inside `--network host` containers and could not advertise
independently.

That host-networking workaround no longer exists: the previous commit
moves harness containers onto a macvlan network, where each peer is a
real LAN device and mDNS just works between them. There is no scenario
left in the codebase where disabling mDNS is desirable. Per the project's
protocol policy in CLAUDE.md ("there is only one wire version, no
compatibility shims, no fallback paths"), an opt-out path with no current
caller is exactly the kind of dead code we should not carry.

Remove the flag and every plumbing point that exists only to support it:

- `PeerStartOptions::enable_mdns` and the custom `Default` impl that set
  it to `true`; the struct now derives `Default` and just carries
  `state_dir`.
- The `enable_mdns` parameter on `start_peer_with_options`,
  `spawn_peer_runtime`, `run_peer`, and `Ctx::new`.
- The `enable_mdns` fields on `Ctx` and `PeerCtx` and the propagation
  through `to_peer_ctx`.
- The `if ctx.enable_mdns` guard in `spawn_startup_services`;
  `spawn_peer_discovery_service` is now always spawned.
- The `if ctx.enable_mdns { ... } else { ... }` branch in
  `run_server_component`: the mDNS advertiser and event monitor are now
  unconditionally started, and the no-mDNS-fallback log line that read
  "mDNS disabled; direct peer address is ..." is gone. The
  `direct_connect_addr` helper is kept because the mDNS-on branch still
  uses it as a fallback when `local_peer_addr` has not yet been
  populated.
- The internal test helpers in `handlers.rs`, `services/local_monitor.rs`,
  and `services/stream.rs` that passed `true` as the trailing
  `enable_mdns` arg to `Ctx::new`.
- In `lanspread-peer-cli`: the `--no-mdns` arg parsing, the
  `Args::enable_mdns` field, the `mdns` key on the `cli-started` event
  payload, and the `--no-mdns` mention in the help text and the crate
  README.

The `Args::name` field is wired to the harness identity but is otherwise
untouched. The macvlan network created by `just peer-cli-net` is the
runtime prerequisite for this change to be observable across containers;
on a single workstation, two harness binaries on `127.0.0.1` discover
each other through mDNS on the loopback interface as before.

Test Plan:
- `just fmt`
- `just clippy`
- `just test`
- `just peer-cli-build`
- Two peers on macvlan: `just peer-cli-run alpha` and
  `just peer-cli-run beta`; check that each emits `peer-discovered` and
  `peer-connected` events without an explicit `connect` JSONL command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:51:54 +02:00
ddidderr e711cf3454 fix(peer): settle current-protocol local state cleanup
The follow-up backlog had drifted into three settled peer/runtime issues: the
legacy game-list fallback contradicted the one-wire-version policy, the Tauri
shell still re-derived local install state from disk after peer snapshots, and
`Availability::Downloading` existed even though active operations are already
reported through a separate operation table.

Remove the legacy `AnnounceGames` request and fallback service. Discovery now
ignores peers that do not advertise the current protocol and a peer id, and
library changes are sent through the current delta path only. This keeps the
runtime aligned with the documented current-build-only interoperability model.

Make peer `LocalGamesUpdated` snapshots authoritative for local fields in the
Tauri database. The GUI-side catalog still owns static metadata such as names,
sizes, and descriptions, but downloaded, installed, local version, and
availability now come from the peer runtime instead of a second whole-library
filesystem scan. Snapshot reconciliation also pins the missing-begin and
missing-finish lifecycle cases in tests.

Collapse availability back to the settled `Ready` and `LocalOnly` states.
Aggregation now counts only `Ready` peers as download sources, and the frontend
no longer carries a dead `Downloading` enum value.

The core peer also exposes the small non-GUI hooks needed by scripted callers:
startup options for state and mDNS, a local-ready event, direct connection, peer
snapshots, and an explicit post-download install policy. Those hooks reuse the
same current protocol path and do not add compatibility shims.

Test Plan:
- `git diff --check`
- `just fmt`
- `just clippy`
- `just test`

Refs: BACKLOG.md, FINDINGS.md, IMPL_DECISIONS.md
2026-05-16 18:32:24 +02:00
ddidderr 6242d64583 fix(peer): repair update lifecycle regressions
FINDINGS.md identified three merge blockers in the post-plan install/update
flow.

Updates now use FetchLatestFromPeers so the Tauri update command bypasses
local manifest serving and asks peers that advertise the latest version for
fresh file metadata. PeerGameDB now aggregates and validates file descriptions
from latest-version peers, keeping stale cached metadata for older versions
from poisoning chunk planning when filenames stay the same but sizes change.

Download-to-install handoff now performs explicit async state transitions.
The download task mutates Downloading to Installing or Updating under the
active-operation write lock, clears the cancellation token, and then runs the
install transaction. OperationGuard remains armed only as crash or abort
cleanup and is disarmed after normal explicit cleanup, so final refreshes no
longer race a deferred Drop cleanup.

Local library index writers now serialize the load/mutate/save window with one
async mutex. The index fingerprint also includes the root version.ini contents
so a same-length version rewrite in the same mtime second still updates the
reported local version.

The tradeoff is that local index mutations are serialized in-process instead
of moved into a dedicated actor. That keeps the fix small and scoped to the
merge blockers while preserving the existing scanner API.

Test Plan:
- just fmt
- just test
- just clippy
- just build
- git diff --check

Refs:
- FINDINGS.md
2026-05-16 14:19:10 +02:00
ddidderr 894eb5af6a test(peer): consolidate temp dir helper
Move the repeated test TempDir implementations into a single peer
test_support module. The shared helper keeps the existing automatic cleanup
behavior and uses an atomic suffix plus timestamp so parallel tests do not
collide on the same path.

This is intentionally limited to test hygiene. It does not change the
availability model, split download.rs, or touch production scan/install
behavior beyond importing the shared helper from test modules.

Test Plan:
- git diff --check
- just fmt
- just clippy
- just test

Follow-up-Plan: FOLLOW_UP_2.md
2026-05-16 09:21:43 +02:00
ddidderr 7731a9daa0 test(peer): cover serve gating dispatch
Add focused serve-side tests for the gates around peer requests. GetGame now has
coverage for the non-catalog, active-operation, and missing-sentinel cases that
should return GameNotFound instead of exposing local files.

The full-file and chunk handlers both depend on the same transfer gate before
touching the QUIC send stream. Extract that gate into a small helper and test
the same cases there, plus the existing local-path exclusion, so both dispatch
paths stay aligned without adding fake QUIC stream plumbing.

Test Plan:
- git diff --check
- just fmt
- just clippy
- just test

Follow-up-Plan: FOLLOW_UP_2.md
2026-05-16 09:16:37 +02:00
ddidderr 2a94445391 test(peer): cover local monitor rescan gating
Add dispatch-level tests for the local game monitor paths called out in
FOLLOW_UP_2.md. The new coverage verifies watcher events are dropped while a
game has an active operation, burst events for one game collapse through the
pending set to at most one extra rescan, fallback scans pick up sideloaded
catalog games, and non-catalog roots stay invisible to the library state.

The non-catalog test exposed that an empty local library initialized with
digest zero, while the computed digest for an empty map is nonzero. That made
the first empty scan produce a meaningless empty LibraryDelta. Initialize the
empty state with the computed empty digest so a non-catalog-only scan leaves no
delta behind.

Test Plan:
- git diff --check
- just fmt
- just clippy
- just test

Follow-up-Plan: FOLLOW_UP_2.md
2026-05-16 09:13:38 +02:00
ddidderr 6c8a2bb9f0 feat(peer): add transactional local game operations
Implement the peer-owned state model from PLAN.md. A root-level version.ini
is now the download completion sentinel, local/ as a directory is the install
predicate, and exact root-level version.ini detection prevents nested files
from becoming sentinels by accident.

Add the peer operation table that gates downloads, installs, updates, and
uninstalls by game ID. Serving paths now reject non-catalog games, active
operations, missing sentinels, and any request that points under local/.
Remote aggregation treats LocalOnly peers as non-downloadable so they do not
contribute peer counts, candidate source selection, or latest-version checks.

Move install-side filesystem mutation into lanspread-peer::install. The new
module writes atomic .lanspread.json intents, uses .local.installing and
.local.backup with .lanspread_owned markers, and performs startup recovery
from recorded intent plus filesystem state. Downloads now buffer version.ini
chunks in memory and commit the sentinel last through .version.ini.tmp.

Replace the fixed 15-second monitor with notify-backed non-recursive watches,
per-ID rescan gating, and a 300-second fallback scan. The optimized rescan
path updates one cached library-index entry and active operation IDs preserve
their previous summary during scans.

Test Plan:
- just fmt
- just clippy
- just test
- just build

Refs: PLAN.md
2026-05-15 18:18:55 +02:00
ddidderr 2bbd2ac869 refactor(peer): adopt structured concurrency with supervised shutdown
Replace the detached tokio::spawn pattern in the peer runtime with a
supervised model built on tokio_util's CancellationToken and TaskTracker.
Long-lived services and child tasks now have an explicit parent, a
cancellation path, and a join point. Tauri can request a clean shutdown
on app exit instead of leaking work into process termination.

Background
~~~~~~~~~~

start_peer() previously returned only a command sender. The four startup
services (QUIC server, mDNS discovery, peer liveness, local library
monitor) and their child tasks (ping workers, handshake jobs, download
workers, announcement fan-outs, connection/stream handlers) were spawned
with raw tokio::spawn and detached. Closing the command channel sent
Goodbye notifications but did not stop those services. The mDNS blocking
worker had no cancellation path at all. Active downloads were stored as
JoinHandle<()> and force-aborted, which could interrupt file writes
mid-chunk.

Supervisor
~~~~~~~~~~

The runtime now owns a CancellationToken and a TaskTracker, threaded
through Ctx and PeerCtx. Each long-lived service is spawned through a
small supervisor (spawn_supervised_service) that wraps the service in
catch_unwind and enforces an explicit SupervisionPolicy:

  QuicServer:    Required     (fatal; cancels the runtime if it dies)
  Discovery:     Restart(5s)  (matches the prior self-restart loop)
  Liveness:      Restart(5s)
  LocalMonitor:  BestEffort   (logs and exits, no restart)

A Required failure emits a new RuntimeFailed { component, error } event
to the UI and cancels the runtime; the command loop and goodbye
notifications still run to completion. The Tauri layer forwards the
event as "peer-runtime-failed" so a future UI can surface it.

mDNS cancellation
~~~~~~~~~~~~~~~~~

MdnsBrowser previously blocked on receiver.recv() forever. It now
exposes next_service_timeout(Duration) returning an MdnsServicePoll
enum (Service/Timeout/Closed) via recv_timeout(). The discovery worker
polls at 250ms and checks the shutdown flag between ticks, so
cancellation reaches the blocking thread within one poll interval
instead of waiting for the next mDNS event.

Downloads
~~~~~~~~~

active_downloads is now HashMap<String, CancellationToken>. Each
download gets a child token of the runtime shutdown, checked at chunk
and peer-attempt boundaries (never inside file writes). When all peers
with a game disappear, liveness cancels the token and emits
DownloadGameFilesAllPeersGone; the download exits Ok(()) without
emitting a duplicate Failed event.

DownloadStateGuard (context.rs) is held inside the download task and
clears downloading_games + active_downloads on Drop, covering the happy
path, error returns, cancellation, and task abort. Drop falls back to
spawning the cleanup if write-lock contention prevents try_write.

Public API and Tauri integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

start_peer() now returns PeerRuntimeHandle exposing:

  fn sender(&self) -> UnboundedSender<PeerCommand>
  fn shutdown(&self)
  async fn wait_stopped(&mut self)

The Tauri layer stores the handle in managed state and switches its
main loop from .run(ctx) to .build(ctx).run(|h, e| ...). On
RunEvent::Exit it calls handle.shutdown() and blocks up to 2s on
wait_stopped(), giving services time to cancel and Goodbye packets time
to flush over a healthy LAN while staying short enough not to delay
process exit noticeably on a dead network.

The command loop distinguishes graceful shutdown from unexpected
channel closure: if recv() returns None and shutdown.is_cancelled() is
set, the loop returns Ok(()) silently. Only an unexpected close (no
cancellation observed) still emits RuntimeFailed. This avoids a
spurious failure event on every normal app close.

User-visible behavior changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Closing the app no longer leaks services into process termination;
  Goodbye notifications are reliably attempted before exit.
- Downloads cancel cleanly (between chunks) instead of force-aborting
  mid-write.
- A new "peer-runtime-failed" Tauri event fires when a Required service
  cannot recover. No frontend handler exists yet — that is a follow-up.

Tradeoffs
~~~~~~~~~

- Workspace tokio-util now requires the "rt" feature for TaskTracker.
- The mDNS worker still runs in spawn_blocking and may stay parked
  briefly between 250ms polls — acceptable for a desktop app.
- The 2s shutdown timeout on app exit is a deliberate compromise.

Tests
~~~~~

New unit tests:
  - DownloadStateGuard clears tracking on completion, cancellation, and
    parent-task abort (context.rs).
  - Required failure cancels the runtime and emits RuntimeFailed
    (startup.rs).
  - Restart policy restarts until shutdown is requested (startup.rs).
  - PeerRuntimeHandle.shutdown() observable via wait_stopped()
    (startup.rs).
  - Peers-gone cancellation emits only PeersGone, no duplicate Failed
    (services/liveness.rs).

Test plan
~~~~~~~~~

  cargo test --workspace
  cargo clippy --workspace --all-targets

Manual smoke test on two peers on the same LAN:
  1. Start a download, verify chunks transfer.
  2. Close the receiving app mid-download — verify the sending peer
     logs a Goodbye, not a connection-reset error.
  3. Stop the sending peer mid-download — verify the receiver emits
     DownloadGameFilesAllPeersGone, not Failed.

Follow-ups
~~~~~~~~~~

- Frontend handler for "peer-runtime-failed".
- Consider exposing the runtime handle's stopped watch to the frontend
  for a reconnecting indicator on Required failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:53:51 +02:00
ddidderr 87d00e7df6 refactor(peer): make startup directory-driven
Peer startup used to bootstrap itself by spawning the runtime and immediately
sending a SetGameDir command back through its own control channel. The Tauri
integration then polled shared state until a directory appeared and waited two
seconds before asking peers for games. That made startup ordering implicit and
left a race-prone sleep in the UI bridge.

Install the initial game directory directly into the peer context instead. The
runtime now attempts the initial local-library scan before starting discovery,
then launches the server, discovery, liveness, and local monitor services from
that initialized context. Later directory changes still use SetGameDir, so the
existing UI command surface stays intact.

Use PathBuf and Path references across peer filesystem boundaries so directory
state is represented as a path rather than an optional string. The Tauri layer
now validates a selected game directory before storing it, loads the bundled
catalog on first use, and starts or updates the peer runtime from one helper.
Peer event fan-out is split into named handlers so the Tauri setup closure only
wires state and starts the event loop.

Shutdown goodbye notifications are still best-effort, but they are now awaited
with a short timeout instead of being spawned and forgotten. The tradeoff is a
small bounded wait during peer runtime shutdown in exchange for clearer task
ownership.

Test Plan:
- cargo test -p lanspread-peer
- cargo clippy
- cargo clippy --benches
- cargo clippy --tests
- cargo +nightly fmt
- git diff --check

Refs: none
2026-05-02 17:09:00 +02:00
ddidderr b4585b663a ChatGPT Codex 5.5 xhigh refactored even more 2026-05-02 15:31:37 +02:00