Commit Graph

4 Commits

Author SHA1 Message Date
ddidderr 754afd5621 refactor(peer): drop --no-mdns toggle, mDNS is always on
The peer runtime previously accepted an `enable_mdns: bool` flag, plumbed
through `PeerStartOptions`, `spawn_peer_runtime`, `run_peer`, `Ctx`, and
`PeerCtx`. The lanspread-peer-cli harness exposed the toggle as
`--no-mdns` so test scenarios could fall back to explicit `connect`
commands when mDNS could not be relied on, in particular when multiple
peers ran inside `--network host` containers and could not advertise
independently.

That host-networking workaround no longer exists: the previous commit
moves harness containers onto a macvlan network, where each peer is a
real LAN device and mDNS just works between them. There is no scenario
left in the codebase where disabling mDNS is desirable. Per the project's
protocol policy in CLAUDE.md ("there is only one wire version, no
compatibility shims, no fallback paths"), an opt-out path with no current
caller is exactly the kind of dead code we should not carry.

Remove the flag and every plumbing point that exists only to support it:

- `PeerStartOptions::enable_mdns` and the custom `Default` impl that set
  it to `true`; the struct now derives `Default` and just carries
  `state_dir`.
- The `enable_mdns` parameter on `start_peer_with_options`,
  `spawn_peer_runtime`, `run_peer`, and `Ctx::new`.
- The `enable_mdns` fields on `Ctx` and `PeerCtx` and the propagation
  through `to_peer_ctx`.
- The `if ctx.enable_mdns` guard in `spawn_startup_services`;
  `spawn_peer_discovery_service` is now always spawned.
- The `if ctx.enable_mdns { ... } else { ... }` branch in
  `run_server_component`: the mDNS advertiser and event monitor are now
  unconditionally started, and the no-mDNS-fallback log line that read
  "mDNS disabled; direct peer address is ..." is gone. The
  `direct_connect_addr` helper is kept because the mDNS-on branch still
  uses it as a fallback when `local_peer_addr` has not yet been
  populated.
- The internal test helpers in `handlers.rs`, `services/local_monitor.rs`,
  and `services/stream.rs` that passed `true` as the trailing
  `enable_mdns` arg to `Ctx::new`.
- In `lanspread-peer-cli`: the `--no-mdns` arg parsing, the
  `Args::enable_mdns` field, the `mdns` key on the `cli-started` event
  payload, and the `--no-mdns` mention in the help text and the crate
  README.

The `Args::name` field is wired to the harness identity but is otherwise
untouched. The macvlan network created by `just peer-cli-net` is the
runtime prerequisite for this change to be observable across containers;
on a single workstation, two harness binaries on `127.0.0.1` discover
each other through mDNS on the loopback interface as before.

Test Plan:
- `just fmt`
- `just clippy`
- `just test`
- `just peer-cli-build`
- Two peers on macvlan: `just peer-cli-run alpha` and
  `just peer-cli-run beta`; check that each emits `peer-discovered` and
  `peer-connected` events without an explicit `connect` JSONL command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:51:54 +02:00
ddidderr e711cf3454 fix(peer): settle current-protocol local state cleanup
The follow-up backlog had drifted into three settled peer/runtime issues: the
legacy game-list fallback contradicted the one-wire-version policy, the Tauri
shell still re-derived local install state from disk after peer snapshots, and
`Availability::Downloading` existed even though active operations are already
reported through a separate operation table.

Remove the legacy `AnnounceGames` request and fallback service. Discovery now
ignores peers that do not advertise the current protocol and a peer id, and
library changes are sent through the current delta path only. This keeps the
runtime aligned with the documented current-build-only interoperability model.

Make peer `LocalGamesUpdated` snapshots authoritative for local fields in the
Tauri database. The GUI-side catalog still owns static metadata such as names,
sizes, and descriptions, but downloaded, installed, local version, and
availability now come from the peer runtime instead of a second whole-library
filesystem scan. Snapshot reconciliation also pins the missing-begin and
missing-finish lifecycle cases in tests.

Collapse availability back to the settled `Ready` and `LocalOnly` states.
Aggregation now counts only `Ready` peers as download sources, and the frontend
no longer carries a dead `Downloading` enum value.

The core peer also exposes the small non-GUI hooks needed by scripted callers:
startup options for state and mDNS, a local-ready event, direct connection, peer
snapshots, and an explicit post-download install policy. Those hooks reuse the
same current protocol path and do not add compatibility shims.

Test Plan:
- `git diff --check`
- `just fmt`
- `just clippy`
- `just test`

Refs: BACKLOG.md, FINDINGS.md, IMPL_DECISIONS.md
2026-05-16 18:32:24 +02:00
ddidderr 2bbd2ac869 refactor(peer): adopt structured concurrency with supervised shutdown
Replace the detached tokio::spawn pattern in the peer runtime with a
supervised model built on tokio_util's CancellationToken and TaskTracker.
Long-lived services and child tasks now have an explicit parent, a
cancellation path, and a join point. Tauri can request a clean shutdown
on app exit instead of leaking work into process termination.

Background
~~~~~~~~~~

start_peer() previously returned only a command sender. The four startup
services (QUIC server, mDNS discovery, peer liveness, local library
monitor) and their child tasks (ping workers, handshake jobs, download
workers, announcement fan-outs, connection/stream handlers) were spawned
with raw tokio::spawn and detached. Closing the command channel sent
Goodbye notifications but did not stop those services. The mDNS blocking
worker had no cancellation path at all. Active downloads were stored as
JoinHandle<()> and force-aborted, which could interrupt file writes
mid-chunk.

Supervisor
~~~~~~~~~~

The runtime now owns a CancellationToken and a TaskTracker, threaded
through Ctx and PeerCtx. Each long-lived service is spawned through a
small supervisor (spawn_supervised_service) that wraps the service in
catch_unwind and enforces an explicit SupervisionPolicy:

  QuicServer:    Required     (fatal; cancels the runtime if it dies)
  Discovery:     Restart(5s)  (matches the prior self-restart loop)
  Liveness:      Restart(5s)
  LocalMonitor:  BestEffort   (logs and exits, no restart)

A Required failure emits a new RuntimeFailed { component, error } event
to the UI and cancels the runtime; the command loop and goodbye
notifications still run to completion. The Tauri layer forwards the
event as "peer-runtime-failed" so a future UI can surface it.

mDNS cancellation
~~~~~~~~~~~~~~~~~

MdnsBrowser previously blocked on receiver.recv() forever. It now
exposes next_service_timeout(Duration) returning an MdnsServicePoll
enum (Service/Timeout/Closed) via recv_timeout(). The discovery worker
polls at 250ms and checks the shutdown flag between ticks, so
cancellation reaches the blocking thread within one poll interval
instead of waiting for the next mDNS event.

Downloads
~~~~~~~~~

active_downloads is now HashMap<String, CancellationToken>. Each
download gets a child token of the runtime shutdown, checked at chunk
and peer-attempt boundaries (never inside file writes). When all peers
with a game disappear, liveness cancels the token and emits
DownloadGameFilesAllPeersGone; the download exits Ok(()) without
emitting a duplicate Failed event.

DownloadStateGuard (context.rs) is held inside the download task and
clears downloading_games + active_downloads on Drop, covering the happy
path, error returns, cancellation, and task abort. Drop falls back to
spawning the cleanup if write-lock contention prevents try_write.

Public API and Tauri integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

start_peer() now returns PeerRuntimeHandle exposing:

  fn sender(&self) -> UnboundedSender<PeerCommand>
  fn shutdown(&self)
  async fn wait_stopped(&mut self)

The Tauri layer stores the handle in managed state and switches its
main loop from .run(ctx) to .build(ctx).run(|h, e| ...). On
RunEvent::Exit it calls handle.shutdown() and blocks up to 2s on
wait_stopped(), giving services time to cancel and Goodbye packets time
to flush over a healthy LAN while staying short enough not to delay
process exit noticeably on a dead network.

The command loop distinguishes graceful shutdown from unexpected
channel closure: if recv() returns None and shutdown.is_cancelled() is
set, the loop returns Ok(()) silently. Only an unexpected close (no
cancellation observed) still emits RuntimeFailed. This avoids a
spurious failure event on every normal app close.

User-visible behavior changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Closing the app no longer leaks services into process termination;
  Goodbye notifications are reliably attempted before exit.
- Downloads cancel cleanly (between chunks) instead of force-aborting
  mid-write.
- A new "peer-runtime-failed" Tauri event fires when a Required service
  cannot recover. No frontend handler exists yet — that is a follow-up.

Tradeoffs
~~~~~~~~~

- Workspace tokio-util now requires the "rt" feature for TaskTracker.
- The mDNS worker still runs in spawn_blocking and may stay parked
  briefly between 250ms polls — acceptable for a desktop app.
- The 2s shutdown timeout on app exit is a deliberate compromise.

Tests
~~~~~

New unit tests:
  - DownloadStateGuard clears tracking on completion, cancellation, and
    parent-task abort (context.rs).
  - Required failure cancels the runtime and emits RuntimeFailed
    (startup.rs).
  - Restart policy restarts until shutdown is requested (startup.rs).
  - PeerRuntimeHandle.shutdown() observable via wait_stopped()
    (startup.rs).
  - Peers-gone cancellation emits only PeersGone, no duplicate Failed
    (services/liveness.rs).

Test plan
~~~~~~~~~

  cargo test --workspace
  cargo clippy --workspace --all-targets

Manual smoke test on two peers on the same LAN:
  1. Start a download, verify chunks transfer.
  2. Close the receiving app mid-download — verify the sending peer
     logs a Goodbye, not a connection-reset error.
  3. Stop the sending peer mid-download — verify the receiver emits
     DownloadGameFilesAllPeersGone, not Failed.

Follow-ups
~~~~~~~~~~

- Frontend handler for "peer-runtime-failed".
- Consider exposing the runtime handle's stopped watch to the frontend
  for a reconnecting indicator on Required failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 07:53:51 +02:00
ddidderr b4585b663a ChatGPT Codex 5.5 xhigh refactored even more 2026-05-02 15:31:37 +02:00