Files
lanspread/crates/lanspread-peer/src/services/discovery.rs
T
ddidderr ce51d92df0 refactor(peer): tighten listener-addr handshake invariant
Follow-up hardening for 348a02c, where `listen_addr` was added to Hello and
HelloAck as `Option<SocketAddr>`. Code review surfaced three concrete problems
that the previous commit left open:

1. Cold-start asymmetry. Discovery and the QUIC/mDNS advertiser are spawned
   concurrently. If discovery saw a cached peer advertisement before our own
   advertiser had written `ctx.local_peer_addr`, our outbound Hello carried
   `listen_addr: None`. The receiver's `peer_record_addr` then returned `None`
   and silently dropped the Hello while we still recorded their HelloAck, so
   peer A learned about peer B but B never learned about A until a later
   handshake happened to win the race.

2. Duplicate game-list pipeline. The previous commit added
   `refresh_peer_games`, which post-handshake issued a `ListGames` to fetch
   `peer.games`. The library-sync path (`LibrarySnapshot`) already populates
   the same field. Both could race on first contact and overwrite each other.
   Worse, `refresh_peer_games` was misnamed: a `peer_game_count > 0` guard
   turned it into a fetch-once-then-no-op helper, while
   `handle_library_summary` independently re-triggered a full handshake when
   `previous_count == 0` was observed, producing a redundant ping-pong on
   every first contact.

3. Argument explosion. `perform_handshake_with_peer`, `spawn_library_resync`,
   and `after_peer_library_recorded` had grown to 6-8 individual parameters
   and acquired `#[allow(clippy::too_many_arguments)]` opt-outs. Every caller
   was destructuring the same fields out of `Ctx`/`PeerCtx`.

Changes (all in one commit because they jointly enforce the same invariant:
"a peer is only ever recorded by its listener address, and the local
listener address must exist before we participate in the protocol"):

- `Hello.listen_addr` and `HelloAck.listen_addr` are now `SocketAddr`, not
  `Option<SocketAddr>`. Wire-incompatible, but PROTOCOL_VERSION already moved
  to 3 in 348a02c so no additional version bump is needed.
- `required_listen_addr` reads `ctx.local_peer_addr` and returns an
  `eyre::Result`; `build_hello_from_state` and `build_hello_ack` both call
  it, so an outbound or inbound Hello can no longer be constructed before
  the local QUIC listener is bound. The inbound path maps this into a
  `Response::InternalPeerError` so the remote peer fails cleanly instead of
  seeing a malformed HelloAck.
- `run_peer_discovery` blocks on `wait_for_local_peer_addr` (25 ms poll,
  shutdown-aware) before subscribing to the mDNS browser. This closes the
  cold-start race for outbound handshakes at the source.
- `refresh_peer_games`, `request_game_list_from_peer`, and the
  `previous_count == 0` re-handshake trigger are removed. The post-handshake
  flow now relies solely on `LibrarySummary`/`LibrarySnapshot`/`LibraryDelta`
  for peer-library state; `ListGames` survives only for the
  `request_game_details_*` paths that fetch per-game file descriptions on
  demand.
- New `HandshakeCtx` (with `from_ctx` and `from_peer_ctx` constructors)
  replaces the long argument lists. All `too_many_arguments` allow-attrs in
  `handshake.rs` are gone, and call sites in `handlers.rs`, `discovery.rs`,
  and `stream.rs` collapse to a single clone.
- `handle_library_delta` no longer acquires a read lock on the apply path:
  the `peer_addr` lookup moved into the `else` resync branch where it is
  actually needed.
- `accept_inbound_hello`'s `remote_addr` parameter is renamed to
  `transport_addr`. It is now used only for warn-log formatting, and the
  new name signals that this is the ephemeral QUIC source port, never the
  authoritative listener address that gets recorded.

User-visible effect: on cold start, peers can no longer end up with an
asymmetric view of each other ("A sees B but B never sees A"). First-contact
library sync now does one handshake plus one snapshot/delta exchange instead
of the previous handshake + ListGames + redundant follow-up handshake. The
direct-connect CLI path (`handle_connect_peer_command`) now fails fast with
"local peer listener address is not ready" if invoked before the QUIC server
has bound; this is intentional - the previous behaviour would have sent a
Hello that the receiver had to silently discard.

Test Plan:
- just fmt
- just clippy
- just test (80 peer + 3 cli + 5 tauri tests pass)
- just build
- Manual: bring up `just peer-cli-alpha`/`bravo`/`charlie`, confirm symmetric
  peer discovery and that games show up on every side after one library
  digest cycle, with no duplicated ListGames traffic in trace logs.

Refs: Review feedback on commit 348a02c (listener-address handshake fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:21:19 +02:00

201 lines
6.2 KiB
Rust

//! mDNS peer discovery and discovery-time protocol negotiation.
use std::time::Duration;
use lanspread_mdns::{LANSPREAD_SERVICE_TYPE, MdnsBrowser, MdnsService, MdnsServicePoll};
use lanspread_proto::PROTOCOL_VERSION;
use tokio::sync::mpsc::UnboundedSender;
use crate::{
PeerEvent,
context::Ctx,
events,
peer_db::PeerId,
services::handshake::{HandshakeCtx, perform_handshake_with_peer},
};
struct MdnsPeerInfo {
addr: std::net::SocketAddr,
peer_id: Option<PeerId>,
proto_ver: Option<u32>,
library_rev: u64,
library_digest: u64,
}
/// Runs the peer discovery service using mDNS.
pub async fn run_peer_discovery(
tx_notify_ui: UnboundedSender<PeerEvent>,
ctx: Ctx,
) -> eyre::Result<()> {
log::info!("Starting peer discovery task");
if !wait_for_local_peer_addr(&ctx).await {
return Ok(());
}
let service_type = LANSPREAD_SERVICE_TYPE.to_string();
let (service_tx, mut service_rx) = tokio::sync::mpsc::unbounded_channel();
let worker_shutdown = ctx.shutdown.clone();
let service_type_clone = service_type.clone();
let worker_handle = ctx
.task_tracker
.spawn_blocking(move || -> eyre::Result<()> {
let browser = MdnsBrowser::new(&service_type_clone)?;
while !worker_shutdown.is_cancelled() {
match browser.next_service_timeout(None, Duration::from_millis(250))? {
MdnsServicePoll::Service(service) => {
if service_tx.send(service).is_err() {
log::debug!("Peer discovery consumer dropped; stopping worker");
break;
}
}
MdnsServicePoll::Timeout => {}
MdnsServicePoll::Closed => {
log::warn!("mDNS browser closed; stopping peer discovery worker");
break;
}
}
}
Ok(())
});
loop {
tokio::select! {
() = ctx.shutdown.cancelled() => break,
service = service_rx.recv() => {
let Some(service) = service else {
break;
};
let info = parse_mdns_peer(&service);
if is_self_advertisement(&info, &ctx).await {
log::trace!("Ignoring self advertisement at {}", info.addr);
continue;
}
handle_discovered_peer(info, &ctx, &tx_notify_ui).await;
}
}
}
match worker_handle.await {
Ok(Ok(())) if ctx.shutdown.is_cancelled() => Ok(()),
Ok(Ok(())) => {
eyre::bail!("mDNS discovery worker exited unexpectedly");
}
Ok(Err(err)) if ctx.shutdown.is_cancelled() => {
log::debug!("Peer discovery worker stopped during shutdown: {err}");
Ok(())
}
Ok(Err(err)) => Err(err.wrap_err("peer discovery worker failed")),
Err(err) if ctx.shutdown.is_cancelled() => {
log::debug!("Peer discovery worker join ended during shutdown: {err}");
Ok(())
}
Err(err) => Err(eyre::eyre!("peer discovery worker join error: {err}")),
}
}
async fn wait_for_local_peer_addr(ctx: &Ctx) -> bool {
loop {
if ctx.local_peer_addr.read().await.is_some() {
return true;
}
tokio::select! {
() = ctx.shutdown.cancelled() => return false,
() = tokio::time::sleep(Duration::from_millis(25)) => {}
}
}
}
fn parse_mdns_peer(service: &MdnsService) -> MdnsPeerInfo {
MdnsPeerInfo {
addr: service.addr,
peer_id: service.properties.get("peer_id").cloned(),
proto_ver: service
.properties
.get("proto_ver")
.and_then(|value| value.parse::<u32>().ok()),
library_rev: service
.properties
.get("library_rev")
.and_then(|value| value.parse::<u64>().ok())
.unwrap_or(0),
library_digest: service
.properties
.get("library_digest")
.and_then(|value| value.parse::<u64>().ok())
.unwrap_or(0),
}
}
async fn is_self_advertisement(info: &MdnsPeerInfo, ctx: &Ctx) -> bool {
let guard = ctx.local_peer_addr.read().await;
guard.as_ref().is_some_and(|addr| *addr == info.addr)
|| info
.peer_id
.as_ref()
.is_some_and(|peer_id| peer_id == ctx.peer_id.as_ref())
}
async fn handle_discovered_peer(
info: MdnsPeerInfo,
ctx: &Ctx,
tx_notify_ui: &UnboundedSender<PeerEvent>,
) {
if info.proto_ver != Some(PROTOCOL_VERSION) {
log::debug!(
"Ignoring peer at {} with protocol {:?}; expected {PROTOCOL_VERSION}",
info.addr,
info.proto_ver
);
return;
}
let Some(peer_id) = info.peer_id.clone() else {
log::debug!(
"Ignoring current-protocol peer at {} without a peer_id TXT record",
info.addr
);
return;
};
let upsert = {
let mut db = ctx.peer_game_db.write().await;
let upsert = db.upsert_peer(peer_id.clone(), info.addr);
let features = db.peer_features(&peer_id);
if info.library_rev > 0 || info.library_digest > 0 {
db.update_peer_library(&peer_id, info.library_rev, info.library_digest, features);
}
upsert
};
if upsert.is_new {
log::info!("Discovered peer at: {}", info.addr);
events::emit_peer_discovered(&ctx.peer_game_db, tx_notify_ui, info.addr).await;
}
if upsert.is_new || upsert.addr_changed {
spawn_protocol_negotiation(&info, ctx, tx_notify_ui, peer_id);
}
}
fn spawn_protocol_negotiation(
info: &MdnsPeerInfo,
ctx: &Ctx,
tx_notify_ui: &UnboundedSender<PeerEvent>,
peer_id: PeerId,
) {
let peer_addr = info.addr;
let handshake_ctx = HandshakeCtx::from_ctx(ctx, tx_notify_ui);
ctx.task_tracker.spawn(async move {
if let Err(err) = perform_handshake_with_peer(handshake_ctx, peer_addr, Some(peer_id)).await
{
log::warn!("Failed to negotiate protocol with peer {peer_addr}: {err}");
}
});
}