ddidderr 2c490b2693 fix(client): clear TAP before resolving relay
The previous startup ordering loaded the virtual MAC before touching TAP, but it
also resolved the relay endpoint before clearing stale TAP media state. That
left a post-crash TAP adapter able to influence DNS or route selection before
the client had pinned the relay path.

Split Windows client startup config into a local phase and a resolved runtime
phase. The local phase reads the certificate, room, TAP adapter selection, and
client identity without performing DNS. Windows startup now writes the TAP
NetworkAddress value and marks TAP media disconnected before resolving the relay
endpoint or opening the QUIC connection.

A regression test uses an intentionally unresolved relay hostname to prove that
building the startup config does not resolve DNS. The client still resolves the
relay before activation and still validates the driver-reported TAP MAC before
bridging.

Test Plan:
- cargo test -p lanparty-client-win
- cargo test --workspace
- cargo clippy --workspace --all-targets -- -D warnings
- cargo fmt --check
- git diff --check
- cargo check -p lanparty-client-win --target x86_64-pc-windows-gnu
  - blocked by missing x86_64-w64-mingw32-gcc for ring on this host

Refs: PLAN.md Windows routing / metric handling
2026-05-22 05:47:27 +02:00


# softlan-vpn
Monorepo for a Layer 2 over QUIC LAN party bridge.
## Workspace crates
- `lanparty-proto`: shared frame format, MAC validation, MTU helpers.
- `lanparty-ctrl`: control-plane messages (join/hello/role/version).
- `lanparty-net`: shared relay endpoint parsing and resolution.
- `lanparty-obs`: shared diagnostics/logging event models.
- `lanparty-client-core`: platform-agnostic client session state.
- `lanparty-client-route`: Windows relay-route inspection.
- `lanparty-client-tap`: TAP-Windows6 adapter discovery and frame I/O.
- `lanparty-client-win`: Windows TAP + route/metric handling binary.
- `lanparty-gateway`: Linux AF_PACKET gateway binary.
- `lanparty-relay`: public QUIC relay binary.
### `lanparty-proto`
Transport-agnostic tunnel contract shared by all binaries:
- overlay datagram header encoding and decoding
- rejection of nonzero reserved flags in v1 overlay datagrams until their
  semantics are defined
- negotiated QUIC datagram budget validation before send
- Ethernet frame header parsing
- MAC address parsing and identity validation
- QUIC datagram to TAP MTU budget helpers
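
The MAC parsing and identity validation listed above could look roughly like the following minimal sketch. The `Mac` type, the `MacError` variants, and the exact identity rule (unicast and nonzero) are assumptions made for illustration; they are not taken from the crate's real API.

```rust
/// Six-byte Ethernet MAC address (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mac(pub [u8; 6]);

#[derive(Debug, PartialEq, Eq)]
pub enum MacError {
    BadFormat,
    Multicast,
    AllZero,
}

impl Mac {
    /// Parses the colon-separated form `aa:bb:cc:dd:ee:ff`.
    pub fn parse(text: &str) -> Result<Mac, MacError> {
        let mut bytes = [0u8; 6];
        let mut parts = text.split(':');
        for byte in bytes.iter_mut() {
            let part = parts.next().ok_or(MacError::BadFormat)?;
            // Exactly two hex digits per octet; this also rejects signs.
            if part.len() != 2 || !part.bytes().all(|b| b.is_ascii_hexdigit()) {
                return Err(MacError::BadFormat);
            }
            *byte = u8::from_str_radix(part, 16).map_err(|_| MacError::BadFormat)?;
        }
        // A seventh group means the input was too long.
        if parts.next().is_some() {
            return Err(MacError::BadFormat);
        }
        Ok(Mac(bytes))
    }

    /// Assumed identity rule: a peer source MAC must be unicast and nonzero.
    pub fn validate_identity(self) -> Result<Mac, MacError> {
        if self.0 == [0u8; 6] {
            return Err(MacError::AllZero);
        }
        // Bit 0 of the first octet is the group (multicast/broadcast) bit.
        if self.0[0] & 0x01 != 0 {
            return Err(MacError::Multicast);
        }
        Ok(self)
    }

    /// Bit 1 of the first octet marks a locally administered address.
    pub fn is_locally_administered(self) -> bool {
        self.0[0] & 0x02 != 0
    }
}
```

For example, `Mac::parse("02:00:00:00:00:01")` succeeds and is locally administered, while the broadcast address parses but fails `validate_identity`.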
### `lanparty-ctrl`
Reliable control-plane schema shared by the QUIC stream handlers:
- endpoint hello messages with role, room, MAC, and datagram budget
- server welcome mode, reject, peer lifecycle, stats, and disconnect messages
- initial room gateway-presence status in server welcomes
- room-code, role/MAC, peer-id, and effective-MTU validation
- length-prefixed JSON control frames for reliable QUIC streams
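
A length-prefixed frame codec for a reliable stream can be sketched as follows. The README only says the frames are length-prefixed JSON, so the 4-byte big-endian prefix, the 64 KiB cap, and the function names are assumptions. The JSON body is treated as opaque bytes to keep the sketch free of dependencies.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on one control frame body.
const MAX_FRAME_LEN: usize = 64 * 1024;

/// Writes one frame: a 4-byte big-endian length followed by the JSON bytes.
pub fn write_frame<W: Write>(writer: &mut W, json: &[u8]) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "control frame too large",
        ));
    }
    let len = json.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(json)
}

/// Reads one frame, rejecting oversized lengths before allocating.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "control frame too large",
        ));
    }
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    Ok(body)
}
```

Checking the declared length before allocating keeps a misbehaving peer from forcing a large allocation with a single 4-byte prefix.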
### `lanparty-obs`
Shared diagnostics and structured logging vocabulary:
- gateway/relay frame logs with MACs, ethertype, length, peer, and action
- tunnel counters shared by control messages and runtime diagnostics
- client connectivity/TAP diagnostics and user-facing status messages
### `lanparty-net`
Shared network address handling for tunnel binaries:
- relay DNS name, IP literal, and socket-address parsing
- UDP/443 default for bare relay hosts
- relay address resolution before tunnel interface activation
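
The parsing rules above (socket address, IP literal, or DNS name, with UDP/443 as the default port) can be sketched with the standard library alone. `RelayEndpoint` and `parse_relay` are hypothetical names, and the error type is simplified to `String`.

```rust
use std::net::{IpAddr, SocketAddr};

const DEFAULT_RELAY_PORT: u16 = 443;

#[derive(Debug, PartialEq, Eq)]
pub enum RelayEndpoint {
    /// Already an IP and port; no DNS lookup is needed.
    Addr(SocketAddr),
    /// DNS name and port, to be resolved in a later step.
    Name { host: String, port: u16 },
}

/// Parses a relay argument without performing any DNS resolution.
pub fn parse_relay(input: &str) -> Result<RelayEndpoint, String> {
    // Full socket address, e.g. `192.0.2.10:4433` or `[2001:db8::1]:443`.
    if let Ok(addr) = input.parse::<SocketAddr>() {
        return Ok(RelayEndpoint::Addr(addr));
    }
    // Bare IP literal: apply the default port.
    if let Ok(ip) = input.parse::<IpAddr>() {
        return Ok(RelayEndpoint::Addr(SocketAddr::new(ip, DEFAULT_RELAY_PORT)));
    }
    // Otherwise treat the input as `host` or `host:port`.
    let (host, port) = match input.rsplit_once(':') {
        Some((host, port)) => {
            let port = port
                .parse::<u16>()
                .map_err(|_| format!("invalid port in {:?}", input))?;
            (host, port)
        }
        None => (input, DEFAULT_RELAY_PORT),
    };
    if host.is_empty() || host.contains(':') {
        return Err(format!("invalid relay host in {:?}", input));
    }
    Ok(RelayEndpoint::Name { host: host.to_string(), port })
}
```

Keeping parsing separate from resolution means a `Name` endpoint can be held unresolved until the caller is ready, for example by passing `(host.as_str(), port)` to `std::net::ToSocketAddrs::to_socket_addrs` at that point.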
### `lanparty-client-core`
Platform-neutral remote client relay session:
- relay QUIC connection with pinned relay certificate trust
- client hello with room, virtual MAC, and datagram budget
- welcome/reject handling with assigned peer id and effective TAP MTU
- QUIC DATAGRAM support and negotiated datagram budget diagnostics
- relay RTT diagnostics from the active QUIC connection
- reliable relay control-event reads for peer lifecycle messages
- Ethernet frame send/receive helpers over QUIC DATAGRAM with budget, source
MAC, and remote-to-LAN safety checks plus local drop outcomes
- client tunnel statistics for frame/datagram rx/tx and drops
- reliable client stats snapshot sends for relay diagnostics
- best-effort graceful disconnect messages before QUIC close
### `lanparty-client-route`
Windows route-table boundary:
- read-only best-route lookup for a relay destination IP
- selected source address, next hop, interface index/LUID, prefix, and metric
- interface index/LUID lookup from Windows network adapter GUIDs
- scoped IP interface MTU overrides with restore-on-drop behavior
- scoped IP interface metric overrides with restore-on-drop behavior
- scoped default-route suppression with restore-on-drop behavior
- unicast IP address snapshots for TAP diagnostics
- scoped host-route pinning for the relay IP on the pre-TAP interface
- reuse of an already-existing matching relay host route without deleting it on exit
- non-Windows builds return a clear unsupported-platform error
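
The restore-on-drop behavior mentioned for the MTU, metric, and default-route overrides is a common RAII pattern in Rust. The sketch below shows the idea with a `Cell<u32>` standing in for the operating-system interface metric; `RestoreOnDrop` and `override_metric` are hypothetical names, and the real crate presumably calls Windows APIs instead.

```rust
use std::cell::Cell;

/// Runs a restore action exactly once when the guard is dropped.
pub struct RestoreOnDrop<F: FnOnce()> {
    restore: Option<F>,
}

impl<F: FnOnce()> RestoreOnDrop<F> {
    pub fn new(restore: F) -> Self {
        RestoreOnDrop { restore: Some(restore) }
    }
}

impl<F: FnOnce()> Drop for RestoreOnDrop<F> {
    fn drop(&mut self) {
        // `take` moves the closure out so it can be called by value.
        if let Some(restore) = self.restore.take() {
            restore();
        }
    }
}

/// Applies a scoped override and returns a guard that restores the old value.
/// The `Cell` is a stand-in for the real interface setting.
pub fn override_metric<'a>(
    metric: &'a Cell<u32>,
    scoped: u32,
) -> RestoreOnDrop<impl FnOnce() + 'a> {
    let previous = metric.replace(scoped);
    RestoreOnDrop::new(move || metric.set(previous))
}
```

Because the restore runs in `Drop`, the previous value comes back on normal exit, on early return, and during panic unwinding, though not if the process is killed or aborts.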
### `lanparty-client-tap`
Windows TAP adapter boundary:
- TAP-Windows6 adapter discovery from the Windows network adapter registry
- TAP `NetworkAddress` registry configuration for the tunnel MAC identity
- `\\.\Global\{NetCfgInstanceId}.tap` device path construction
- blocking Ethernet frame reads/writes through the TAP device handle
- TAP driver IOCTL helpers for media status, adapter MAC, and MTU
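
Device path construction is small enough to sketch directly. This assumes the `NetCfgInstanceId` value is passed in with its surrounding braces, as Windows stores adapter GUIDs; both function names are hypothetical.

```rust
/// Builds the TAP device path for an adapter id such as `{GUID}`.
pub fn tap_device_path(net_cfg_instance_id: &str) -> String {
    // Raw string: backslashes are literal, `{}` is the format placeholder.
    format!(r"\\.\Global\{}.tap", net_cfg_instance_id)
}

/// Converts a path to NUL-terminated UTF-16, the form that wide-string
/// Win32 calls such as `CreateFileW` expect.
pub fn to_wide(path: &str) -> Vec<u16> {
    path.encode_utf16().chain(std::iter::once(0)).collect()
}
```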
### `lanparty-relay`
Public relay binary and relay-owned room state:
- QUIC endpoint binding and first-stream hello/welcome admission
- room admission for clients and gateways
- one gateway per room, duplicate client MAC rejection, and room limits
- stable effective room MTU chosen before Ethernet datagrams flow
- live Ethernet datagram forwarding with no ingress reflection
- per-peer egress budget checks against the negotiated datagram size
- reliable `PeerJoined`/`PeerLeft` notifications to existing room peers
- L2 safety filters for invalid-source, jumbo, switch-control, remote
  VLAN-tagged, remote IPv6-fragment, IPv4/IPv6 DHCP-server, and IPv6-RA frames,
  including frames behind ordinary IPv6 extension headers
- client broadcast/multicast, unknown-unicast, and total bandwidth limiting
- malformed peer datagram disconnect threshold
- peer stats control events retained for relay diagnostics
- graceful disconnect control events propagated as peer-leave reasons
- per-peer last-seen timestamps in relay room snapshots
- peer leave cleanup for room membership and MAC indexes
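
The admission rules above (one gateway per room, no duplicate client MACs, a room size limit) can be illustrated with a small in-memory model. All names here are hypothetical, and the order in which the checks run is an assumption.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Gateway,
    Client,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    GatewayAlreadyPresent,
    DuplicateMac,
    RoomFull,
}

pub struct Room {
    max_peers: usize,
    gateway: Option<[u8; 6]>,
    client_macs: HashSet<[u8; 6]>,
}

impl Room {
    pub fn new(max_peers: usize) -> Room {
        Room { max_peers, gateway: None, client_macs: HashSet::new() }
    }

    /// Admits a peer or explains why it was rejected.
    pub fn admit(&mut self, role: Role, mac: [u8; 6]) -> Result<(), Reject> {
        let peers = self.client_macs.len() + self.gateway.is_some() as usize;
        if peers >= self.max_peers {
            return Err(Reject::RoomFull);
        }
        match role {
            Role::Gateway => {
                if self.gateway.is_some() {
                    return Err(Reject::GatewayAlreadyPresent);
                }
                self.gateway = Some(mac);
                Ok(())
            }
            Role::Client => {
                // `insert` returns false when the MAC is already present.
                if self.gateway == Some(mac) || !self.client_macs.insert(mac) {
                    return Err(Reject::DuplicateMac);
                }
                Ok(())
            }
        }
    }

    /// Removes a peer so its slot and MAC can be reused.
    pub fn leave(&mut self, mac: [u8; 6]) {
        if self.gateway == Some(mac) {
            self.gateway = None;
        } else {
            self.client_macs.remove(&mac);
        }
    }
}
```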
## Build
```bash
cargo check --workspace
```
For the manual MVP end-to-end proof, see [TESTING.md](TESTING.md).
## Relay
```bash
cargo run -p lanparty-relay -- --listen 443/udp --dev-cert-der-out relay-cert.der
```
`--listen` accepts either a socket address or a UDP port shorthand such as
`443/udp`. The relay binds a QUIC endpoint, accepts a control-stream `hello`,
replies with `welcome` or `reject`, and forwards live Ethernet QUIC datagrams
between accepted peers in the same room. It currently uses a generated
self-signed development certificate; `--dev-cert-der-out` writes that
certificate so the gateway and client can pin it in development. Production
certificate handling remains future work. Ethernet forwarding decisions are
logged with room, peer, MAC, ethertype, action, drop reason, and target count.
Safety-policy rejects use the `filtered` action so they are distinguishable
from malformed/unknown-destination drops and rate limits.
Malformed peer datagrams log their per-peer count before the relay disconnects
peers that cross the malformed-datagram threshold.
Relay egress skips caused by a target peer's smaller datagram budget are logged
with the ingress peer, target peer, encoded length, and target budget.
Unknown unicast from a client is forwarded only to the gateway port; unknown
unicast from the gateway is dropped instead of flooded to every remote client.
When a peer joins or leaves, the relay sends a reliable lifecycle control event
to peers that are still present in the room. Newly joined peers also receive
`PeerJoined` events for peers that were already present.
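
The forwarding rules described above can be summarized in a small decision function. This is a simplified model rather than the relay's actual code: `RoomPorts`, `Forward`, and `decide` are hypothetical names, and rate limits and safety filters are left out.

```rust
use std::collections::HashMap;

pub type PeerId = u32;
pub type Mac = [u8; 6];

#[derive(Debug, PartialEq, Eq)]
pub enum Forward {
    To(Vec<PeerId>),
    Dropped,
}

pub struct RoomPorts {
    pub gateway: Option<PeerId>,
    /// Announced client virtual MAC mapped to its peer id.
    pub clients: HashMap<Mac, PeerId>,
}

pub fn decide(room: &RoomPorts, ingress: PeerId, dst: Mac) -> Forward {
    // Group bit set: broadcast or multicast goes to every peer except ingress.
    if dst[0] & 0x01 != 0 {
        let mut targets: Vec<PeerId> = room
            .clients
            .values()
            .copied()
            .chain(room.gateway)
            .filter(|peer| *peer != ingress)
            .collect();
        targets.sort();
        return Forward::To(targets);
    }
    // Known client MAC: deliver to that peer only, never back to ingress.
    if let Some(&peer) = room.clients.get(&dst) {
        return if peer == ingress {
            Forward::Dropped
        } else {
            Forward::To(vec![peer])
        };
    }
    // Unknown unicast: clients reach the gateway; the gateway never floods.
    match room.gateway {
        Some(gateway) if gateway != ingress => Forward::To(vec![gateway]),
        _ => Forward::Dropped,
    }
}
```

Sending unknown unicast from clients only to the gateway treats the physical LAN as the default port, while dropping unknown unicast from the gateway keeps LAN traffic for unrelated hosts off every remote client's uplink.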
### MVP Trust Model
The MVP relay terminates QUIC for every client and gateway connection. QUIC
protects traffic on the public network path, but the relay process sees
plaintext Ethernet frames while forwarding them between peers in a room. That is
acceptable for the first LAN-party proof, where the relay is an operator-trusted
component, but it is not end-to-end encrypted.
Future room-key payload encryption should keep the relay-visible routing header
small and leave only Ethernet payload bytes encrypted end-to-end between clients
and the LAN gateway.
## Gateway
```bash
cargo run -p lanparty-gateway -- \
--relay lanparty-relay.local \
--server-name lanparty-relay.local \
--relay-ca-cert relay-cert.der \
--room ROOM1 \
--iface eth0
```
The gateway first opens the wired LAN interface as an AF_PACKET socket with
promiscuous packet membership, then connects to the relay as `role = gateway`
and completes the control-stream hello/welcome handshake. That startup order
keeps an invalid, wireless, or unplugged interface from briefly advertising a
gateway that cannot bridge. Once both sides are ready, it bridges Ethernet
frames between the relay and wired LAN until shutdown. It captures whole LAN
frames up to the overlay payload-length ceiling before deciding whether they
fit the tunnel. It
never fragments Ethernet frames; LAN frames with invalid source MACs, L2
control-plane traffic, jumbo frames, or encoded datagrams exceeding the
negotiated QUIC budget are counted, dropped, and logged locally instead of
stopping the bridge or consuming relay bandwidth. Remote frames received from
the relay are safety-checked again before LAN injection and must use the
announced virtual MAC for their source peer, so invalid-source, forged-source,
L2 control-plane, remote VLAN, DHCP-server, IPv6 Router Advertisement, IPv6
fragment, and jumbo frames cannot cross the gateway's final physical-LAN
boundary even if they reached the gateway over QUIC.
`--relay` accepts a DNS name or socket address; bare hosts default to UDP/443.
The gateway rejects Linux interfaces that sysfs identifies as Wi-Fi, and rejects
wired interfaces whose sysfs carrier state reports no link; managed wireless
NICs are not supported for the physical LAN bridge.
The gateway tracks remote-client MACs from relay lifecycle events and
periodically emits small CAM refresh frames so the physical switch keeps those
MACs associated with the gateway port. Lifecycle events seed and retire a
remote client's MAC even before that client sends traffic. Gateway frame logs
include direction, peer id when present, MACs, ethertype/length, frame length,
action, and drop reason. The gateway also tracks frame/datagram counters and
periodically sends stats snapshots to the relay. Malformed or runt LAN frames
are counted and logged as dropped instead of disappearing before accounting. On
shutdown, the gateway sends a best-effort disconnect control message before
closing QUIC so the relay can report the intended reason.
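
One way to implement the sysfs interface checks described above is sketched below. The README does not say which attributes the gateway reads, so the use of the `wireless` and `phy80211` entries and the `carrier` file under `/sys/class/net/<iface>` is an assumption, as are the names. The sysfs root is a parameter so the logic can be exercised against a temporary directory.

```rust
use std::fs;
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum IfaceProblem {
    Missing,
    Wireless,
    NoCarrier,
}

/// Checks an interface under `sysfs_net` (normally `/sys/class/net`).
pub fn check_wired_iface(sysfs_net: &Path, iface: &str) -> Result<(), IfaceProblem> {
    let dir = sysfs_net.join(iface);
    if !dir.is_dir() {
        return Err(IfaceProblem::Missing);
    }
    // Wi-Fi interfaces typically expose a `wireless` or `phy80211` entry.
    if dir.join("wireless").exists() || dir.join("phy80211").exists() {
        return Err(IfaceProblem::Wireless);
    }
    // `carrier` contains `1` when the link is up. Reading it can fail when
    // the interface is administratively down, which is also treated as no link.
    match fs::read_to_string(dir.join("carrier")) {
        Ok(text) if text.trim() == "1" => Ok(()),
        _ => Err(IfaceProblem::NoCarrier),
    }
}
```

A caller would pass `Path::new("/sys/class/net")` and the `--iface` value before opening the AF_PACKET socket.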
## Windows Client
```bash
cargo run -p lanparty-client-win -- \
--relay lanparty-relay.local \
--server-name lanparty-relay.local \
--relay-ca-cert relay-cert.der \
--room ROOM1
```
The Windows client binary currently connects to the relay as `role = client`
with a generated locally administered virtual MAC persisted in
`lanparty-client-identity.json`. Before resolving or connecting to the relay,
it writes the generated tunnel MAC to the selected TAP driver's
`NetworkAddress` registry setting and marks TAP media disconnected. That clears
stale connected state from a previous crashed run without letting the TAP
adapter influence relay DNS or route selection. The client then resolves the
relay endpoint, completes the control-stream hello/welcome handshake, pins a
host route for the resolved relay IP on the current pre-TAP interface, verifies
that the relay route still uses that pinned host route after TAP activation,
and bridges Ethernet frames between the relay and the TAP-Windows6 adapter
until shutdown. `--relay` accepts a DNS name or socket address; bare hosts
default to UDP/443.
TAP frames whose source MAC does not match that generated tunnel MAC are
dropped locally before they can consume relay bandwidth; the relay still
enforces the same source-MAC rule.
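
The local source-MAC check can be sketched as follows, assuming untagged Ethernet II framing, where the source MAC occupies bytes 6 to 11 of the frame. `TapFrameCheck` and `check_tap_frame` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TapFrameCheck {
    Forward,
    DropMalformed,
    DropForeignSource,
}

/// Decides whether a frame read from TAP may be sent to the relay.
pub fn check_tap_frame(frame: &[u8], tunnel_mac: [u8; 6]) -> TapFrameCheck {
    // Destination (6) + source (6) + ethertype (2).
    const ETH_HEADER_LEN: usize = 14;
    if frame.len() < ETH_HEADER_LEN {
        return TapFrameCheck::DropMalformed;
    }
    // Only the tunnel identity may appear as the source address.
    if frame[6..12] != tunnel_mac[..] {
        return TapFrameCheck::DropForeignSource;
    }
    TapFrameCheck::Forward
}
```
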
If the exact relay host route already exists, the client uses it and leaves it
alone on exit. The startup status reports whether the relay already has a LAN
gateway for the room.
`--virtual-mac` can still override the stored identity for manual testing. On
Windows the client sets the TAP IP interface MTU to the relay-selected MTU,
marks the TAP media connected for the scoped client run, and reports the driver
MAC/MTU and the TAP interface index/LUID before forwarding frames. The client
applies a scoped TAP interface metric and disables TAP default routes while it
runs, periodically rechecks that the relay route remains pinned, then restores
the previous route policy and TAP media status on exit. Startup prints a warning
when TAP default routes were enabled before the scoped protection was applied.
Startup still fails before bridging if the driver-reported MAC does not match
the tunnel identity, because an already-initialized Windows TAP adapter may
need to be disabled/enabled or reinstalled before it reloads the configured
`NetworkAddress`.
If exactly one TAP-Windows6 adapter is installed, the client opens it
automatically. If multiple TAP-Windows6 adapters are installed, startup fails
until `--tap-instance-id` selects the intended adapter by NetCfgInstanceId /
InterfaceGuid. `--list-tap-adapters` prints the TAP adapter ids and exits
without connecting.
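
The adapter selection rule can be expressed as a small pure function. This is a sketch with hypothetical names; the case-insensitive id comparison is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SelectError {
    NoAdapters,
    /// More than one adapter and no explicit id was requested.
    Ambiguous(Vec<String>),
    NotFound(String),
}

/// Picks the TAP adapter id to open from the installed adapter ids.
pub fn select_adapter<'a>(
    installed: &'a [String],
    requested: Option<&str>,
) -> Result<&'a str, SelectError> {
    match requested {
        // An explicit id (e.g. from `--tap-instance-id`) must match one adapter.
        Some(id) => installed
            .iter()
            .map(String::as_str)
            .find(|candidate| candidate.eq_ignore_ascii_case(id))
            .ok_or_else(|| SelectError::NotFound(id.to_string())),
        None => match installed {
            [] => Err(SelectError::NoAdapters),
            // Exactly one adapter: open it automatically.
            [only] => Ok(only.as_str()),
            many => Err(SelectError::Ambiguous(many.to_vec())),
        },
    }
}
```

Returning the candidate ids in the `Ambiguous` error lets the caller print them, much as `--list-tap-adapters` does.
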
The client prints and reports diagnostics snapshots with relay reachability,
LAN-gateway presence, route-pinning, QUIC datagram budget, relay RTT, TAP
status/IP, broadcast frame flow, frame/datagram counters, and drops. The
periodic diagnostics refresh the TAP unicast IP so DHCP results that arrive
after bridging starts become visible in later status lines. Each snapshot also
emits short user-facing lines such as relay/gateway connection status,
relay-route and TAP readiness warnings, DHCP address presence, relay RTT, and
broadcast-flow confirmation when those signals are observed. Malformed frames
read from TAP, invalid or unauthorized source-MAC frames, L2 control-plane
traffic, remote VLAN tags, DHCP server replies, IPv6 Router Advertisements, IPv6
fragments, jumbo frames, and TAP frames whose encoded datagrams exceed the
negotiated QUIC budget are counted and dropped before relay send without
stopping the bridge. Relayed LAN frames are also safety-checked before TAP
writes, so switch-control traffic, invalid-source frames, and jumbo frames stay
out of the Windows adapter even if they reached the client over QUIC.
Misdirected unicast frames not addressed to the client's virtual MAC are also
counted and skipped; TAP device read/write errors still stop the bridge.
Relay lifecycle events are logged as they arrive, including gateway joins and
peer leaves. The client remembers peer identities from join and catch-up events
so later leave logs can identify a disconnected LAN gateway or client MAC when
that peer was known.