softlan-vpn/PLAN.md
2026-05-21 17:00:58 +02:00

What I want to do:

Simple one-click Layer 2 tunnel software (Windows 11 client) that bridges people who cannot attend a LAN party in person onto the party's LAN, plus simple server endpoint software (Linux) that runs physically at the LAN party and bridges the tunneled traffic with the real LAN.

I have already discussed how to do this with several AIs. Here is the current plan:

LAN Party Tunnel Plan

Build a TAP-based L2-over-QUIC tunnel.

The remote Windows client gets a real virtual Ethernet adapter. Ethernet frames from that adapter are sent over QUIC to a public relay. The relay forwards them to a Linux gateway at the LAN party. The Linux gateway injects those frames onto the physical LAN and captures replies.

Windows game
  ⇄ Windows TAP adapter
  ⇄ lanparty-client.exe
  ⇄ QUIC datagrams
  ⇄ public relay
  ⇄ QUIC datagrams
  ⇄ Linux LAN gateway
  ⇄ physical Ethernet LAN

No WireGuard. No Npcap. No Windows bridge. No packet rewriting from the user's real NIC. No tunnel fragmentation for MVP.

Goal

The remote player should do this:

1. Install client.
2. Start it.
3. Enter domain / room code.
4. Click Connect.
5. Game sees a normal LAN adapter.

The physical LAN party host does this:

1. Plug Linux gateway PC into the LAN with wired Ethernet.
2. Run lanparty-gateway --iface eth0 --room ABCD.
3. Done.

The public server does this:

lanparty-relay --listen 443/udp

UDP/443 is a good default, but the port must be configurable because some networks block QUIC/UDP.

Components

1. Windows client: lanparty-client.exe

Written in Rust.

Responsibilities:

- create/open TAP adapter
- give the TAP adapter a unique stable MAC address
- set TAP MTU to a safe small value
- connect to the relay via QUIC
- read Ethernet frames from TAP
- send one Ethernet frame per QUIC datagram
- receive Ethernet frames from QUIC datagrams
- write frames back into TAP
- keep the relay connection routed through the real internet NIC

Use a real TAP/Ethernet adapter. tap-windows6 is an NDIS TAP-Windows driver used by OpenVPN and other apps, which is the right class of device here because we need Ethernet frames, not just IP packets. (GitHub)

Do not use Wintun for this design. Wintun is L3/TUN-style and does not give you the Ethernet/L2 behavior needed for ARP, DHCP, broadcast discovery, and old LAN games.

The TAP adapter is the remote player's LAN-party identity.

Game binds to TAP
TAP gets DHCP from real LAN via tunnel
Game sends ARP/broadcast/multicast through TAP
Client tunnels the Ethernet frames

2. Linux gateway: lanparty-gateway

Runs on the physical LAN party machine.

Responsibilities:

- connect outbound to relay
- open raw L2 socket on the wired LAN interface
- capture Ethernet frames from the LAN
- inject remote Ethernet frames onto the LAN
- learn remote MAC addresses
- apply safety filters
- periodically refresh switch CAM table entries

Use Linux AF_PACKET / SOCK_RAW on the real wired NIC. Packet sockets operate at device-driver / OSI Layer 2 level, and SOCK_RAW includes the link-layer header, which is exactly what we need for Ethernet frames. (man7.org)
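
Because SOCK_RAW hands the gateway the link-layer header, every later step (MAC learning, filtering, logging) starts by reading the first 14 bytes of the frame. A minimal sketch of that parse, assuming untagged Ethernet II frames; the names `EthHeader` and `parse_eth_header` are hypothetical, not part of any existing crate:

```rust
/// Length of an untagged Ethernet II header: dst MAC + src MAC + EtherType.
pub const ETH_HEADER_LEN: usize = 14;

/// The fields every forwarding and filtering decision needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EthHeader {
    pub dst: [u8; 6],
    pub src: [u8; 6],
    /// Values below 0x0600 are 802.3 length fields, not EtherTypes.
    /// 0x8100 means an 802.1Q VLAN tag follows (not handled in this sketch).
    pub ethertype: u16,
}

/// Parse the link-layer header from a raw frame.
/// Returns None for frames too short to contain a full header.
pub fn parse_eth_header(frame: &[u8]) -> Option<EthHeader> {
    if frame.len() < ETH_HEADER_LEN {
        return None;
    }
    let mut dst = [0u8; 6];
    let mut src = [0u8; 6];
    dst.copy_from_slice(&frame[0..6]);
    src.copy_from_slice(&frame[6..12]);
    // EtherType is transmitted in network byte order (big-endian).
    let ethertype = u16::from_be_bytes([frame[12], frame[13]]);
    Some(EthHeader { dst, src, ethertype })
}
```

The same parser would work on frames read from the Windows TAP adapter, since both sides carry complete Ethernet frames.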

For MVP, run as root. Later, reduce privileges. Opening raw sockets and enabling promiscuous mode need elevated networking privileges: CAP_NET_RAW covers raw packet access, and CAP_NET_ADMIN covers things like setting promiscuous mode. (man7.org)

No Linux bridge is needed for MVP. No br0. No moving the host's IP from eth0 to a bridge. The gateway daemon directly captures and injects frames on the physical NIC.

Wired Ethernet only. No Wi-Fi gateway mode for MVP. Managed Wi-Fi NICs are not reliable for arbitrary source-MAC injection.

3. Public relay: lanparty-relay

Runs on VPS/public server.

Responsibilities:

- accept QUIC connections
- group clients and gateway into rooms
- forward Ethernet datagrams
- enforce room limits
- reject duplicate MACs
- rate-limit abuse
- later: auth / invite codes / E2E overlay encryption

For MVP, the relay is the full data path, not merely NAT traversal.

That gives the best UX:

client → outbound QUIC → relay
gateway → outbound QUIC → relay

No port forwarding. No NAT traversal pain. Direct P2P can come later.

Transport

Use QUIC.

Use reliable QUIC streams for control:

hello
join room
role = client | gateway
version negotiation
assigned peer id
announced MAC
MTU negotiation
stats
disconnect reason
future auth
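
The control items above could map onto a small message enum carried on the reliable stream. This is only a sketch: the variant names, field names, and the rule that clients announce a MAC while gateways do not are assumptions made for illustration, and the serialization format is left open.

```rust
/// Which side of the tunnel a peer is on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Client,
    Gateway,
}

/// Hypothetical control messages sent over a reliable QUIC stream.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ControlMsg {
    /// First message in both directions; carries the protocol version.
    Hello { version: u8 },
    /// Request to join a room in a given role.
    JoinRoom { room: String, role: Role, announced_mac: Option<[u8; 6]> },
    /// Relay's answer: the assigned peer id and the negotiated TAP MTU.
    Joined { peer_id: u32, tap_mtu: u16 },
    /// Periodic counters for diagnostics.
    Stats { frames_rx: u64, frames_tx: u64, drops: u64 },
    /// Sent before closing, so the UI can show why.
    Disconnect { reason: String },
}

/// Assumption: a client must announce exactly one MAC at join time,
/// while the gateway announces none (it speaks for the whole LAN).
pub fn join_is_well_formed(msg: &ControlMsg) -> bool {
    match msg {
        ControlMsg::JoinRoom { room, role, announced_mac } => {
            !room.is_empty()
                && match role {
                    Role::Client => announced_mac.is_some(),
                    Role::Gateway => announced_mac.is_none(),
                }
        }
        _ => false,
    }
}
```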

Use QUIC DATAGRAM for Ethernet frames. QUIC DATAGRAM is specifically the unreliable datagram extension for QUIC, which fits Ethernet/game traffic better than reliable streams because old frames should not block newer frames. (IETF Datatracker)

Rust QUIC implementation: start with quinn. It exposes Connection::max_datagram_size(), which returns the maximum datagram payload size or None if datagrams are unsupported/disabled. (Docs.rs)

No fragmentation for MVP

Do not fragment Ethernet frames inside the overlay.

Instead:

small TAP MTU
one TAP Ethernet frame = one QUIC datagram

Startup flow:

1. establish QUIC connection
2. verify QUIC DATAGRAM support
3. query max_datagram_size()
4. compute safe inner MTU
5. configure TAP MTU
6. bring TAP up

MVP default:

TAP MTU: 1200 or 1280-ish (IPv6 requires a link MTU of at least 1280)
hard fail if QUIC datagram budget is too small

Formula:

tap_mtu <= quic_max_datagram_size
           - overlay_header_len
           - ethernet_header_len
           - safety_margin
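
A minimal sketch of steps 2 to 4 of the startup flow, taking the `Option<usize>` that quinn's `Connection::max_datagram_size()` returns as input. The overlay header length of 22 bytes matches the example header in this plan; the safety margin and the hard-fail threshold are assumed values, not measured ones. Note that QUIC only guarantees paths that carry 1200-byte UDP payloads, so before any path MTU discovery the datagram budget can be below 1200 and a TAP MTU of 1200 may not fit.

```rust
pub const ETH_HEADER_LEN: usize = 14;
/// Size of the example overlay header in this plan (4+1+1+8+4+2+2).
pub const OVERLAY_HEADER_LEN: usize = 22;
/// Assumption: spare bytes for future header growth.
pub const SAFETY_MARGIN: usize = 16;
/// Assumption: below this inner MTU we refuse to start.
pub const MIN_TAP_MTU: usize = 576;

#[derive(Debug, PartialEq, Eq)]
pub enum MtuError {
    /// The peer or local config does not support QUIC DATAGRAM.
    DatagramsUnsupported,
    /// Datagrams work, but the budget is too small for a useful MTU.
    BudgetTooSmall { max_datagram: usize },
}

/// Compute the TAP MTU from the QUIC datagram budget.
/// `max_datagram_size` is the value reported by the QUIC stack,
/// `desired` is the MTU we would like (e.g. 1280).
pub fn compute_tap_mtu(
    max_datagram_size: Option<usize>,
    desired: usize,
) -> Result<usize, MtuError> {
    let max = max_datagram_size.ok_or(MtuError::DatagramsUnsupported)?;
    let overhead = OVERLAY_HEADER_LEN + ETH_HEADER_LEN + SAFETY_MARGIN;
    let budget = max
        .checked_sub(overhead)
        .ok_or(MtuError::BudgetTooSmall { max_datagram: max })?;
    if budget < MIN_TAP_MTU {
        return Err(MtuError::BudgetTooSmall { max_datagram: max });
    }
    // Never exceed the budget, never exceed what was asked for.
    Ok(budget.min(desired))
}
```

With these assumed constants, a reported budget of 1350 bytes allows the desired 1280, while a budget of 1162 bytes caps the TAP MTU at 1110.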

No fragment table. No reassembly timeout. No “one lost fragment kills the whole Ethernet frame.” Add fragmentation later only if testing proves it is necessary.

Overlay frame format

Keep the outer routing header small and stable.

Example:

magic:        u32
version:      u8
type:         u8    // frame, control, keepalive
room_id:      u64
peer_id:      u32
flags:        u16
payload_len:  u16
payload:      Ethernet frame bytes
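
The example layout above adds up to a 22-byte header. A sketch of encoding and decoding it follows; big-endian field order and the magic value are assumptions chosen for illustration, since the plan does not fix either.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical magic value ("LANP" in ASCII).
pub const MAGIC: u32 = 0x4C41_4E50;
/// magic(4) + version(1) + type(1) + room_id(8) + peer_id(4) + flags(2) + payload_len(2)
pub const HEADER_LEN: usize = 22;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OverlayHeader {
    pub version: u8,
    pub msg_type: u8, // frame, control, keepalive
    pub room_id: u64,
    pub peer_id: u32,
    pub flags: u16,
}

/// Serialize header + Ethernet frame into one datagram payload.
/// Returns None if the frame does not fit the u16 length field.
pub fn encode(h: &OverlayHeader, payload: &[u8]) -> Option<Vec<u8>> {
    let len = u16::try_from(payload.len()).ok()?;
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&MAGIC.to_be_bytes());
    out.push(h.version);
    out.push(h.msg_type);
    out.extend_from_slice(&h.room_id.to_be_bytes());
    out.extend_from_slice(&h.peer_id.to_be_bytes());
    out.extend_from_slice(&h.flags.to_be_bytes());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Parse one datagram. Rejects short input, a wrong magic value,
/// and a payload_len that disagrees with the actual datagram size.
pub fn decode(buf: &[u8]) -> Option<(OverlayHeader, &[u8])> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    if u32::from_be_bytes(buf[0..4].try_into().ok()?) != MAGIC {
        return None;
    }
    let header = OverlayHeader {
        version: buf[4],
        msg_type: buf[5],
        room_id: u64::from_be_bytes(buf[6..14].try_into().ok()?),
        peer_id: u32::from_be_bytes(buf[14..18].try_into().ok()?),
        flags: u16::from_be_bytes(buf[18..20].try_into().ok()?),
    };
    let len = u16::from_be_bytes([buf[20], buf[21]]) as usize;
    let payload = &buf[HEADER_LEN..];
    if payload.len() != len {
        return None;
    }
    Some((header, payload))
}
```

Keeping the routing fields in a fixed-size clear header means a later AEAD layer could encrypt only the bytes after offset 22 without changing what the relay parses.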

For future relay-blind encryption, split this mentally into:

clear routing header
encrypted Ethernet payload

MVP can skip payload encryption beyond QUIC, but the wire format should not make later E2E encryption painful.

Trust model

MVP relay sees plaintext Ethernet frames.

QUIC encrypts traffic on the wire, but because the relay terminates QUIC connections, it decrypts frames from clients and re-encrypts them to the gateway.

That is acceptable for a LAN-party MVP, but it should be explicitly documented.

Future version:

client/gateway share room key
Ethernet payload is AEAD-encrypted before QUIC
relay only sees room id, peer id, size, timing

Do not retrofit this into a bad packet format later. Reserve the shape now.

Switching model

Treat the whole thing as a tiny user-space Ethernet switch.

Maintain:

MAC -> peer_id
peer_id -> QUIC connection
last_seen timestamp

Forwarding rules:

source MAC from client:
  learn source MAC -> client

known unicast:
  forward only to target peer/gateway

broadcast/multicast:
  flood to gateway and relevant clients

unknown unicast:
  flood initially, later rate-limit

never reflect frame back to ingress peer

For MVP, simplify:

remote client frames mostly go to gateway
LAN frames go to matching remote client or all clients if broadcast/multicast

But MAC learning belongs in the real design.
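
The forwarding rules above can be sketched as a small learning table. This is an illustration only: the type names are hypothetical, the gateway is treated as just another peer, and the last_seen timestamp and rate limiting are left out to keep it short.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

pub type Mac = [u8; 6];
pub type PeerId = u32;

/// Where a frame should go next.
#[derive(Debug, PartialEq, Eq)]
pub enum Forward {
    /// Known unicast: exactly one peer.
    To(PeerId),
    /// Broadcast, multicast, or unknown unicast: everyone but the sender.
    Flood(Vec<PeerId>),
}

#[derive(Default)]
pub struct Switch {
    table: HashMap<Mac, PeerId>,
    peers: Vec<PeerId>,
}

impl Switch {
    pub fn add_peer(&mut self, peer: PeerId) {
        if !self.peers.contains(&peer) {
            self.peers.push(peer);
        }
    }

    /// Learn the source MAC, then decide where the frame goes.
    /// Returns None when the frame must be dropped.
    pub fn handle(&mut self, ingress: PeerId, frame: &[u8]) -> Option<Forward> {
        if frame.len() < 14 {
            return None; // too short to hold an Ethernet header
        }
        let dst: Mac = frame[0..6].try_into().ok()?;
        let src: Mac = frame[6..12].try_into().ok()?;
        // Bit 0 of the first octet marks a group (multicast/broadcast) address.
        if src[0] & 0x01 == 0 {
            self.table.insert(src, ingress);
        }
        if dst[0] & 0x01 == 0 {
            if let Some(&peer) = self.table.get(&dst) {
                // Never reflect a frame back to its ingress peer.
                return if peer == ingress { None } else { Some(Forward::To(peer)) };
            }
        }
        let others = self.peers.iter().copied().filter(|p| *p != ingress).collect();
        Some(Forward::Flood(others))
    }
}
```

In the relay this table would be kept per room, so that MACs learned in one room never influence forwarding in another.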

MAC identity

Each Windows client needs a unique locally administered unicast MAC.

Example range:

02:xx:xx:xx:xx:xx

Generate once per install or per profile. Store it. Configure TAP with it. Announce it during join.

Relay must reject:

- duplicate MAC in same room
- broadcast/multicast source MAC
- obviously invalid MAC
- too many source MACs per client

Default policy:

1 MAC per client
maybe 2 later for weird cases
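
A sketch of both halves: building a MAC in the 02:xx:xx:xx:xx:xx range on the client, and the relay-side checks at join time. The random bytes are supplied by the caller (for example from the OS random source), and the function and enum names are hypothetical.

```rust
pub type Mac = [u8; 6];

/// Build a locally administered unicast MAC from five random bytes.
/// First octet 0x02: the "locally administered" bit is set and the
/// group (multicast) bit is clear.
pub fn local_unicast_mac(random: [u8; 5]) -> Mac {
    [0x02, random[0], random[1], random[2], random[3], random[4]]
}

#[derive(Debug, PartialEq, Eq)]
pub enum MacReject {
    /// Group bit set; this also covers ff:ff:ff:ff:ff:ff.
    Multicast,
    /// 00:00:00:00:00:00 is never a valid station address.
    AllZero,
    /// Another peer in the room already uses this MAC.
    Duplicate,
}

/// Relay-side check for a MAC announced during join.
/// `in_room` holds the MACs already accepted for this room.
pub fn check_announced_mac(mac: Mac, in_room: &[Mac]) -> Result<(), MacReject> {
    if mac[0] & 0x01 != 0 {
        return Err(MacReject::Multicast);
    }
    if mac == [0u8; 6] {
        return Err(MacReject::AllZero);
    }
    if in_room.contains(&mac) {
        return Err(MacReject::Duplicate);
    }
    Ok(())
}
```

The per-client MAC limit is not shown here; it would be enforced when frames arrive, by comparing each source MAC against the single announced one.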

Enforcing this is the software's responsibility, not the user's.

Linux gateway CAM-table refresh

The physical LAN switch must learn that remote clients' MACs live behind the gateway port.

That happens when the gateway injects frames onto the LAN using the remote client's source MAC.

But switch CAM entries age out. So the gateway should periodically refresh them.

Every ~60 seconds:

for each connected remote MAC:
  inject a tiny harmless Ethernet frame with that MAC as source

The exact frame can be decided during implementation, but the goal is simple: keep the LAN switch mapping the remote MAC to the gateway's physical port.
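
One possible shape for such a refresh frame, purely as a starting point for testing: a minimum-size frame with the remote MAC as source and the IEEE 802 Local Experimental EtherType 0x88B5. Using the gateway NIC's own MAC as destination is an assumption made here so the frame is not flooded across the LAN; a gratuitous ARP would be another option. The function name is hypothetical.

```rust
/// IEEE 802 Local Experimental EtherType 1.
pub const ETHERTYPE_LOCAL_EXPERIMENTAL: u16 = 0x88B5;
/// Minimum Ethernet frame length without the 4-byte FCS.
pub const MIN_FRAME_LEN: usize = 60;

/// Build a tiny frame whose only job is to make the switch
/// (re)learn `remote_mac` on the gateway's port.
pub fn cam_refresh_frame(remote_mac: [u8; 6], dst_mac: [u8; 6]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(MIN_FRAME_LEN);
    frame.extend_from_slice(&dst_mac);
    frame.extend_from_slice(&remote_mac); // source MAC is what the switch learns
    frame.extend_from_slice(&ETHERTYPE_LOCAL_EXPERIMENTAL.to_be_bytes());
    // Zero padding up to the minimum frame size.
    frame.resize(MIN_FRAME_LEN, 0);
    frame
}
```

Whether a given switch actually refreshes its table from such a frame is exactly what the Phase 1 success criterion should verify.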

Phase 1 success criterion:

remote client MAC appears in the LAN switch MAC table on the gateway port

If that is false, the L2 illusion is broken.

Safety filters

Remote clients must not be allowed to spray arbitrary L2 control-plane junk onto the real LAN.

Drop remote → LAN unconditionally:

- EAPOL / 802.1X
- STP / BPDUs
- LLDP
- LACP
- DHCP server replies
- IPv6 Router Advertisements
- jumbo frames
- frames from unauthorized source MACs

Also drop LAN → remote:

- EAPOL
- STP
- LLDP
- LACP

No remote Windows client needs to see switch/control-plane traffic.

EAPOL is especially important: remote clients should never be able to interfere with 802.1X or port authentication behavior on the physical switch.
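
A sketch of the remote → LAN filter as a pure function over one frame. Assumptions in this sketch: the whole 01:80:C2:00:00:00 to 0F reserved destination block is dropped (which covers STP BPDUs but is broader than the list above), frames above 1514 bytes count as jumbo, VLAN tags are not unwrapped, and the IPv6 check ignores extension headers. Names are hypothetical.

```rust
pub const ETHERTYPE_IPV4: u16 = 0x0800;
pub const ETHERTYPE_IPV6: u16 = 0x86DD;
pub const ETHERTYPE_SLOW_PROTOCOLS: u16 = 0x8809; // LACP
pub const ETHERTYPE_EAPOL: u16 = 0x888E; // 802.1X
pub const ETHERTYPE_LLDP: u16 = 0x88CC;
pub const MAX_FRAME_LEN: usize = 1514; // 14-byte header + 1500-byte payload

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Forward,
    Drop(&'static str),
}

/// IPv4 + UDP with source port 67 (what a DHCP server sends).
fn is_dhcp_server_reply(ip: &[u8]) -> bool {
    if ip.len() < 20 || ip[0] >> 4 != 4 || ip[9] != 17 {
        return false;
    }
    let ihl = ((ip[0] & 0x0F) as usize) * 4;
    ihl >= 20 && ip.len() >= ihl + 2 && u16::from_be_bytes([ip[ihl], ip[ihl + 1]]) == 67
}

/// IPv6 directly followed by ICMPv6 type 134 (Router Advertisement).
fn is_ipv6_ra(ip: &[u8]) -> bool {
    ip.len() > 40 && ip[0] >> 4 == 6 && ip[6] == 58 && ip[40] == 134
}

/// Decide whether a frame from a remote client may be injected onto the LAN.
pub fn filter_remote_to_lan(frame: &[u8], allowed_src: &[[u8; 6]]) -> Verdict {
    if frame.len() < 14 {
        return Verdict::Drop("runt frame");
    }
    if frame.len() > MAX_FRAME_LEN {
        return Verdict::Drop("jumbo frame");
    }
    let (dst, src) = (&frame[0..6], &frame[6..12]);
    if !allowed_src.iter().any(|m| &m[..] == src) {
        return Verdict::Drop("unauthorized source MAC");
    }
    if dst.starts_with(&[0x01, 0x80, 0xC2, 0x00, 0x00]) && dst[5] <= 0x0F {
        return Verdict::Drop("reserved bridge address (STP and friends)");
    }
    let payload = &frame[14..];
    match u16::from_be_bytes([frame[12], frame[13]]) {
        ETHERTYPE_EAPOL => Verdict::Drop("EAPOL"),
        ETHERTYPE_LLDP => Verdict::Drop("LLDP"),
        ETHERTYPE_SLOW_PROTOCOLS => Verdict::Drop("LACP"),
        ETHERTYPE_IPV4 if is_dhcp_server_reply(payload) => Verdict::Drop("DHCP server reply"),
        ETHERTYPE_IPV6 if is_ipv6_ra(payload) => Verdict::Drop("IPv6 Router Advertisement"),
        _ => Verdict::Forward,
    }
}
```

The LAN → remote direction could reuse the EtherType and reserved-address checks and skip the source MAC and DHCP checks.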

Add rate limits:

- broadcast/multicast per client
- unknown unicast per client
- total bandwidth per client
- malformed packet disconnect threshold
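
Each of these limits could be a token bucket per client. A minimal sketch follows, with time passed in explicitly (seconds since some fixed start) so it can be tested without a clock; capacities and refill rates are left to the caller and nothing here is a tuned value.

```rust
/// Simple token bucket: `capacity` is the burst size,
/// `refill_per_sec` is the sustained rate.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: f64,
}

impl TokenBucket {
    /// Start with a full bucket at time `now` (in seconds).
    pub fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last: now }
    }

    /// Try to spend `cost` tokens (1.0 per frame, or the frame length
    /// in bytes for a bandwidth limit). Returns false if the frame
    /// should be dropped and logged as rate-limited.
    pub fn allow(&mut self, now: f64, cost: f64) -> bool {
        // Refill for the time that passed, capped at the burst size.
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}
```

One bucket per client for broadcast/multicast frames, one for unknown unicast, and one counted in bytes for total bandwidth would cover the first three limits in the list.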

Windows routing / metric handling

The TAP adapter may receive DHCP from the party LAN. That is good.

But if DHCP gives it a default gateway, Windows might try to route the relay connection through the tunnel itself. That would break the tunnel.

Client startup should:

1. resolve relay domain before TAP is active
2. remember current real default gateway/interface
3. add explicit host route to relay IP via real NIC
4. bring TAP up
5. set TAP interface metric appropriately
6. detect and neutralize TAP default-route takeover

The TAP should be preferred for the party LAN subnet, but it must not steal general internet traffic.
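
Step 3 (the host route to the relay) could, for a first prototype, shell out to the Windows `route` command. The sketch below only builds the argument list; the function name is hypothetical, it handles IPv4 only, and running the command needs an elevated process.

```rust
use std::net::Ipv4Addr;

/// Build arguments for:
///   route ADD <relay> MASK 255.255.255.255 <gateway> METRIC 1 IF <index>
/// `real_gateway` and `if_index` are the default gateway and interface
/// index of the real NIC, captured before the TAP adapter comes up.
pub fn pin_relay_route_args(
    relay: Ipv4Addr,
    real_gateway: Ipv4Addr,
    if_index: u32,
) -> Vec<String> {
    vec![
        "ADD".to_string(),
        relay.to_string(),
        "MASK".to_string(),
        "255.255.255.255".to_string(), // /32: only the relay itself
        real_gateway.to_string(),
        "METRIC".to_string(),
        "1".to_string(),
        "IF".to_string(),
        if_index.to_string(),
    ]
}
```

The result would be passed to `std::process::Command::new("route").args(...)`, and the matching `route DELETE` should run on disconnect so stale host routes do not accumulate.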

Also strongly recommend uncommon LAN party subnets:

good: 10.73.42.0/24
bad:  192.168.0.0/24
bad:  192.168.1.0/24
bad:  192.168.178.0/24

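A party subnet that duplicates a remote user's home LAN subnet will be painful, because Windows would then see the same prefix on both the real NIC and the TAP adapter.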
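
The client could detect this case and warn before connecting. A minimal sketch of an IPv4 overlap check, assuming the client already knows the real NIC's subnet and learns the party subnet from DHCP; the function name is hypothetical.

```rust
use std::net::Ipv4Addr;

/// True if two IPv4 subnets (address, prefix length) share any address.
/// Two CIDR blocks overlap exactly when they agree on the bits of the
/// shorter prefix.
pub fn subnets_overlap(a: (Ipv4Addr, u8), b: (Ipv4Addr, u8)) -> bool {
    let prefix = u32::from(a.1.min(b.1).min(32));
    // A /0 mask must be special-cased: shifting a u32 by 32 is not allowed.
    let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    (u32::from(a.0) & mask) == (u32::from(b.0) & mask)
}
```

For example, a home LAN on 192.168.178.0/24 overlaps a party LAN on the same prefix, while 10.73.42.0/24 does not overlap 192.168.1.0/24.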

Relay placement / latency

Relay-as-data-path is the right MVP. It makes the product work through NAT immediately.

But latency becomes:

client → relay → gateway

So relay location matters.

For Europe/Germany-focused usage, put the relay near the expected players and LAN site, e.g. Frankfurt/Nuremberg/Amsterdam depending on hosting. Later, add direct QUIC path attempts with relay fallback, but do not block MVP on NAT traversal.

Design the room protocol so future modes are possible:

mode = relay
mode = direct-p2p
mode = direct-failed-relay-fallback

Logging / diagnostics

Phase 1 should log heavily.

Gateway frame log:

direction
src MAC
dst MAC
ethertype
length
peer id
action = forwarded | dropped | filtered | rate-limited
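
The fields above fit in one struct with a single-line text rendering that is easy to grep. The key=value format and the type names are assumptions for illustration; a structured logging crate could replace the `Display` impl later.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy)]
pub enum Direction {
    LanToRemote,
    RemoteToLan,
}

#[derive(Debug, Clone, Copy)]
pub enum Action {
    Forwarded,
    Dropped,
    Filtered,
    RateLimited,
}

/// One gateway log entry per frame.
pub struct FrameLog {
    pub direction: Direction,
    pub src: [u8; 6],
    pub dst: [u8; 6],
    pub ethertype: u16,
    pub len: usize,
    pub peer_id: u32,
    pub action: Action,
}

/// Format a MAC as lowercase colon-separated hex.
fn fmt_mac(m: &[u8; 6]) -> String {
    format!(
        "{:02x}:{:02x}:{:02x}:{:02x}:{:02x}:{:02x}",
        m[0], m[1], m[2], m[3], m[4], m[5]
    )
}

impl fmt::Display for FrameLog {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let dir = match self.direction {
            Direction::LanToRemote => "lan->remote",
            Direction::RemoteToLan => "remote->lan",
        };
        let action = match self.action {
            Action::Forwarded => "forwarded",
            Action::Dropped => "dropped",
            Action::Filtered => "filtered",
            Action::RateLimited => "rate-limited",
        };
        write!(
            f,
            "dir={} src={} dst={} ethertype={:#06x} len={} peer={} action={}",
            dir,
            fmt_mac(&self.src),
            fmt_mac(&self.dst),
            self.ethertype,
            self.len,
            self.peer_id,
            action
        )
    }
}
```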

Client diagnostics:

relay reachable: yes/no
QUIC datagram support: yes/no
max datagram size
TAP adapter found: yes/no
TAP MAC
TAP MTU
TAP IP from DHCP
relay route pinned: yes/no
frames rx/tx
drops

User-facing diagnostics should eventually say things like:

Connected to relay
Connected to LAN gateway
DHCP received: 10.73.42.51
Gateway latency: 23 ms
Broadcast traffic flowing
Warning: TAP received default route, adjusted metric

Phase plan

Phase 1: prove the illusion

Manual, ugly, real.

- manual TAP install on Windows
- Rust Windows client opens TAP
- fixed TAP MTU, e.g. 1200
- Linux gateway opens AF_PACKET on wired eth0
- relay forwards one client
- no auth except room string
- no fragmentation
- heavy frame logging

Success criteria:

- Windows TAP gets DHCP from real LAN
- Windows client can ARP LAN host
- Windows client can ping LAN host
- remote MAC appears in switch MAC table on gateway port
- one real LAN game discovers or joins a LAN server

Phase 2: multi-client

- multiple Windows clients
- unique MAC generation
- duplicate MAC rejection
- MAC learning
- broadcast/multicast fanout
- CAM refresh frames
- reconnect handling

Phase 3: safety and correctness

- L2 control-plane filters
- DHCP server reply filtering
- IPv6 RA filtering
- MAC limits
- rate limits
- route/metric protection
- better malformed-frame handling

Phase 4: product UX

- Windows installer
- TAP driver install/check
- simple GUI
- room code / domain field
- diagnostics screen
- configurable relay port
- logs export button

Driver signing and TAP bundling must be validated early. tap-windows6 is the right kind of driver, but Windows driver installation/signing is a product risk, not something to handwave. (GitHub)

Phase 5: better security and latency

- invite tokens / auth
- room ACLs
- optional room-key E2E payload encryption
- direct QUIC path attempt
- relay fallback
- regional relay selection

Explicit non-goals

For MVP, do not build:

- Npcap mode
- WinDivert mode
- source-IP rewriting
- Windows bridge
- Hyper-V virtual switch
- WireGuard underlay
- custom Ethernet fragmentation
- Wi-Fi LAN gateway support
- full internet VPN mode

One-sentence version

Build a Rust Windows TAP client + public QUIC relay + Linux AF_PACKET gateway that carries one small-MTU Ethernet frame per QUIC datagram, gives each remote player a unique virtual MAC on the real LAN, filters dangerous L2 control traffic, and keeps the physical LAN gateway as the only machine touching the real LAN.

I want a mono-repo with Rust code: one Cargo workspace, with the crates in a "crates" folder.