test(peer-cli): harden S1-S47 scenario suite against vacuous and flaky checks

An adversarial audit of the headless peer-to-peer scenario suite
(crates/lanspread-peer-cli/scripts/run_extended_scenarios.py, driven by
`just peer-cli-tests`) found assertions that passed even when the behavior
they claim to test was not happening, plus timing races and doc-vs-code
divergences. A full baseline run (S1-S47) passed beforehand, confirming these
were test-quality gaps, not peer regressions; the baseline output itself
exposed the worst offenders -- e.g. S14 chunk totals {128 MiB, 1 MiB} (a
two-chunk file whose "balanced within one chunk" check can never fail, since
the short tail chunk keeps the per-peer gap under one chunk) and
S16/S18 serving the whole ~120 MiB alienswarm.eti from a single source, so
fanout and retry were never exercised.
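A minimal sketch of the arithmetic behind that claim. It assumes the check compares the largest and smallest per-peer byte totals against one chunk. `balanced_within_one_chunk` is a hypothetical helper, not the runner's code. `CHUNK_SIZE` is inferred from the logged totals (134217728 bytes, i.e. 128 MiB).

```python
CHUNK_SIZE = 128 * 1024 * 1024  # inferred from the logged totals (134217728 bytes)


def balanced_within_one_chunk(per_peer_bytes: dict[str, int]) -> bool:
    """Hypothetical check: per-peer byte totals differ by at most one chunk."""
    totals = list(per_peer_bytes.values())
    return max(totals) - min(totals) <= CHUNK_SIZE


# Baseline S14: one full chunk plus a 1 MiB tail, one chunk per peer.
# The tail is always smaller than a chunk, so this check cannot fail.
baseline = {"peer-1": CHUNK_SIZE, "peer-2": 1024 * 1024}

# Post-fix S14: a 4 * CHUNK_SIZE file, where an uneven split is detectable.
fair_split = {"peer-1": 2 * CHUNK_SIZE, "peer-2": 2 * CHUNK_SIZE}
skewed_split = {"peer-1": 3 * CHUNK_SIZE, "peer-2": 1 * CHUNK_SIZE}

for name, split in [("baseline", baseline), ("2+2", fair_split), ("3+1", skewed_split)]:
    print(name, balanced_within_one_chunk(split))
```

With four chunks, a 3+1 split leaves a gap of two chunks. That is the first imbalance the check can actually reject.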

Test-correctness fixes (a broken behavior could previously pass green):
- S18: the "no download-failed" check was dead -- it reused a LineWaiter
  already advanced past download-finished, so it scanned an empty tail.
  Replaced it with assert_no_event_since over the whole download window.
  Switched to a 4*CHUNK_SIZE sparse archive so both peers get chunks; the test
  now proves the download SURVIVES a mid-download source kill (every byte
  delivered, the survivor served at least part of it, no download-failed,
  clean diff). Retry-onto-survivor is the mechanism but is not asserted: the
  kill/serve race against `docker rm -f` cannot be forced, so asserting an
  exact split would be flaky.
- S7: the only check was a diff against byte-identical ggoo fixtures, so it was
  source-agnostic. Added assertions that the download committed exactly once,
  every chunk came from the validated two-peer set, both peers served, and no
  chunk was fetched twice.
- S14: enlarged to 4*CHUNK_SIZE so the balance check can fail under a 3+1
  imbalance; asserts an exact 2+2 split summing to the file size.
- S16: inflated the .eti to 2*CHUNK_SIZE so it fans out across both
  catalog-version peers (the stock 120 MiB fixture is a single chunk).
- S37: validate the throughput rate fields (positive; self-consistent, i.e.
  mbit_per_s/mib_per_s == 8.388608 and mib_per_s matching bytes/duration
  expressed in MiB/s), not just the byte count.
- S35: assert the source actually advertises the unknown game before checking
  it is filtered, so "absent" means "filtered" and not "never sent".
- S15: cross-check each peer's raw advertised eti_version via list-peers; the
  list-games eti_game_version is synthesized from the catalog and can only ever
  equal the asserted value.
- S2: poll for library convergence and verify the bidirectional exchange
  (bravo sees alpha's 3 games, not just alpha seeing bravo's 4).
- S12/S28: require the gating unit test to appear as "<name> ... ok" so an
  #[ignore]d (un-run) test no longer satisfies the check.
- S24/S25: assert the requested install=false final state.
- S34: assert exactly 21 coherent chunks (20 files + version.ini), 21 distinct
  paths, no duplicates, instead of a >= 21 floor.
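The S18 dead check can be illustrated with a toy forward-only cursor. `ToyLineWaiter` and `no_event_since` are hypothetical stand-ins. The commit names `LineWaiter` and `assert_no_event_since` but not their signatures, so the shapes used here (a list of event names and a start index) are assumptions.

```python
class ToyLineWaiter:
    """Hypothetical forward-only cursor over a list of event names."""

    def __init__(self, events: list[str]) -> None:
        self.events = events
        self.pos = 0

    def wait_for(self, event: str) -> None:
        # Consume events up to and including the first match.
        while self.pos < len(self.events):
            self.pos += 1
            if self.events[self.pos - 1] == event:
                return
        raise TimeoutError(event)

    def tail_contains(self, event: str) -> bool:
        # The dead pattern: only inspects what is left after the cursor.
        return event in self.events[self.pos:]


def no_event_since(events: list[str], start: int, event: str) -> None:
    """The fixed pattern: scan the whole window from an offset saved up front."""
    hits = [i for i, e in enumerate(events[start:], start) if e == event]
    assert not hits, f"unexpected {event} at {hits}"


# A broken run: a download-failed slipped in before download-finished.
log = ["download-begin", "download-failed", "download-finished"]
window_start = 0  # recorded before issuing the download command

waiter = ToyLineWaiter(log)
waiter.wait_for("download-finished")
print("dead check sees failure:", waiter.tail_contains("download-failed"))
try:
    no_event_since(log, window_start, "download-failed")
except AssertionError as err:
    print("whole-window check:", err)
```

Recording the start offset before the download begins is what lets the check see events that the waiter has already consumed.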
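The S7 and S34 chunk checks share one shape. Count the `download-chunk-finished` records, reject any chunk seen twice, and confine the sources to the validated peer set. This sketch assumes each record can be reduced to a `(peer, path, offset)` tuple. That shape, `audit_chunks`, and the example paths are hypothetical, since the commit does not show the event's fields.

```python
from collections import Counter


def audit_chunks(chunks: list[tuple[str, str, int]], expected: int,
                 allowed_peers: set[str]) -> set[str]:
    """Hypothetical audit of (peer, path, offset) chunk records; returns the peers seen."""
    counts = Counter((path, offset) for _, path, offset in chunks)
    dupes = sorted(key for key, n in counts.items() if n > 1)
    assert not dupes, f"chunks fetched more than once: {dupes}"
    # Exact count, not a floor: a >= check would pass with extra fetches.
    assert len(chunks) == expected, f"expected {expected} chunks, got {len(chunks)}"
    peers = {peer for peer, _, _ in chunks}
    assert peers <= allowed_peers, f"unexpected sources: {sorted(peers - allowed_peers)}"
    return peers


# S7-style usage: every chunk from the validated pair, and both of them served.
records = [("alpha", "ggoo/a.bin", 0), ("bravo", "ggoo/b.bin", 0)]
served = audit_chunks(records, expected=2, allowed_peers={"alpha", "bravo"})
assert served == {"alpha", "bravo"}, "both peers should have served"
# An S34-style caller would also require 21 distinct paths among the records.
```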
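The S37 self-consistency check reduces to two identities. One MiB is 8 * 2^20 / 10^6 = 8.388608 Mbit, and a MiB/s rate is bytes / 2^20 / seconds. This sketch takes the `mib_per_s` and `mbit_per_s` names from the run log. The byte-count and duration field names are not given, so `check_rates`, its parameters, and the tolerance are all assumptions.

```python
import math

MIB = 1024 * 1024
MBIT_PER_MIB = 8 * MIB / 1_000_000  # 8.388608


def check_rates(total_bytes: int, duration_s: float, mib_per_s: float,
                mbit_per_s: float, rel_tol: float = 1e-3) -> None:
    """Hypothetical S37-style check on reported throughput fields."""
    assert mib_per_s > 0 and mbit_per_s > 0, "rates must be positive"
    # The two rate fields must describe the same speed in different units.
    assert math.isclose(mbit_per_s / mib_per_s, MBIT_PER_MIB, rel_tol=rel_tol)
    # The MiB/s field must agree with the byte count and the elapsed time.
    assert math.isclose(mib_per_s, total_bytes / MIB / duration_s, rel_tol=rel_tol)


# Illustrative numbers only: 512 MiB in 0.5857 s is roughly the ~874 MiB/s S37
# figure in the run evidence, but S37's real payload and duration are not stated.
total, secs = 512 * MIB, 0.5857
rate = total / MIB / secs
check_rates(total, secs, rate, rate * MBIT_PER_MIB)
print(f"{rate:.2f} MiB/s")
```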

Flake fixes:
- S19: force-kill the sole source right after download-begin on a 4*CHUNK_SIZE
  file, accept download-failed or download-peers-gone as the terminal event,
  and assert no download-finished. The old graceful shutdown on a single-chunk
  file could let the transfer finish first, turning the expected failure into
  a download-finished. A chunk may complete before the kill lands, but the full
  transfer cannot, so the failure is deterministic.
- S26: use a large sparse source so the first operation is reliably still
  active when the duplicate request is issued (TOCTOU on active_operations);
  also assert the active operation == "Downloading".
- S11: drop the "listener address must change" assertion -- it tested the OS
  ephemeral-port allocator and could fail spuriously; keep the same-identity /
  no-duplicate invariant.
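The S19 acceptance rule can be sketched as a check over the ordered event names observed for the download. `check_sole_source_drop` is hypothetical and the list-of-names shape is an assumption. The event names themselves come from this commit.

```python
TERMINAL_FAILURES = ("download-failed", "download-peers-gone")


def check_sole_source_drop(events: list[str]) -> str:
    """Hypothetical S19 check: a terminal failure, and never download-finished."""
    # A finished download means the kill landed too late, which is the race the fix removes.
    assert "download-finished" not in events, "download finished despite the source kill"
    failures = [e for e in events if e in TERMINAL_FAILURES]
    assert failures, "no terminal failure event observed"
    return failures[0]


# One chunk completed before the kill landed; the download still failed.
observed = ["download-begin", "download-chunk-finished", "download-peers-gone"]
print(check_sole_source_drop(observed))
```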

Coverage and determinism:
- S27: add handshake::tests::inbound_hello_from_self_is_ignored for the
  protocol-level self guard. The CLI scenario only exercises the CLI
  string-compare guard, which short-circuits before any network call, so the
  peer-crate guard had no test.
- find_fixture_game now iterates sorted(FIXTURES) so the ambiguous cnctw
  (fixture-bravo/multi/solid) resolves deterministically to fixture-bravo.
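A sketch of the deterministic lookup. It assumes `FIXTURES` maps fixture names to the game IDs they contain. The mapping shape, the `fixtures` parameter, and the `fixture-multi`/`fixture-solid` names (read off "fixture-bravo/multi/solid") are assumptions. Python dicts iterate in insertion order, so without `sorted()` the winner depends on how the mapping was populated.

```python
def find_fixture_game(fixtures: dict[str, set[str]], game_id: str) -> str:
    """Hypothetical lookup: the first fixture, in name order, that contains game_id."""
    for fixture in sorted(fixtures):  # name order, independent of insertion order
        if game_id in fixtures[fixture]:
            return fixture
    raise KeyError(game_id)


# The same fixtures populated in two different orders.
one_order = {"fixture-solid": {"cnctw"}, "fixture-multi": {"cnctw"}, "fixture-bravo": {"cnctw"}}
other_order = dict(reversed(list(one_order.items())))

print(find_fixture_game(one_order, "cnctw"), find_fixture_game(other_order, "cnctw"))
```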

Reviewed and deliberately left as-is (documented in the run log): S20, S21,
S30, S32/S39/S44 absence checks, S42 IP-order precondition, S45.

PEER_CLI_SCENARIOS.md rows S11, S14, S16, S18, S19, S27 are updated to match
the harness, and a dated run-log entry records the audit, the fixes, the
accepted items, and the live-run evidence.

Test Plan:
- `just peer-cli-tests` (rebuilds the image, runs S1-S47 in Docker): baseline
  passed; post-fix passed; a final run on the exact committed code passed
  47/47. Evidence: S14 {268435456, 268435456} balanced 2+2; S16 .eti split
  across B and C {134217728, 134217728}; S18 all 536870912 bytes delivered with
  no download-failed; S19 deterministic download-failed; S37 ~874 MiB/s.
- `just test` (incl. inbound_hello_from_self_is_ignored), `just clippy`
  (-D warnings, all-targets), and `just fmt` all pass.
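The S12/S28 "`<name> ... ok`" requirement relies on libtest's per-test status lines (`test <path> ... ok` versus `test <path> ... ignored`). A bare substring search for the test name therefore also matches an ignored test. In this sketch `unit_test_passed` is a hypothetical helper. The test name used is the S27 handshake test only because it is the one name this commit spells out.

```python
import re


def unit_test_passed(cargo_output: str, name: str) -> bool:
    """Hypothetical check: the named test ran and libtest reported it as ok."""
    pattern = rf"^test {re.escape(name)} \.\.\. ok$"
    return re.search(pattern, cargo_output, re.MULTILINE) is not None


NAME = "handshake::tests::inbound_hello_from_self_is_ignored"
ran = f"running 1 test\ntest {NAME} ... ok\n"
skipped = f"running 1 test\ntest {NAME} ... ignored\n"

print(NAME in skipped)                  # the old-style substring check passes vacuously
print(unit_test_passed(ran, NAME))      # the anchored check accepts a real pass
print(unit_test_passed(skipped, NAME))  # and rejects the un-run test
```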

Refs: PEER_CLI_SCENARIOS.md scenario matrix and 2026-06-21 run-log entry.
This commit was merged in pull request #29.
2026-06-21 10:23:45 +02:00
parent 72fa61da65
commit 2d53848e0c
3 changed files with 388 additions and 71 deletions
+73 -6
@@ -18,15 +18,15 @@ for deterministic local runs; mDNS/macvlan remains an environment smoke path.
 | S8 | Ambiguous metadata rejection | Two peers advertise the same game/version with conflicting file sizes. | Download fails with a `download-failed` event; no committed `version.ini` is left for the target game. |
 | S9 | Missing game | Client asks for a game none of its peers can serve. | CLI reports a deterministic command failure and emits `no-peers-have-game`; no local files are created. |
 | S10 | Shutdown and goodbye cleanup | Alpha and bravo are connected, then bravo shuts down. | Alpha receives peer loss/removal and remote games from bravo disappear. |
-| S11 | Same identity reconnect | Bravo restarts with the same state dir but a new port, then alpha connects to the new address. | Alpha has one bravo peer entry with the updated address, not duplicate identities. |
+| S11 | Same identity reconnect | Bravo restarts with the same state dir (the OS assigns an ephemeral listener port that usually, but not necessarily, differs), then alpha connects again. | Alpha has exactly one bravo peer entry reusing the same peer ID, not a duplicate identity, at whatever address bravo now advertises. |
 | S12 | Transfer serving gates | A peer has a non-catalog, missing-sentinel, active-operation, or `local/` path request. | The serving peer declines metadata/data; covered by unit tests where timing is too small for a stable CLI race test. |
 | S13 | Exact transferred-file equality | Repeat small and large downloads, then compare every transferred regular file against its source with SHA-256 manifests. | Source and receiver manifests match exactly for each transferred file; no extra or missing files appear in the downloaded game root. |
-| S14 | Large multi-peer chunked download | `fixture-alpha/alienswarm` contains a renamed RAR `.eti` larger than 100 MB. A second peer downloads it, then a third peer downloads `alienswarm` from both peers. | The third peer's downloaded files match the source by SHA-256; `download-chunk-finished` events show the large `.eti` chunks coming from both peers with byte counts balanced within one chunk. |
+| S14 | Large multi-peer chunked download | A source advertises a synthetic catalog game whose `.eti` is a sparse file of `4 * CHUNK_SIZE` (four 128 MiB chunks). A second peer downloads it, then a third peer downloads it from both peers. | The third peer's downloaded files match the source by SHA-256; `download-chunk-finished` shows the `.eti` split across exactly both peers, all four chunks accounted for, and the per-peer byte totals balanced within one `CHUNK_SIZE` (a fair 2+2 split; a 3+1 imbalance would trip the check). |
 | S15 | Catalog-version skew | Three peers advertise the same catalog game ID. Peers A and B have stale `version.ini` values; peer C has the catalog's expected version. An empty client connects to all three and downloads the game with `install=false`. | `list-games` shows one row for the game with `peer_count=1` and the catalog `eti_game_version`. The `got-game-files` descriptor set and transfer source are peer C only; no chunks come from A or B. The receiver's `version.ini` and SHA-256 manifest match C exactly. |
-| S16 | Catalog-version fanout with stale peers present | Peer A has a stale version of a game. Peers B and C both advertise the catalog version with matching file manifests; use a large file when proving chunk split. | The aggregated row counts only catalog-version ready peers. Large-file chunks may split between B and C; peer A is not listed as downloadable and contributes no manifest vote or file chunks. |
+| S16 | Catalog-version fanout with stale peers present | Peer A has a stale version of a game. Peers B and C both advertise the catalog version with matching manifests; the `.eti` is inflated to `2 * CHUNK_SIZE` so it can fan out. | The aggregated row counts only catalog-version ready peers. The `.eti` chunks split across exactly B and C; peer A is not listed as downloadable and contributes no manifest vote or file chunks. |
 | S17 | Catalog-version conflict rejection | Peer A has a stale version. Peers B and C both advertise the catalog version, but their file sizes conflict. | Validation considers only the catalog-version peers, so A cannot rescue the majority. The download fails with `download-failed`, and no committed target `version.ini` remains. |
-| S18 | Mid-download source drop with redundancy | Client downloads a large shared game from two ready peers, then one source is killed after the download has begun. | Failed chunks are retried against the surviving source; the download finishes, no `download-failed` is emitted, and the receiver's files match the source by diff or SHA-256. |
-| S19 | Mid-download sole-source drop | Client downloads a large game from one source, then that source is killed after the download has begun. | The download emits `download-failed`; no committed target `version.ini` remains; any partial payload is not advertised as ready; active operation state clears so a retry is possible. |
+| S18 | Mid-download source drop with redundancy | Client downloads a large shared multi-chunk game (sparse `4 * CHUNK_SIZE`, so both peers are assigned `.eti` chunks) from two ready peers, then one source is killed right after the download has begun. | The download survives the source kill: it finishes, no `download-failed` is emitted over the whole download window, every byte is delivered (chunk totals sum to the file size) with the survivor serving part of it, and the receiver's files match the source by diff or SHA-256. (Retry-onto-survivor is the mechanism that makes this possible, exercised when the kill interrupts an unfinished chunk, but it is not asserted because the kill timing cannot be forced; the per-source split is likewise not asserted.) |
+| S19 | Mid-download sole-source drop | Client downloads a large multi-chunk game (sparse `4 * CHUNK_SIZE`) from one source, then that source is force-killed immediately after `download-begin`. An individual chunk may complete before the kill lands, but the full multi-chunk download cannot, so the failure is deterministic on a fast LAN. | The download emits a terminal failure (`download-failed`, or `download-peers-gone` when the sole source vanishes) and no `download-finished`; no committed target `version.ini` remains; any partial payload is not advertised as ready; active operation state clears so a retry is possible. |
 | S20 | Receiver write failure | Client downloads a large game into a constrained `/games` filesystem. | The download fails deterministically, no committed `version.ini` is advertised, and active operation state clears so the peer can retry later. |
 | S21 | Add-game propagation | Two connected peers are running; one peer gains a new catalog game root through a completed download or an external drop. | The other peer receives a library update without reconnecting, and `list-games` shows the new remote game under the existing peer. |
 | S22 | Remove-game propagation | Two connected peers are running; one peer loses a previously advertised game root. | The other peer receives a library update without dropping the peer, and `list-games` no longer shows that remote game. |
@@ -34,7 +34,7 @@ for deterministic local runs; mDNS/macvlan remains an environment smoke path.
 | S24 | Two clients pull from one source | Two empty clients connect to the same source and download the same large game concurrently. | Both downloads finish, both receivers match the source by diff or SHA-256, and the source remains responsive. |
 | S25 | One client downloads two games concurrently | One client connected to a source issues two different `download` commands without waiting for the first to finish. | Both operations may run in parallel; both eventually finish, each game reaches the requested install state, and each transferred root matches its source. |
 | S26 | Same-game duplicate download rejection | A client starts downloading a game, then issues a second `download` command for the same game while the first operation is active. | The second request is rejected deterministically as an operation-in-progress condition; the first download is not corrupted and still reaches its documented final state. |
-| S27 | Self-connect rejection | A peer sends `connect` to its own advertised listener address. | The command fails cleanly, no self-peer entry is created, and the peer remains responsive. |
+| S27 | Self-connect rejection | A peer sends `connect` to its own advertised listener address. | The CLI command fails cleanly (CLI-level guard), no self-peer entry is created, and the peer remains responsive. The protocol-level guard (a hello whose `peer_id` equals the local id is acknowledged but never recorded) is covered by the `handshake::tests::inbound_hello_from_self_is_ignored` unit test, which the CLI string-compare never reaches. |
 | S28 | Address change without identity change | A known peer is rediscovered with the same peer ID and a different listener address while its library is still known. | The peer record updates in place to the new address, the existing library stays attached to that peer ID, and no duplicate peer entry appears. This is covered with a deterministic unit-level check until the CLI can rebind a live listener without restart. |
 | S29 | Empty-library peer participates | A peer with no games connects into the mesh. | Other peers list it as a peer with zero games; it can receive a download, advertise the new game without restart, and become a source. |
 | S30 | 5+ peer mesh aggregation | Five peers advertise partially overlapping catalog games with a mix of unique and shared catalog-version games; a sixth client connects to all five. | The client shows one row per game ID, correct catalog-version ready-source `peer_count`, catalog `eti_game_version`, no duplicates, and no self entries. |
@@ -143,6 +143,73 @@ Use S39-S41 to pin down low-disk streamed installs:
 ## Run Log
+### 2026-06-21 - Test-Suite Integrity Audit And Hardening
+- An adversarial review of `run_extended_scenarios.py` found assertions that
+  passed vacuously, raced, or diverged from the spec. A full baseline run
+  (S1-S47, rebuilt image) passed beforehand, confirming these were test-quality
+  gaps, not peer regressions. Baseline evidence of the gaps: S14 chunk totals
+  were `{134217728, 1048576}` (a 2-chunk file whose "balanced within one chunk"
+  check can never fail), and S16/S18 each served the whole ~120 MiB
+  `alienswarm.eti` from a single source, so neither fanout (S16) nor
+  retry-onto-survivor (S18) was actually exercised.
+- Fixes applied to the runner (and the matching rows above):
+  - S18: replaced the dead `assert_no_event` (it reused a `LineWaiter` already
+    advanced past `download-finished`, so it scanned an empty tail and could
+    never fire) with `assert_no_event_since` over the whole download window;
+    switched to a multi-chunk sparse archive (`4 * CHUNK_SIZE`) so both peers
+    own `.eti` chunks and the test proves the download survives a mid-download
+    source kill (retry-onto-survivor is the mechanism, exercised when the kill
+    interrupts an unfinished chunk, but not asserted since the race can't be
+    forced).
+  - S7: added chunk-source, both-sources-served, single-`download-finished`,
+    and no-duplicate-chunk checks (the byte-identical `ggoo` fixtures made the
+    old diff-only assertion source-agnostic).
+  - S14: `4 * CHUNK_SIZE` file so the balance check is meaningful (a 3+1 split
+    would now exceed one chunk); asserts an exact 2+2 split and full byte total.
+  - S16: inflated `.eti` to `2 * CHUNK_SIZE` so it fans out across both
+    catalog-version peers (the stock 120 MiB fixture is a single chunk).
+  - S19: force-kill right after `download-begin` on a multi-chunk file, accept
+    `download-failed`/`download-peers-gone`, assert no `download-finished` (the
+    old graceful shutdown could let a single-chunk transfer finish first).
+  - S26: large sparse source so the first op is reliably still active, and
+    asserts the active `operation == "Downloading"` (no scenario checked it).
+  - S37: validates the throughput rate fields (positive, self-consistent
+    `mbit_per_s/mib_per_s == 8.388608`, and `mib_per_s` matching
+    bytes/duration expressed in MiB/s), not just the byte count.
+  - S35: asserts the source actually advertises `mystery-game` before checking
+    it is filtered (distinguishes "filtered" from "never sent").
+  - S15: cross-checks each peer's raw advertised `eti_version` via list-peers
+    (the list-games `eti_game_version` is synthesized from the local catalog and
+    can only ever equal the catalog value).
+  - S2: polls for library convergence and verifies the bidirectional exchange
+    (bravo sees alpha's 3 games, not just alpha seeing bravo's 4).
+  - S11: dropped the "listener address must change" assertion (it tested the OS
+    ephemeral-port allocator and could fail spuriously).
+  - S12/S28: require the gating unit test to appear as `<name> ... ok` so an
+    `#[ignore]`d (un-run) test no longer satisfies the check.
+  - S24/S25: assert the requested `install=false` final state.
+  - S34: assert exactly 21 coherent chunks (20 files + version.ini), 21 distinct
+    paths, no duplicates, instead of a `>= 21` floor.
+  - S27: added the `handshake::tests::inbound_hello_from_self_is_ignored` unit
+    test for the protocol-level self guard; the CLI scenario only exercises the
+    CLI string-compare guard, which short-circuits before any network call.
+  - Harness: `find_fixture_game` now iterates `sorted(...)`, so the ambiguous
+    `cnctw` (bravo/multi/solid) resolves deterministically to `fixture-bravo`.
+- Accepted as-is (reviewed, deliberately not changed): S20 (disk-full via chunk
+  `write_all` is equivalent coverage), S21 (inotify across the bind mount is
+  inherent to the harness), S30 (dup-row/self-peer checks are cheap defensive
+  guards), S32/S39/S44 absence checks (cheap regression guards against committing
+  a root sentinel), S42 IP-order precondition (deterministic by container start
+  order), S45 (the spec already names both terminal events).
+- Live runs against the rebuilt `lanspread-peer-cli:dev` image: baseline S1-S47
+  passed; post-fix S1-S47 passed. Post-fix evidence: S14 `{268435456, 268435456}`
+  (balanced 2+2); S16 `.eti` split across B and C `{134217728, 134217728}`; S18
+  all `536870912` bytes delivered despite the source drop (the survivor served
+  the whole archive in that run); S19 deterministic `download-failed`; S37
+  `874.24 MiB/s`. Gates: `just test` (incl. the new handshake test),
+  `just clippy` (`-D warnings`), and `just fmt` all passed.
 ### 2026-06-20 - Prune Dead Lifecycle Events
 - Code under test removed the unconsumed `InstallGameBegin`, `UninstallGameBegin`,