test(peer-cli): harden S1-S47 scenario suite against vacuous and flaky checks #29
Reference in New Issue
Block a user
Delete Branch "test/peer-cli-scenario-audit"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
An adversarial audit of the headless peer-to-peer scenario suite
(crates/lanspread-peer-cli/scripts/run_extended_scenarios.py, driven by
just peer-cli-tests) found assertions that passed even when the behaviorthey claim to test was not happening, plus timing races and doc-vs-code
divergences. A full baseline run (S1-S47) passed beforehand, confirming these
were test-quality gaps, not peer regressions; the baseline output itself
exposed the worst offenders -- e.g. S14 chunk totals {128 MiB, 1 MiB} (a
two-chunk file whose "balanced within one chunk" check can never fail) and
S16/S18 serving the whole ~120 MiB alienswarm.eti from a single source, so
fanout and retry were never exercised.
Test-correctness fixes (a broken behavior could previously pass green):
already advanced past download-finished, so it scanned an empty tail.
Replaced with assert_no_event_since over the whole window. Switched to a
4*CHUNK_SIZE sparse archive so both peers get chunks; the test now proves the
download SURVIVES a mid-download source kill (every byte delivered, survivor
served part, no download-failed, clean diff). Retry-onto-survivor is the
mechanism but is not asserted: the kill/serve race against
docker rm -fcannot be forced, so asserting an exact split would be flaky.
source-agnostic. Added assertions that the download committed exactly once,
every chunk came from the validated two-peer set, both peers served, and no
chunk was fetched twice.
imbalance; asserts an exact 2+2 split summing to the file size.
catalog-version peers (the stock 120 MiB fixture is a single chunk).
mbit/mib == 8.388608, mib_per_s == bytes/duration), not just the byte count.
it is filtered, so "absent" means "filtered" and not "never sent".
list-games eti_game_version is synthesized from the catalog and can only ever
equal the asserted value.
(bravo sees alpha's 3 games, not just alpha seeing bravo's 4).
#[ignore]d (un-run) test no longer satisfies the check.
paths, no duplicates, instead of a >= 21 floor.
Flake fixes:
file and accept download-failed or download-peers-gone. The old graceful
shutdown on a single-chunk file could let the transfer finish first, turning
the expected failure into a download-finished. A chunk may complete before
the kill lands, but the full transfer cannot, so the failure is deterministic.
active when the duplicate request is issued (TOCTOU on active_operations);
also assert the active operation == "Downloading".
ephemeral-port allocator and could fail spuriously; keep the same-identity /
no-duplicate invariant.
Coverage and determinism:
protocol-level self guard. The CLI scenario only exercises the CLI
string-compare guard, which short-circuits before any network call, so the
peer-crate guard had no test.
(fixture-bravo/multi/solid) resolves deterministically to fixture-bravo.
Reviewed and deliberately left as-is (documented in the run log): S20, S21,
S30, S32/S39/S44 absence checks, S42 IP-order precondition, S45.
PEER_CLI_SCENARIOS.md rows S2, S11, S14, S16, S18, S19, S27 are updated to
match the harness, and a dated run-log entry records the audit, the fixes, the
accepted items, and the live-run evidence.
Test Plan:
just peer-cli-tests(rebuilds the image, runs S1-S47 in Docker): baselinepassed; post-fix passed; a final run on the exact committed code passed
47/47. Evidence: S14 {268435456, 268435456} balanced 2+2; S16 .eti split
across B and C {134217728, 134217728}; S18 all 536870912 bytes delivered with
no download-failed; S19 deterministic download-failed; S37 ~874 MiB/s.
just test(incl. inbound_hello_from_self_is_ignored),just clippy(-D warnings, all-targets), and
just fmtall pass.Refs: PEER_CLI_SCENARIOS.md scenario matrix and 2026-06-21 run-log entry.
An adversarial audit of the headless peer-to-peer scenario suite (crates/lanspread-peer-cli/scripts/run_extended_scenarios.py, driven by `just peer-cli-tests`) found assertions that passed even when the behavior they claim to test was not happening, plus timing races and doc-vs-code divergences. A full baseline run (S1-S47) passed beforehand, confirming these were test-quality gaps, not peer regressions; the baseline output itself exposed the worst offenders -- e.g. S14 chunk totals {128 MiB, 1 MiB} (a two-chunk file whose "balanced within one chunk" check can never fail) and S16/S18 serving the whole ~120 MiB alienswarm.eti from a single source, so fanout and retry were never exercised. Test-correctness fixes (a broken behavior could previously pass green): - S18: the "no download-failed" check was dead -- it reused a LineWaiter already advanced past download-finished, so it scanned an empty tail. Replaced with assert_no_event_since over the whole window. Switched to a 4*CHUNK_SIZE sparse archive so both peers get chunks; the test now proves the download SURVIVES a mid-download source kill (every byte delivered, survivor served part, no download-failed, clean diff). Retry-onto-survivor is the mechanism but is not asserted: the kill/serve race against `docker rm -f` cannot be forced, so asserting an exact split would be flaky. - S7: the only check was a diff against byte-identical ggoo fixtures, so it was source-agnostic. Added assertions that the download committed exactly once, every chunk came from the validated two-peer set, both peers served, and no chunk was fetched twice. - S14: enlarged to 4*CHUNK_SIZE so the balance check can fail under a 3+1 imbalance; asserts an exact 2+2 split summing to the file size. - S16: inflated the .eti to 2*CHUNK_SIZE so it fans out across both catalog-version peers (the stock 120 MiB fixture is a single chunk). - S37: validate the throughput rate fields (positive, self-consistent mbit/mib == 8.388608, mib_per_s == bytes/duration), not just the byte count. - S35: assert the source actually advertises the unknown game before checking it is filtered, so "absent" means "filtered" and not "never sent". - S15: cross-check each peer's raw advertised eti_version via list-peers; the list-games eti_game_version is synthesized from the catalog and can only ever equal the asserted value. - S2: poll for library convergence and verify the bidirectional exchange (bravo sees alpha's 3 games, not just alpha seeing bravo's 4). - S12/S28: require the gating unit test to appear as "<name> ... ok" so an #[ignore]d (un-run) test no longer satisfies the check. - S24/S25: assert the requested install=false final state. - S34: assert exactly 21 coherent chunks (20 files + version.ini), 21 distinct paths, no duplicates, instead of a >= 21 floor. Flake fixes: - S19: force-kill the sole source right after download-begin on a 4*CHUNK_SIZE file and accept download-failed or download-peers-gone. The old graceful shutdown on a single-chunk file could let the transfer finish first, turning the expected failure into a download-finished. A chunk may complete before the kill lands, but the full transfer cannot, so the failure is deterministic. - S26: use a large sparse source so the first operation is reliably still active when the duplicate request is issued (TOCTOU on active_operations); also assert the active operation == "Downloading". - S11: drop the "listener address must change" assertion -- it tested the OS ephemeral-port allocator and could fail spuriously; keep the same-identity / no-duplicate invariant. Coverage and determinism: - S27: add handshake::tests::inbound_hello_from_self_is_ignored for the protocol-level self guard. The CLI scenario only exercises the CLI string-compare guard, which short-circuits before any network call, so the peer-crate guard had no test. - find_fixture_game now iterates sorted(FIXTURES) so the ambiguous cnctw (fixture-bravo/multi/solid) resolves deterministically to fixture-bravo. Reviewed and deliberately left as-is (documented in the run log): S20, S21, S30, S32/S39/S44 absence checks, S42 IP-order precondition, S45. PEER_CLI_SCENARIOS.md rows S2, S11, S14, S16, S18, S19, S27 are updated to match the harness, and a dated run-log entry records the audit, the fixes, the accepted items, and the live-run evidence. Test Plan: - `just peer-cli-tests` (rebuilds the image, runs S1-S47 in Docker): baseline passed; post-fix passed; a final run on the exact committed code passed 47/47. Evidence: S14 {268435456, 268435456} balanced 2+2; S16 .eti split across B and C {134217728, 134217728}; S18 all 536870912 bytes delivered with no download-failed; S19 deterministic download-failed; S37 ~874 MiB/s. - `just test` (incl. inbound_hello_from_self_is_ignored), `just clippy` (-D warnings, all-targets), and `just fmt` all pass. Refs: PEER_CLI_SCENARIOS.md scenario matrix and 2026-06-21 run-log entry.