From d7b0127d20cd2b87898b2aa5dcbfc50e7f4d77ac Mon Sep 17 00:00:00 2001 From: ddidderr Date: Tue, 9 Jun 2026 22:06:30 +0200 Subject: [PATCH] review: Claude Fable 5 --- REVIEW_SECURITY_FABLE_5.md | 45 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 REVIEW_SECURITY_FABLE_5.md diff --git a/REVIEW_SECURITY_FABLE_5.md b/REVIEW_SECURITY_FABLE_5.md new file mode 100644 index 0000000..327f738 --- /dev/null +++ b/REVIEW_SECURITY_FABLE_5.md @@ -0,0 +1,45 @@ +I read every file, traced the STREAM construction, the parallel pipeline's ordering/cancellation logic, and the I/O paths, and cross-checked the existing `REVIEW_SECURITY_GPT5.5.md`. Verdict first: **the cryptographic core is correct.** The problems are in input validation and file handling, and two of them can destroy user data without any attacker involved. + +## What's cryptographically sound (verified, not assumed) + +The construction is the STREAM online-AE scheme (Hoang–Reyhanitabar–Rogaway–Vizár), the same shape age and Tink use: XChaCha20-Poly1305, 19-byte random prefix + 4-byte BE counter + last-block byte = 24-byte nonce, full header as AAD on every chunk. I checked the failure modes individually: + +- **Truncation** at a chunk boundary: the new final chunk was encrypted with `last=false`, decryptor computes the nonce with `last=true` → tag failure. Mid-chunk truncation and appended garbage also fail (a trailing partial read flips the real last chunk to `Normal`, wrong nonce again). +- **Reorder/duplicate/drop** of interior chunks: counter is in the nonce → fails. +- **Cross-file chunk splicing** under the same key: per-file random 19-byte prefix plus per-file header AAD → fails. +- **Header tampering** (chunk_size, KDF params, version, flags, length): header is the AAD → every chunk fails. The v1 regression test shows you understood the encode-stability trap; `version` is preserved as-read, so no downgrade is possible. +- **Empty plaintext** is authenticated via an explicit empty last chunk; a ciphertext that is only a header errors instead of silently producing an empty file. +- **Counter overflow** is checked (`bump_counter`), nonce prefix collision across files is 2⁻¹⁵² territory, `getrandom` failure propagates instead of falling back. +- **Parallel pipeline**: I traced the permit/reorder/cancel logic. The writer only writes in counter order, a failed chunk leaves `pending` non-empty → error, and `commit()` is only reachable after all three thread groups joined with no error. There is no path where a tampered chunk ends up in a committed file output. + +The README's worry that it "could theoretically be not effective at all" is unfounded for the construction itself. + +## New findings (not in the GPT-5.5 review) + +**1. High — `--chunk-size 0` with stdin silently discards all data.** +There is no encrypt-side validation of `chunk_size`. With `0`, `AheadReader` gets capacity 0, `read_until_full` on an empty slice returns `Ok(0)` immediately (BufReader fills its internal buffer, then copies 0 bytes), so the very first `read_ahead` returns `Empty`. `encrypt` then writes a single authenticated empty last chunk and **commits successfully**. For a regular file input, the `plaintext_length` cross-check saves you (`committed != 0` → error before commit). For `cat backup.tar | fcry --chunk-size 0 ...` there is no committed length, so you get a valid-looking output file that encrypts zero bytes — and since `Header::read` rejects `chunk_size == 0`, the file is additionally undecryptable. If the user deletes the source afterwards, the data is gone. Fix: validate `chunk_size` in `run()` (reject 0, set a sane max), independent of the header-side check. + +**2. High — `OutSink` can destroy the input file before reading it.** +`fcry -d -i backup.fcry.tmp -o backup.fcry` computes `tmp_path = backup.fcry.tmp` and calls `File::create` on it — which truncates your encrypted input to zero bytes before any read happens beyond what `BufReader` already buffered. The run then fails with "truncated ciphertext", but the original file is already destroyed. This is the sharpest consequence of the predictable `.tmp` scheme; the GPT review only covered the symlink and permissions angle (both of which I confirm: `File::create` follows symlinks, and the decrypted plaintext gets umask-default 0644). One fix kills all three: `O_CREAT|O_EXCL` with a random suffix (`tempfile::NamedTempFile::new_in(parent)` + `persist`), mode 0600 for decrypt output, and optionally refuse to clobber an existing `final_path` without `--force`. + +**3. Medium/Low — no key commitment.** +ChaCha20-Poly1305 is not key-committing: a ciphertext chunk can be constructed that authenticates under multiple keys (invisible-salamander / partitioning-oracle class). Practical exploitability here is low — it requires a victim decrypting attacker-supplied files and leaking success/failure, and each candidate key costs a full Argon2 run — but fixing it is nearly free and buys you something users feel daily: right now a wrong passphrase burns 1 GiB / seconds of Argon2 and then fails with the exact same `aead::Error` as a corrupted file. Derive a key-check value from the *stretched* key (e.g. first 16 bytes of BLAKE2b(key, "fcry-kcv") in the header, or encrypt a fixed zero block as chunk −1) and you get both key commitment in practice and a clean "wrong passphrase" vs. "corrupt file" distinction. Deriving it from the Argon2 output means it does not cheapen offline guessing — an attacker with the file can already test guesses against chunk 0's tag at identical cost. + +**4. Low — passphrase byte-encoding is not portable.** +The Unix path reads UTF-8 bytes from `/dev/tty`. The Windows path reads bytes via `ReadFile` on `CONIN$`, which yields the console's ANSI/OEM codepage — for any non-ASCII passphrase those byte sequences differ, and Argon2 hashes bytes. A file encrypted on Linux with `pässwört` can be undecryptable on Windows and vice versa. There's also no Unicode normalization (NFC vs. NFD — macOS input methods differ from Linux). Either read UTF-16 via `ReadConsoleW` and convert to UTF-8 + normalize to NFC on all platforms, or document ASCII-only passphrases. + +**5. Low — unchecked `u64` multiply in `decrypt_range`.** +`chunk_offset = header_len + i * cipher_chunk`: with a forged header committing `plaintext_length` near 2⁶⁴ and a small `chunk_size`, `i` can approach 2³²−1 while `cipher_chunk` is 2³²+15, and the product wraps in release (your release profile has default `overflow-checks = false`). The result is a seek to a wrong offset and a guaranteed tag failure, so it's not exploitable — but it's pre-authentication arithmetic on attacker bytes and should be `checked_mul`/`checked_add` like the rest of that function already is. + +## GPT-5.5 findings I confirm, with sharpening + +- **Pre-auth resource consumption from header fields** — real, and worse in the parallel path than stated: the pipeline reader allocates a fresh `vec![0u8; chunk_sz]` per job with up to `4 × threads` chunks in flight, so a forged `chunk_size = u32::MAX` on a 16-core box attempts ~64 × 4 GiB before any tag is checked. Argon2 `m_cost` from the header can demand up to 4 TiB; the argon2 crate's block allocation will abort the process on OOM. Cap both at parse time (e.g. chunk_size ≤ 256 MiB, m_cost ≤ some GiB ceiling with an override flag). +- **Stdout streams authenticated-but-possibly-truncated prefixes** — correct, and inherent to chunked streaming AE; every released byte is authentic, but a downstream consumer can act on a verified prefix before the truncation error lands. Document it prominently; a `--buffer-verify` mode is the only real alternative. +- **`--length 0` range decrypt succeeds with zero authentication** (doesn't even prove the key) and range success ≠ whole-file integrity — both correct as written. +- **No floors on passphrase/KDF choices** — confirmed; empty passphrase plus `--argon-memory 1 --argon-passes 1` is accepted (the argon2 crate clamps only at m_cost ≥ 8·p_cost). +- **`--raw-key` design** — confirmed; beyond the `/proc/*/cmdline` leak, the keyspace is restricted to valid UTF-8 of exactly 32 bytes, which both blocks legitimate random keys (~⅓ of random 32-byte strings aren't UTF-8, and clap's `String` parsing rejects them) and invites typing a 32-char password with zero stretching. Hex/base64 from a file or fd is the right replacement. +- **Plaintext chunk buffers unprotected** — confirmed (plain `Vec` through channels, no zeroize/mlock). I'd rank this lower than the review does: the plaintext's home is the disk anyway; the meaningful residual exposure is swap, and core dumps are already disabled on Unix. + +Two more one-liners: `commit()` doesn't fsync the parent directory after the rename, so a crash right after a successful run can lose the rename (durability, not confidentiality); and `plaintext_length` sits in cleartext in the header, which leaks nothing beyond what the unpadded ciphertext length already reveals — but if you ever care about size metadata, that's where a padding scheme (à la covert padding / PADMÉ) would slot in. + +If you want, I can write the patch set for the top items — chunk-size validation, the `OutSink` rework with `create_new` + random suffix + 0600, and header-side caps are all small, self-contained diffs.