cleanup: remove old review files

2026-06-10 20:51:29 +02:00
parent 227f78a767
commit 126a86ec07
2 changed files with 0 additions and 79 deletions
@@ -1,45 +0,0 @@
 I read every file, traced the STREAM construction, the parallel pipeline's ordering/cancellation logic, and the I/O paths, and cross-checked the existing `REVIEW_SECURITY_GPT5.5.md`. Verdict first: **the cryptographic core is correct.** The problems are in input validation and file handling, and two of them can destroy user data without any attacker involved.
 ## What's cryptographically sound (verified, not assumed)
 The construction is the STREAM online-AE scheme (Hoang–Reyhanitabar–Rogaway–Vizár), the same shape age and Tink use: XChaCha20-Poly1305, 19-byte random prefix + 4-byte BE counter + last-block byte = 24-byte nonce, full header as AAD on every chunk. I checked the failure modes individually:
 - **Truncation** at a chunk boundary: the new final chunk was encrypted with `last=false`, decryptor computes the nonce with `last=true` → tag failure. Mid-chunk truncation and appended garbage also fail (a trailing partial read flips the real last chunk to `Normal`, wrong nonce again).
 - **Reorder/duplicate/drop** of interior chunks: counter is in the nonce → fails.
 - **Cross-file chunk splicing** under the same key: per-file random 19-byte prefix plus per-file header AAD → fails.
 - **Header tampering** (chunk_size, KDF params, version, flags, length): header is the AAD → every chunk fails. The v1 regression test shows you understood the encode-stability trap; `version` is preserved as-read, so no downgrade is possible.
 - **Empty plaintext** is authenticated via an explicit empty last chunk; a ciphertext that is only a header errors instead of silently producing an empty file.
 - **Counter overflow** is checked (`bump_counter`), nonce prefix collision across files is 2⁻¹⁵² territory, `getrandom` failure propagates instead of falling back.
 - **Parallel pipeline**: I traced the permit/reorder/cancel logic. The writer only writes in counter order, a failed chunk leaves `pending` non-empty → error, and `commit()` is only reachable after all three thread groups joined with no error. There is no path where a tampered chunk ends up in a committed file output.
 The README's worry that it "could theoretically be not effective at all" is unfounded for the construction itself.
 ## New findings (not in the GPT-5.5 review)
 **1. High — `--chunk-size 0` with stdin silently discards all data.**
 There is no encrypt-side validation of `chunk_size`. With `0`, `AheadReader` gets capacity 0, `read_until_full` on an empty slice returns `Ok(0)` immediately (BufReader fills its internal buffer, then copies 0 bytes), so the very first `read_ahead` returns `Empty`. `encrypt` then writes a single authenticated empty last chunk and **commits successfully**. For a regular file input, the `plaintext_length` cross-check saves you (`committed != 0` → error before commit). For `cat backup.tar | fcry --chunk-size 0 ...` there is no committed length, so you get a valid-looking output file that encrypts zero bytes — and since `Header::read` rejects `chunk_size == 0`, the file is additionally undecryptable. If the user deletes the source afterwards, the data is gone. Fix: validate `chunk_size` in `run()` (reject 0, set a sane max), independent of the header-side check.
 **2. High — `OutSink` can destroy the input file before reading it.**
 `fcry -d -i backup.fcry.tmp -o backup.fcry` computes `tmp_path = backup.fcry.tmp` and calls `File::create` on it — which truncates your encrypted input to zero bytes before any read happens beyond what `BufReader` already buffered. The run then fails with "truncated ciphertext", but the original file is already destroyed. This is the sharpest consequence of the predictable `<name>.tmp` scheme; the GPT review only covered the symlink and permissions angle (both of which I confirm: `File::create` follows symlinks, and the decrypted plaintext gets umask-default 0644). One fix kills all three: `O_CREAT|O_EXCL` with a random suffix (`tempfile::NamedTempFile::new_in(parent)` + `persist`), mode 0600 for decrypt output, and optionally refuse to clobber an existing `final_path` without `--force`.
 **3. Medium/Low — no key commitment.**
 ChaCha20-Poly1305 is not key-committing: a ciphertext chunk can be constructed that authenticates under multiple keys (invisible-salamander / partitioning-oracle class). Practical exploitability here is low — it requires a victim decrypting attacker-supplied files and leaking success/failure, and each candidate key costs a full Argon2 run — but fixing it is nearly free and buys you something users feel daily: right now a wrong passphrase burns 1 GiB / seconds of Argon2 and then fails with the exact same `aead::Error` as a corrupted file. Derive a key-check value from the *stretched* key (e.g. first 16 bytes of BLAKE2b(key, "fcry-kcv") in the header, or encrypt a fixed zero block as chunk −1) and you get both key commitment in practice and a clean "wrong passphrase" vs. "corrupt file" distinction. Deriving it from the Argon2 output means it does not cheapen offline guessing — an attacker with the file can already test guesses against chunk 0's tag at identical cost.
 **4. Low — passphrase byte-encoding is not portable.**
 The Unix path reads UTF-8 bytes from `/dev/tty`. The Windows path reads bytes via `ReadFile` on `CONIN$`, which yields the console's ANSI/OEM codepage — for any non-ASCII passphrase those byte sequences differ, and Argon2 hashes bytes. A file encrypted on Linux with `pässwört` can be undecryptable on Windows and vice versa. There's also no Unicode normalization (NFC vs. NFD — macOS input methods differ from Linux). Either read UTF-16 via `ReadConsoleW` and convert to UTF-8 + normalize to NFC on all platforms, or document ASCII-only passphrases.
 **5. Low — unchecked `u64` multiply in `decrypt_range`.**
 `chunk_offset = header_len + i * cipher_chunk`: with a forged header committing `plaintext_length` near 2⁶⁴ and a small `chunk_size`, `i` can approach 2³²−1 while `cipher_chunk` is 2³²+15, and the product wraps in release (your release profile has default `overflow-checks = false`). The result is a seek to a wrong offset and a guaranteed tag failure, so it's not exploitable — but it's pre-authentication arithmetic on attacker bytes and should be `checked_mul`/`checked_add` like the rest of that function already is.
 ## GPT-5.5 findings I confirm, with sharpening
 - **Pre-auth resource consumption from header fields** — real, and worse in the parallel path than stated: the pipeline reader allocates a fresh `vec![0u8; chunk_sz]` per job with up to `4 × threads` chunks in flight, so a forged `chunk_size = u32::MAX` on a 16-core box attempts ~64 × 4 GiB before any tag is checked. Argon2 `m_cost` from the header can demand up to 4 TiB; the argon2 crate's block allocation will abort the process on OOM. Cap both at parse time (e.g. chunk_size ≤ 256 MiB, m_cost ≤ some GiB ceiling with an override flag).
 - **Stdout streams authenticated-but-possibly-truncated prefixes** — correct, and inherent to chunked streaming AE; every released byte is authentic, but a downstream consumer can act on a verified prefix before the truncation error lands. Document it prominently; a `--buffer-verify` mode is the only real alternative.
 - **`--length 0` range decrypt succeeds with zero authentication** (doesn't even prove the key) and range success ≠ whole-file integrity — both correct as written.
 - **No floors on passphrase/KDF choices** — confirmed; empty passphrase plus `--argon-memory 1 --argon-passes 1` is accepted (the argon2 crate clamps only at m_cost ≥ 8·p_cost).
 - **`--raw-key` design** — confirmed; beyond the `/proc/*/cmdline` leak, the keyspace is restricted to valid UTF-8 of exactly 32 bytes, which both blocks legitimate random keys (~⅓ of random 32-byte strings aren't UTF-8, and clap's `String` parsing rejects them) and invites typing a 32-char password with zero stretching. Hex/base64 from a file or fd is the right replacement.
 - **Plaintext chunk buffers unprotected** — confirmed (plain `Vec<u8>` through channels, no zeroize/mlock). I'd rank this lower than the review does: the plaintext's home is the disk anyway; the meaningful residual exposure is swap, and core dumps are already disabled on Unix.
 Two more one-liners: `commit()` doesn't fsync the parent directory after the rename, so a crash right after a successful run can lose the rename (durability, not confidentiality); and `plaintext_length` sits in cleartext in the header, which leaks nothing beyond what the unpadded ciphertext length already reveals — but if you ever care about size metadata, that's where a padding scheme (à la covert padding / PADMÉ) would slot in.
 If you want, I can write the patch set for the top items — chunk-size validation, the `OutSink` rework with `create_new` + random suffix + 0600, and header-side caps are all small, self-contained diffs.
@@ -1,34 +0,0 @@
 # Findings
 - High: attacker-controlled headers can cause large pre-auth resource use. chunk_size and Argon2 params are read before any AEAD tag is verified, then used for allocation/KDF work. A malicious file can request huge buffers or Argon2 memory/time. See
  src/header.rs:107, src/crypto.rs:54, src/crypto.rs:189. Add strict max caps before allocation/KDF.
 - High: output temp file handling is unsafe in hostile directories. Output uses predictable <name>.tmp plus File::create, which follows symlinks and truncates existing files. Decrypted plaintext may also be created with permissive umask-derived
  permissions. See src/utils.rs:71. Use randomized create_new temp files, 0600 permissions for plaintext, and safe persist semantics.
 - Medium: stdout decrypt streams verified chunks before whole-file success. File output is protected by temp-file commit, but stdout cannot roll back. A truncated valid-prefix ciphertext can emit authentic but incomplete plaintext before the final
  error. See src/crypto.rs:217 and src/pipeline.rs:348. Document this sharply or add a “verify before stdout” mode.
 - Medium: random-access decrypt authenticates only requested chunks. That is expected for range reads, but users may mistake success for whole-file integrity. --length 0 succeeds without authenticating any chunk, so it does not prove the key or file is
  valid. See src/crypto.rs:307.
 - Medium: weak passphrase choices are accepted. Empty/very short passphrases and very low Argon2 settings are possible. Defaults are strong, but user-supplied weak parameters are stored and honored. See src/main.rs:55, src/secrets.rs:172. Enforce
  floors or require an explicit insecure flag.
 - Medium: raw-key UX invites misuse. --raw-key is a UTF-8 command-line string, visible in process listings, and not truly arbitrary 32 raw bytes. It is documented as dangerous, but still a first-class CLI path. See src/main.rs:37. Prefer hex/base64
  from file/stdin/fd, or make raw key testing-only.
 - Medium: plaintext buffers are not zeroized. Keys/passphrases get protected handling, but plaintext/ciphertext chunk buffers are ordinary Vec<u8> and may leave sensitive plaintext in heap memory after decrypt/encrypt. See src/crypto.rs:127, src/
  crypto.rs:210, src/pipeline.rs:48.
 - Medium/Low: unchecked arithmetic and unbounded CLI knobs remain. chunk_size + TAG_LEN, random-access offsets, huge -j values, and release overflow behavior deserve explicit checked math and caps. See src/crypto.rs:190, src/pipeline.rs:61,
  Cargo.toml:32.
 - Low: platform hardening is partial. Unix core dumps are disabled, which is good, but Windows crash dumps are not addressed, and non-secret plaintext buffers are not protected from swap/dumps. See src/main.rs:141.
 - Low: custom crypto framing needs a written threat model and test vectors. The construction is reasonable, but custom file formats are easy to misuse over time. README still says the tool is early and not thoroughly tested, and links a missing TODO.
  See README.md:6.
 Good Signs
 The core AEAD choice is solid: XChaCha20-Poly1305 with random nonce prefix, per-chunk counters, full header as AAD, and an authenticated final chunk. The code also commits plaintext length for regular files, rejects unknown header flags, uses Argon2id
 for passphrases, disables Unix core dumps, and avoids committing partial file outputs on failure.
 Post-Quantum
 For the current symmetric-only design, there is no RSA/ECC public-key crypto to replace. A 256-bit symmetric key is generally the right shape for post-quantum resistance; Grover-style search still leaves roughly 128-bit security. The weak link is
 password entropy, not the AEAD.
 If you add recipient/public-key encryption or signatures, use standardized PQC: NIST finalized FIPS 203 ML-KEM, FIPS 204 ML-DSA, and FIPS 205 SLH-DSA in 2024, and recommends migration planning now. Sources: NIST FIPS announcement
 (<https://www.nist.gov/news-events/news/2024/08/announcing-approval-three-federal-information-processing-standards-fips>), NIST PQC project (<https://www.nist.gov/programs-projects/post-quantum-cryptography>).