Files
fcry/REVIEW_SECURITY_FABLE_5.md
T
2026-06-09 22:06:30 +02:00

8.7 KiB
Raw Blame History

I read every file, traced the STREAM construction, the parallel pipeline's ordering/cancellation logic, and the I/O paths, and cross-checked the existing REVIEW_SECURITY_GPT5.5.md. Verdict first: the cryptographic core is correct. The problems are in input validation and file handling, and two of them can destroy user data without any attacker involved.

What's cryptographically sound (verified, not assumed)

The construction is the STREAM online-AE scheme (HoangReyhanitabarRogawayVizár), the same shape age and Tink use: XChaCha20-Poly1305, 19-byte random prefix + 4-byte BE counter + last-block byte = 24-byte nonce, full header as AAD on every chunk. I checked the failure modes individually:

  • Truncation at a chunk boundary: the new final chunk was encrypted with last=false, decryptor computes the nonce with last=true → tag failure. Mid-chunk truncation and appended garbage also fail (a trailing partial read flips the real last chunk to Normal, wrong nonce again).
  • Reorder/duplicate/drop of interior chunks: counter is in the nonce → fails.
  • Cross-file chunk splicing under the same key: per-file random 19-byte prefix plus per-file header AAD → fails.
  • Header tampering (chunk_size, KDF params, version, flags, length): header is the AAD → every chunk fails. The v1 regression test shows you understood the encode-stability trap; version is preserved as-read, so no downgrade is possible.
  • Empty plaintext is authenticated via an explicit empty last chunk; a ciphertext that is only a header errors instead of silently producing an empty file.
  • Counter overflow is checked (bump_counter), nonce prefix collision across files is 2⁻¹⁵² territory, getrandom failure propagates instead of falling back.
  • Parallel pipeline: I traced the permit/reorder/cancel logic. The writer only writes in counter order, a failed chunk leaves pending non-empty → error, and commit() is only reachable after all three thread groups joined with no error. There is no path where a tampered chunk ends up in a committed file output.

The README's worry that it "could theoretically be not effective at all" is unfounded for the construction itself.

New findings (not in the GPT-5.5 review)

1. High — --chunk-size 0 with stdin silently discards all data. There is no encrypt-side validation of chunk_size. With 0, AheadReader gets capacity 0, read_until_full on an empty slice returns Ok(0) immediately (BufReader fills its internal buffer, then copies 0 bytes), so the very first read_ahead returns Empty. encrypt then writes a single authenticated empty last chunk and commits successfully. For a regular file input, the plaintext_length cross-check saves you (committed != 0 → error before commit). For cat backup.tar | fcry --chunk-size 0 ... there is no committed length, so you get a valid-looking output file that encrypts zero bytes — and since Header::read rejects chunk_size == 0, the file is additionally undecryptable. If the user deletes the source afterwards, the data is gone. Fix: validate chunk_size in run() (reject 0, set a sane max), independent of the header-side check.

2. High — OutSink can destroy the input file before reading it. fcry -d -i backup.fcry.tmp -o backup.fcry computes tmp_path = backup.fcry.tmp and calls File::create on it — which truncates your encrypted input to zero bytes before any read happens beyond what BufReader already buffered. The run then fails with "truncated ciphertext", but the original file is already destroyed. This is the sharpest consequence of the predictable <name>.tmp scheme; the GPT review only covered the symlink and permissions angle (both of which I confirm: File::create follows symlinks, and the decrypted plaintext gets umask-default 0644). One fix kills all three: O_CREAT|O_EXCL with a random suffix (tempfile::NamedTempFile::new_in(parent) + persist), mode 0600 for decrypt output, and optionally refuse to clobber an existing final_path without --force.

3. Medium/Low — no key commitment. ChaCha20-Poly1305 is not key-committing: a ciphertext chunk can be constructed that authenticates under multiple keys (invisible-salamander / partitioning-oracle class). Practical exploitability here is low — it requires a victim decrypting attacker-supplied files and leaking success/failure, and each candidate key costs a full Argon2 run — but fixing it is nearly free and buys you something users feel daily: right now a wrong passphrase burns 1 GiB / seconds of Argon2 and then fails with the exact same aead::Error as a corrupted file. Derive a key-check value from the stretched key (e.g. first 16 bytes of BLAKE2b(key, "fcry-kcv") in the header, or encrypt a fixed zero block as chunk 1) and you get both key commitment in practice and a clean "wrong passphrase" vs. "corrupt file" distinction. Deriving it from the Argon2 output means it does not cheapen offline guessing — an attacker with the file can already test guesses against chunk 0's tag at identical cost.

4. Low — passphrase byte-encoding is not portable. The Unix path reads UTF-8 bytes from /dev/tty. The Windows path reads bytes via ReadFile on CONIN$, which yields the console's ANSI/OEM codepage — for any non-ASCII passphrase those byte sequences differ, and Argon2 hashes bytes. A file encrypted on Linux with pässwört can be undecryptable on Windows and vice versa. There's also no Unicode normalization (NFC vs. NFD — macOS input methods differ from Linux). Either read UTF-16 via ReadConsoleW and convert to UTF-8 + normalize to NFC on all platforms, or document ASCII-only passphrases.

5. Low — unchecked u64 multiply in decrypt_range. chunk_offset = header_len + i * cipher_chunk: with a forged header committing plaintext_length near 2⁶⁴ and a small chunk_size, i can approach 2³²−1 while cipher_chunk is 2³²+15, and the product wraps in release (your release profile has default overflow-checks = false). The result is a seek to a wrong offset and a guaranteed tag failure, so it's not exploitable — but it's pre-authentication arithmetic on attacker bytes and should be checked_mul/checked_add like the rest of that function already is.

GPT-5.5 findings I confirm, with sharpening

  • Pre-auth resource consumption from header fields — real, and worse in the parallel path than stated: the pipeline reader allocates a fresh vec![0u8; chunk_sz] per job with up to 4 × threads chunks in flight, so a forged chunk_size = u32::MAX on a 16-core box attempts ~64 × 4 GiB before any tag is checked. Argon2 m_cost from the header can demand up to 4 TiB; the argon2 crate's block allocation will abort the process on OOM. Cap both at parse time (e.g. chunk_size ≤ 256 MiB, m_cost ≤ some GiB ceiling with an override flag).
  • Stdout streams authenticated-but-possibly-truncated prefixes — correct, and inherent to chunked streaming AE; every released byte is authentic, but a downstream consumer can act on a verified prefix before the truncation error lands. Document it prominently; a --buffer-verify mode is the only real alternative.
  • --length 0 range decrypt succeeds with zero authentication (doesn't even prove the key) and range success ≠ whole-file integrity — both correct as written.
  • No floors on passphrase/KDF choices — confirmed; empty passphrase plus --argon-memory 1 --argon-passes 1 is accepted (the argon2 crate clamps only at m_cost ≥ 8·p_cost).
  • --raw-key design — confirmed; beyond the /proc/*/cmdline leak, the keyspace is restricted to valid UTF-8 of exactly 32 bytes, which both blocks legitimate random keys (~⅓ of random 32-byte strings aren't UTF-8, and clap's String parsing rejects them) and invites typing a 32-char password with zero stretching. Hex/base64 from a file or fd is the right replacement.
  • Plaintext chunk buffers unprotected — confirmed (plain Vec<u8> through channels, no zeroize/mlock). I'd rank this lower than the review does: the plaintext's home is the disk anyway; the meaningful residual exposure is swap, and core dumps are already disabled on Unix.

Two more one-liners: commit() doesn't fsync the parent directory after the rename, so a crash right after a successful run can lose the rename (durability, not confidentiality); and plaintext_length sits in cleartext in the header, which leaks nothing beyond what the unpadded ciphertext length already reveals — but if you ever care about size metadata, that's where a padding scheme (à la covert padding / PADMÉ) would slot in.

If you want, I can write the patch set for the top items — chunk-size validation, the OutSink rework with create_new + random suffix + 0600, and header-side caps are all small, self-contained diffs.