feat!: add file-format header, configurable chunks, integration tests

Introduce a self-describing on-disk format and use it to address several
shortcomings of the 0.9 file layout, where the file simply began with a
raw 19-byte STREAM nonce prefix and used a hardcoded 64 KiB chunk size.

What changed for users
----------------------
* fcry files now start with a 16-byte header: magic ("fcry"), version,
  algorithm id, flags, reserved byte, plaintext chunk_size (u32 LE),
  KDF id + params, then the 19-byte nonce prefix. The full encoded
  header is bound as AAD to every chunk, so tampering with chunk_size,
  algorithm id, nonce prefix, or any future KDF parameter causes
  authentication failure on every chunk -- not just the first.
* New `--chunk-size` CLI flag (encryption only). The decryptor reads
  the chunk size from the header, so files encrypted with a non-default
  size decrypt without the user having to remember it.
* Default plaintext chunk size raised from 64 KiB to 1 MiB.
* Bad input is now reported as an error instead of panicking: empty
  ciphertext, truncated final chunk, wrong magic, bad version, zero
  chunk_size, unknown algorithm id, and short `--raw-key` all return a
  non-zero exit status with a diagnostic on stderr.
* Empty plaintext now produces a valid (authenticated) empty
  ciphertext instead of panicking; the decryptor verifies it.
* `main` exits with status 1 on error (previously it printed and
  returned 0).
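
The header layout above can be sketched as a struct. Field names, widths,
and the 13-byte fixed size here are illustrative, not the shipped code (the
commit states 16 bytes before the nonce prefix, so the real encoding carries
additional reserved/param bytes):

```rust
// Hypothetical sketch of the fcry header; names and offsets are illustrative.
const MAGIC: [u8; 4] = *b"fcry";

#[derive(Debug, PartialEq)]
struct Header {
    version: u8,
    alg_id: u8,
    flags: u8,
    chunk_size: u32, // plaintext bytes per chunk, u32 LE on disk
    kdf_id: u8,      // 0 = Raw (no derivation params)
}

impl Header {
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&MAGIC);
        out.push(self.version);
        out.push(self.alg_id);
        out.push(self.flags);
        out.push(0); // reserved byte
        out.extend_from_slice(&self.chunk_size.to_le_bytes());
        out.push(self.kdf_id);
        out // the 19-byte nonce prefix would follow on disk
    }

    fn decode(buf: &[u8]) -> Result<Header, String> {
        if buf.len() < 13 {
            return Err("short header".into());
        }
        if buf[0..4] != MAGIC {
            return Err("bad magic".into());
        }
        let chunk_size = u32::from_le_bytes(buf[8..12].try_into().unwrap());
        if chunk_size == 0 {
            return Err("zero chunk_size".into());
        }
        Ok(Header {
            version: buf[4],
            alg_id: buf[5],
            flags: buf[6],
            chunk_size,
            kdf_id: buf[12],
        })
    }
}

fn main() {
    let h = Header { version: 1, alg_id: 0, flags: 0, chunk_size: 1 << 20, kdf_id: 0 };
    let bytes = h.encode();
    assert_eq!(Header::decode(&bytes).unwrap(), h);
    assert!(Header::decode(b"nope\x00\x00\x00\x00\x00\x10\x00\x00\x00").is_err());
}
```

The decoder rejects bad magic and a zero chunk_size up front, matching the
error cases listed above.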

This is a breaking change to the file format: 0.9.x files have no magic
or header and cannot be read by 0.10.x. Version bumped to 0.10.0.

Why this approach
-----------------
The header-as-AAD pattern is the standard way to make file-format
metadata tamper-evident without a separate signature: any bit-flip in
the header propagates into every chunk's authentication tag check, so
an attacker cannot, for example, change chunk_size to mis-frame the
stream or downgrade the algorithm id.
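
The propagation argument can be illustrated with a toy keyed tag. Here std's
`DefaultHasher` stands in for Poly1305 purely for illustration; this is NOT
the real AEAD, only a demonstration that a tag computed over key || aad ||
chunk changes when any header bit flips:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for an AEAD tag over (key, aad, chunk). Illustrative only.
fn toy_tag(key: &[u8], aad: &[u8], chunk: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    aad.hash(&mut h);
    chunk.hash(&mut h);
    h.finish()
}

fn main() {
    let key = b"example-key";
    let header = b"fcry\x01\x00\x00\x00\x00\x00\x10\x00\x00"; // illustrative bytes
    let chunk = b"first plaintext chunk";

    let good = toy_tag(key, header, chunk);

    // Flip one bit in the header-as-AAD: the tag for EVERY chunk changes,
    // so authentication fails everywhere, not just on the first chunk.
    let mut tampered = header.to_vec();
    tampered[5] ^= 1;
    let bad = toy_tag(key, &tampered, chunk);

    assert_ne!(good, bad);
}
```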

Storing chunk_size in the header (rather than fixing it at compile
time) lets us experiment with chunk sizes without breaking decrypt
compatibility, and is preparation for the parallel-pipeline work in
Roadmap 1.0 where worker count and chunk size interact.
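
The framing the decryptor derives from the header can be sketched as simple
arithmetic. This assumes a 16-byte Poly1305 tag per chunk and a trailing
partial (possibly empty) final chunk; the real fcry framing may differ in
edge cases, so treat the formulas as one possible scheme:

```rust
// One possible framing derived from the header's chunk_size (illustrative).
const TAG_LEN: u64 = 16;

fn chunk_count(plain_len: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "header validation rejects zero chunk_size");
    // Trailing partial (possibly empty) chunk carries the "last" marker,
    // so even empty plaintext yields one authenticated chunk.
    plain_len / chunk_size + 1
}

fn ciphertext_len(plain_len: u64, chunk_size: u64) -> u64 {
    plain_len + chunk_count(plain_len, chunk_size) * TAG_LEN
}

fn main() {
    let mib = 1u64 << 20;
    assert_eq!(chunk_count(0, mib), 1); // empty input: one empty final chunk
    assert_eq!(ciphertext_len(3 * mib + 1, mib), 3 * mib + 1 + 4 * TAG_LEN);
}
```

Under this scheme a tampered chunk_size would re-partition the ciphertext
into differently sized chunks, which is exactly the mis-framing the
header-as-AAD binding detects.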

The KDF section is a tagged variant (currently only `Raw`) so that
adding Argon2id later only adds a new variant + its salt/cost fields;
existing files keep decrypting because they carry `kdf_id = 0`.
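
A minimal sketch of that tagged variant, with the future Argon2id arm shown
only as a comment; the field names are hypothetical, and only `Raw`
(kdf_id = 0) exists in this commit:

```rust
// Sketch of the tagged KDF section. Only Raw ships today; the commented
// Argon2id arm illustrates how a later variant slots in without breaking
// existing kdf_id = 0 files.
#[derive(Debug, PartialEq)]
enum Kdf {
    Raw, // kdf_id = 0: key supplied directly, no derivation params
    // Argon2id { salt: [u8; 16], m_cost: u32, t_cost: u32 }, // future kdf_id
}

fn decode_kdf(kdf_id: u8, _params: &[u8]) -> Result<Kdf, String> {
    match kdf_id {
        0 => Ok(Kdf::Raw),
        other => Err(format!("unknown KDF id {other}")),
    }
}

fn main() {
    assert_eq!(decode_kdf(0, &[]).unwrap(), Kdf::Raw);
    assert!(decode_kdf(1, &[]).is_err()); // unknown id is a clean error
}
```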

Other changes bundled in
------------------------
* Switch RNG from `rand` (0.10) to `getrandom` (0.3). We only need
  OS-provided random bytes for the nonce prefix; pulling in the full
  `rand` crate for one `OsRng.fill_bytes` call was overkill, and
  `rand` 0.10's `OsRng` API churn makes `getrandom` the cleaner fit.
* `FcryError` gains a `Format(String)` variant for header / framing
  errors and a `From<getrandom::Error>` impl (replacing the
  `rand::Error` impl).
* Drop the noisy `[reader]` / `[encrypt]` / `[decrypt]` stderr
  tracing prints and the `dbg!(&cli.raw_key)` (which leaked the key
  to stderr).
* Replace `unwrap()` on file open / create with `?` so I/O errors
  surface as structured `FcryError::Io` instead of aborting.
* Remove the unused `AheadReader::read_exact` wrapper -- the
  decryptor now reads the header through the underlying `BufRead`
  directly before wrapping it in `AheadReader`.
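
The error-handling shape described above can be sketched with std types
only. `io::Error` stands in for the I/O conversion here; the commit's actual
`From<getrandom::Error>` impl needs the `getrandom` crate, so this is an
abbreviated illustration of the variant set, not the shipped definition:

```rust
use std::fmt;
use std::io;

// Abbreviated sketch of FcryError (illustrative, not the full shipped enum).
#[derive(Debug)]
enum FcryError {
    Io(io::Error),
    Format(String), // header / framing problems, e.g. "bad magic"
}

impl fmt::Display for FcryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FcryError::Io(e) => write!(f, "i/o error: {e}"),
            FcryError::Format(msg) => write!(f, "bad fcry file: {msg}"),
        }
    }
}

impl From<io::Error> for FcryError {
    fn from(e: io::Error) -> Self {
        FcryError::Io(e)
    }
}

// With the From impl, `?` replaces the old unwrap() on file open:
// failures surface as structured FcryError::Io instead of a panic.
fn open_input(path: &str) -> Result<std::fs::File, FcryError> {
    Ok(std::fs::File::open(path)?)
}

fn main() {
    assert!(matches!(
        open_input("/definitely/missing/file"),
        Err(FcryError::Io(_))
    ));
    let e = FcryError::Format("bad magic".into());
    assert_eq!(e.to_string(), "bad fcry file: bad magic");
}
```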

Tests
-----
Add `tests/roundtrip.rs` (assert_cmd + tempfile) covering: empty
input, single byte, sub-chunk, exact chunk, chunk+1, multi-chunk,
custom small chunk size (4096), pathological 1-byte chunk size,
stdin/stdout pipe mode, wrong key rejection, tampered header,
tampered ciphertext, truncated ciphertext, bad magic, short raw key,
and the header-is-authoritative property (encrypt with a weird chunk
size, decrypt without specifying one). Also adds a unit test in
`header.rs` for header encode/decode roundtrip and bad-magic rejection.

TODO.md trimmed to the concrete follow-up sequence (manual STREAM
nonces, secrets/rlimit, atomic output, argon2id KDF + prompt,
multi-threaded pipeline, length-committed mode).

Test plan
---------
* `cargo clippy && cargo clippy --tests` -- clean.
* `cargo +nightly fmt` -- no diff.
* `cargo test` -- 16 integration + 2 header unit tests pass.
* Manual: `echo hi | fcry --raw-key 0123456789abcdef0123456789abcdef
  | fcry -d --raw-key 0123456789abcdef0123456789abcdef` prints `hi`.

Trailers
--------
Refs: TODO.md (Roadmap 1.0 follow-up sequence)
Breaking-Change: file format; 0.9.x files cannot be decrypted by 0.10.x
Commit: 4eee8e7a95
Parent: 5e51b4bfe1
Date: 2026-05-02 17:22:47 +02:00
Diffstat: 10 changed files, 761 insertions(+), 392 deletions(-)
TODO.md (+7 -28):
@@ -1,28 +1,7 @@
-# Roadmap 1.0
-## Summary
-Make the program real-world usable and stable.
-## Knowledge and Design
-* understand `encrypt_next_in_place()`'s first argument better
-* current understanding:
-  * associated data is used for parts of the data that cannot be
-    encrypted but should also be integrity protected by the authentication tag
-  * since there are no parts that cannot be encrypted in the context of `fcry` it is correct
-    to pass an empty slice to the first argument of `encrypt_next_in_place()`
-* currently `fcry` uses 64 KiB blocks as single AEAD messages
-* as stated [here](https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20_poly1305.html) (limit of 13 billion messages) would imply a maximum file-size of `64 KiB * 13e9 = 832e9 KiB = 774.86 TiB`. While a file this size could be considered a special (and unsupported) use case anyway, performance is also a consideration. Does performance improve noticably with larger message sizes?
-* unit tests
-## Features
-* password hashing
-* configurable algorithm (sane default)
-* configurable nr of rounds (sand default)
-* a way to enter the password securely in a prompt while still being able to handle `stdin` data
-* add usage examples to README.md
-# Roadmap 2.0
-* parallel processing: use all available (or configurable) CPU cores
-# Roadmap later or never
-* split into `lib` and `bin`
-* other AEAD algorithms
+**Deferred to follow-up commits** (in order):
+1. Switch single `EncryptorBE32` for manual STREAM nonces (preparation for parallelism)
+2. `secrets` crate for key handling + `rlimit` to disable core dumps
+3. Atomic file output (`.tmp` + rename)
+4. `argon2id` KDF + passphrase prompt + CLI flags
+5. Multi-threaded pipeline (worker pool + ordered writer)
+6. Length-committed mode + random-access decrypt fast path for files