ddidderr/fcry

Author	SHA1	Message	Date
ddidderr	81ac1475ad	feat: harden fcry format and IO policy	2026-06-09 23:45:02 +02:00
ddidderr	91b459657e	fix(pipeline): bound reorder buffer and fail fast on worker error	2026-05-02 21:29:08 +02:00
ddidderr	75afadb1ec	feat!: multi-threaded pipeline + length-committed/random-access decrypt	2026-05-02 20:33:00 +02:00

81ac1475ad

feat: harden fcry format and IO policy

Introduce a central policy module for format and resource validation, then
route header parsing, KDF acceptance, range arithmetic, and pipeline sizing
through that policy. New encryptions now write v3 headers that include an
authenticated key commitment, which lets decrypt reject wrong keys or
passphrases before chunk processing while preserving valid v1/v2 decrypt
compatibility inside the configured caps.
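
A minimal std-only sketch of what "routing validation through a policy" can look like. The `Policy` struct, its field names, the example limits, and the `check_*` methods are all hypothetical; the commit does not show the real module. The range check uses `checked_add` so an attacker-chosen `offset + length` cannot wrap around.

```rust
/// Hypothetical resource caps; real names and limits in fcry's policy module are not shown.
#[derive(Debug, Clone, Copy)]
pub struct Policy {
    pub min_version: u8,
    pub max_version: u8,
    pub max_chunk_size: u32,
    pub max_argon2_mem_kib: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    UnsupportedVersion(u8),
    ChunkSize(u32),
    KdfMemory(u32),
    RangeOverflow,
    RangeOutOfBounds,
}

impl Policy {
    /// Accept only header versions this build knows how to decrypt.
    pub fn check_version(&self, v: u8) -> Result<(), PolicyError> {
        if (self.min_version..=self.max_version).contains(&v) {
            Ok(())
        } else {
            Err(PolicyError::UnsupportedVersion(v))
        }
    }

    /// Reject zero or oversized chunk sizes before any buffer is allocated.
    pub fn check_chunk_size(&self, cs: u32) -> Result<(), PolicyError> {
        if cs == 0 || cs > self.max_chunk_size {
            Err(PolicyError::ChunkSize(cs))
        } else {
            Ok(())
        }
    }

    /// Decrypt-side ceiling: refuse headers that request more Argon2 memory than allowed.
    pub fn check_decrypt_kdf_mem(&self, mem_kib: u32) -> Result<(), PolicyError> {
        if mem_kib > self.max_argon2_mem_kib {
            Err(PolicyError::KdfMemory(mem_kib))
        } else {
            Ok(())
        }
    }

    /// Checked range arithmetic: returns the exclusive end of [offset, offset + length).
    pub fn check_range(&self, offset: u64, length: u64, total: u64) -> Result<u64, PolicyError> {
        let end = offset.checked_add(length).ok_or(PolicyError::RangeOverflow)?;
        if end > total {
            return Err(PolicyError::RangeOutOfBounds);
        }
        Ok(end)
    }
}
```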

Replace process-list-visible raw key input with --key-file, add passphrase NFC
normalization, enforce stronger new-encryption passphrase/KDF floors unless
--allow-weak-kdf is supplied, and add a configurable decrypt Argon2 memory
ceiling. Chunk buffers in the serial, parallel, and lookahead paths now use
zeroizing storage.
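
For illustration, a std-only stand-in for a zeroizing chunk buffer. fcry most likely uses the `zeroize` crate; the `ZeroizingBuf` type below is hypothetical. It overwrites the initialized bytes with volatile writes on drop so the compiler cannot elide the wipe. Two caveats apply to this sketch: a `Vec` that reallocates leaves its old allocation unwiped, so the capacity should be reserved up front for a full chunk, and calling `clear()` through `DerefMut` truncates without wiping.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical zeroizing wrapper around a chunk buffer.
pub struct ZeroizingBuf(Vec<u8>);

impl ZeroizingBuf {
    /// Reserve the full chunk size up front so later pushes do not reallocate
    /// (a reallocation would leave an unwiped copy of the old contents behind).
    pub fn with_capacity(cap: usize) -> Self {
        ZeroizingBuf(Vec::with_capacity(cap))
    }

    /// Overwrite every initialized byte; volatile writes keep the optimizer from removing them.
    pub fn wipe(&mut self) {
        for b in self.0.iter_mut() {
            // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

impl Deref for ZeroizingBuf {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        &self.0
    }
}

impl DerefMut for ZeroizingBuf {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        &mut self.0
    }
}

impl Drop for ZeroizingBuf {
    fn drop(&mut self) {
        self.wipe();
    }
}
```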

Rework output handling around randomized create-new temporary files with Unix
0600 mode, file fsync before persist, best-effort parent directory fsync,
default no-overwrite behavior, safe in-place replacement, --force, --temp-dir,
and --buffer-verify for decrypt-to-stdout.
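
A rough sketch of the create-new / fsync / persist sequence, assuming only std. The helper names (`random_suffix`, `create_temp`, `persist`) are hypothetical. The no-overwrite default is implemented here with `fs::hard_link`, which fails with `AlreadyExists` instead of clobbering. That is one possible technique and may not be how fcry does it. `RandomState` only spreads names out and is not a CSPRNG; `create_new` is what prevents clobbering an existing file. The temp directory must be on the same filesystem as the destination for `rename`/`hard_link` to work.

```rust
use std::collections::hash_map::RandomState;
use std::fs::{self, File, OpenOptions};
use std::hash::{BuildHasher, Hasher};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Randomly keyed per call; NOT cryptographic, only used to pick distinct names.
fn random_suffix() -> u64 {
    let mut h = RandomState::new().build_hasher();
    h.write_u32(std::process::id());
    h.finish()
}

/// Create `<dir>/.fcry-<random>.tmp` with create-new semantics and mode 0600 on Unix.
fn create_temp(dir: &Path) -> io::Result<(File, PathBuf)> {
    loop {
        let path = dir.join(format!(".fcry-{:016x}.tmp", random_suffix()));
        let mut opts = OpenOptions::new();
        opts.write(true).create_new(true);
        #[cfg(unix)]
        {
            use std::os::unix::fs::OpenOptionsExt;
            opts.mode(0o600);
        }
        match opts.open(&path) {
            Ok(f) => return Ok((f, path)),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => continue, // name taken, retry
            Err(e) => return Err(e),
        }
    }
}

/// fsync the data, then move the temp file into place. On error the caller removes `tmp`.
fn persist(file: File, tmp: &Path, dest: &Path, force: bool) -> io::Result<()> {
    file.sync_all()?; // file contents are durable before the name appears
    drop(file);
    if force {
        fs::rename(tmp, dest)?; // replaces an existing `dest`
    } else {
        fs::hard_link(tmp, dest)?; // fails with AlreadyExists rather than overwriting
        fs::remove_file(tmp)?;
    }
    #[cfg(unix)]
    {
        // Best-effort parent-directory fsync so the new directory entry is durable too.
        let parent = match dest.parent() {
            Some(p) if !p.as_os_str().is_empty() => p,
            _ => Path::new("."),
        };
        if let Ok(d) = File::open(parent) {
            let _ = d.sync_all();
        }
    }
    Ok(())
}
```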

Known caveat: --key-file currently reads with a single read call. That is fine
for regular files, but a pipe or process substitution can return a short read,
in which case a valid key is rejected. A follow-up fix will make key-file reads
loop until EOF.
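
The follow-up can be sketched with `Read::take` plus `read_to_end`. `read_to_end` keeps calling `read` until it returns 0 and retries on `Interrupted`, so short reads are harmless. Reading at most `KEY_LEN + 1` bytes detects an oversized file without reading all of it. `read_key` and the `Trickle` test reader are hypothetical; the 32-byte length is the XChaCha20-Poly1305 key size. Real code would also wipe the temporary `Vec`.

```rust
use std::io::{self, Read};

/// XChaCha20-Poly1305 keys are 32 bytes.
const KEY_LEN: usize = 32;

/// Read a raw key from any reader, looping until EOF so short reads from pipes are fine.
fn read_key<R: Read>(r: R) -> io::Result<[u8; KEY_LEN]> {
    let mut buf = Vec::with_capacity(KEY_LEN + 1);
    // `take` bounds the total; `read_to_end` loops over `read` until it returns 0.
    r.take(KEY_LEN as u64 + 1).read_to_end(&mut buf)?;
    if buf.len() != KEY_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("key file must contain exactly {} bytes", KEY_LEN),
        ));
    }
    let mut key = [0u8; KEY_LEN];
    key.copy_from_slice(&buf);
    Ok(key)
}

/// Simulated pipe that hands out one byte per `read` call.
struct Trickle<'a>(&'a [u8]);

impl<'a> Read for Trickle<'a> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.0.is_empty() || out.is_empty() {
            return Ok(0);
        }
        out[0] = self.0[0];
        self.0 = &self.0[1..];
        Ok(1)
    }
}
```

With a single `read` call, `Trickle` would yield one byte and the key would be rejected; with the loop it succeeds.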

Test Plan:
- cargo fmt --check
- cargo clippy --all-targets -- -D warnings
- cargo test
- git diff --check
- cargo run -- --help

Refs: fcry security hardening plan

2026-06-09 23:45:02 +02:00

ddidderr

91b459657e

fix(pipeline): bound reorder buffer and fail fast on worker error

The multi-threaded pipeline introduced in 75afadb had two related defects
flagged by external review:

1. The writer's reorder buffer was unbounded. `ordered_writer` accepted
   a `_cap` parameter that was documented as the in-flight bound but
   was never read. The writer drained `done_rx` eagerly into a
   `BTreeMap`, so neither the bounded job channel nor the bounded done
   channel ever exerted backpressure on the reader. A slow or stuck
   worker would let the writer accumulate every subsequent chunk in
   `pending` until the system ran out of memory. With adversarial input
   this is a memory-exhaustion vector; with merely uneven workers it
   silently violated the documented memory ceiling.

2. The pipeline did not fail fast. When a worker hit an AEAD
   authentication failure it returned `Err`, dropped its channel
   clones, and exited — but the other workers, the reader, and the
   writer kept running until natural EOF. On a tampered N-byte file
   with T workers we burned the full I/O plus (T-1)/T of the AEAD CPU
   before surfacing the error. Combined with (1) this also stretched
   the window in which `pending` could grow.

Both issues are addressed by a single rewrite of `pipeline.rs`:

  - A bounded "permit" channel pre-filled with `in_flight_capacity`
    `()` tokens. The reader acquires one before sending each job; the
    writer releases one after flushing the corresponding chunk in
    order. Total in-flight chunks (queued jobs + in-progress at
    workers + pending in the reorder map) is now hard-capped at
    `4 * threads`, and permits are released in lockstep with the
    actual disk write rather than ahead of it.

  - An `Arc<AtomicBool> cancel` flag that workers set on AEAD failure.
    Workers check it at the top of their loop and drain remaining
    queued jobs without doing AEAD work. The reader checks it before
    each new chunk, so a tampered chunk causes the reader to stop
    within the in-flight window rather than after EOF.

The reader uses `permit_rx.recv_timeout(50ms)` rather than a blocking
`recv` so it can poll the cancel flag even when the rest of the
pipeline has quiesced. Without this, a 3-way deadlock is possible:
a worker fails after all permits have been handed out, the writer is
blocked on a missing-counter chunk that will never arrive, the other
workers are idle on `jobs_rx`, and the reader is blocked on
`permit_rx`. The
50 ms wakeup is well below typical user-perceptible latency and only
runs when the pipeline is otherwise idle, so its cost is negligible.
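
The permit/cancel scheme and the 50 ms poll can be sketched with std channels only. This is not fcry's `pipeline.rs`. `std::sync::mpsc` stands in for crossbeam-channel (the job receiver is wrapped in a `Mutex` because std's `Receiver` is single-consumer), the writer runs on the calling thread and collects into a `Vec`, and the `run_pipeline` signature and the `work` callback standing in for the AEAD step are hypothetical. As described above, a failing worker sets the flag and exits without sending its chunk. The writer never receives it, and the chain unwinds: the reader sees the flag, then the workers see the job channel close, and finally the writer sees the done channel close.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, RecvTimeoutError};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Stand-in for the AEAD step; Err simulates an authentication failure.
type Work = fn(u32, Vec<u8>) -> Result<Vec<u8>, String>;

fn run_pipeline(chunks: Vec<Vec<u8>>, threads: usize, work: Work) -> Result<Vec<u8>, String> {
    assert!(threads > 0);
    let cap = 4 * threads; // cap on queued + in-progress + pending-reorder chunks
    let cancel = Arc::new(AtomicBool::new(false));
    let (permit_tx, permit_rx) = sync_channel::<()>(cap);
    for _ in 0..cap {
        permit_tx.send(()).unwrap(); // pre-fill: one token per allowed in-flight chunk
    }
    let (job_tx, job_rx) = sync_channel::<(u32, Vec<u8>)>(2 * threads);
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = sync_channel::<(u32, Vec<u8>)>(cap);

    let reader_cancel = Arc::clone(&cancel);
    let reader = thread::spawn(move || {
        for (i, chunk) in chunks.into_iter().enumerate() {
            // Wait for a permit, but wake every 50 ms to check the cancel flag.
            loop {
                if reader_cancel.load(Ordering::Acquire) {
                    return; // dropping job_tx lets idle workers exit
                }
                match permit_rx.recv_timeout(Duration::from_millis(50)) {
                    Ok(()) => break,
                    Err(RecvTimeoutError::Timeout) => continue,
                    Err(RecvTimeoutError::Disconnected) => return,
                }
            }
            if job_tx.send((i as u32, chunk)).is_err() {
                return; // every worker is gone
            }
        }
    });

    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let (rx, tx, c) = (Arc::clone(&job_rx), done_tx.clone(), Arc::clone(&cancel));
            thread::spawn(move || -> Result<(), String> {
                loop {
                    let job = rx.lock().unwrap().recv(); // lock released at end of statement
                    let (idx, data) = match job {
                        Ok(j) => j,
                        Err(_) => return Ok(()), // reader finished and queue drained
                    };
                    if c.load(Ordering::Acquire) {
                        continue; // cancelled: drain the queue without doing the work
                    }
                    match work(idx, data) {
                        Ok(out) => {
                            let _ = tx.send((idx, out));
                        }
                        Err(e) => {
                            c.store(true, Ordering::Release);
                            return Err(e);
                        }
                    }
                }
            })
        })
        .collect();
    drop(done_tx); // writer loop ends once every worker has exited
    drop(job_rx); // so workers exiting really closes the job channel

    // Writer: flush strictly in counter order, returning one permit per flushed chunk.
    let mut next = 0u32;
    let mut pending = BTreeMap::new();
    let mut out: Vec<u8> = Vec::new();
    for (idx, data) in done_rx {
        pending.insert(idx, data);
        while let Some(chunk) = pending.remove(&next) {
            out.extend_from_slice(&chunk);
            next += 1;
            let _ = permit_tx.send(());
        }
    }
    reader.join().unwrap();
    for w in workers {
        w.join().unwrap()?;
    }
    Ok(out)
}
```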

While rewriting I also collapsed `encrypt_parallel`/`decrypt_parallel`
onto a shared `run_pipeline` helper parameterised by an `is_encrypt`
bool — the two functions previously duplicated ~150 lines of channel
plumbing for a one-line difference (`encrypt_in_place` vs
`decrypt_in_place`). Same for the reorder writer: a single
`ordered_writer` now returns `(OutSink, u64)`, and encrypt simply
ignores the byte count (decrypt uses it for the length cross-check).
Removed the stale "wrapping_add" on the in-order counter — wrapping
here would mask a real bug since `bump_counter` already rejects
overflow upstream — and corrected the per-thread memory estimate in
the module-level doc to match the new bounded model.
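
Why wrapping is the wrong default here: the STREAM counter feeds the nonce, so a wrapped counter would silently reuse nonce 0. A checked increment turns that into an error. The `bump_counter` shape below is a hypothetical sketch, since the commit names the function but not its signature.

```rust
/// Hypothetical shape of `bump_counter`: overflow is an error, never a wrap.
fn bump_counter(counter: &mut u32) -> Result<(), &'static str> {
    // On overflow `?` returns early and leaves `*counter` unchanged.
    *counter = counter.checked_add(1).ok_or("chunk counter overflow")?;
    Ok(())
}
```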

The job-channel capacity (`channel_capacity = 2 * threads`) is left
unchanged. The new permit cap (`4 * threads`) is deliberately larger
so out-of-order completion has slack; if the gap is ever exhausted
the only consequence is reader backpressure, never unbounded growth.

Test plan:
  - `cargo test` still passes the full 28-test integration suite,
    including `parallel_and_serial_outputs_round_trip` (proves the
    refactored unified pipeline produces bit-identical output to the
    serial path) and `rejects_tampered_ciphertext` (still surfaces
    the AEAD error, now via the cancel path).
  - Manual fail-fast probe: 200 MiB random plaintext, encrypt with
    `-j 8`, flip a byte at offset 2000 (inside chunk 0), decrypt
    with `-j 8`. Errors in ~2 ms, vs ~28 ms for a clean decrypt of
    the same file — confirming the reader stopped within the
    in-flight window rather than draining the whole input.
  - The `ordered_writer` cancel deadlock case is hit organically by
    the same probe: chunk 0 fails authentication, no further
    counter-0 chunk ever arrives, but the reader exits via the
    50 ms cancel poll and the rest of the pipeline drains.

Refs: external review (P2 / Gemini #1, Gemini #2, GLM51 #2/#7/#8).

2026-05-02 21:29:08 +02:00

ddidderr

75afadb1ec

feat!: multi-threaded pipeline + length-committed/random-access decrypt

Completes the two follow-ups deferred from the v0.10 format/secrets
work: multi-threaded AEAD encrypt/decrypt and a length-committed file
format that enables random-access decryption.

# Format change (file format v2)

Bumps the on-disk header version to 2 and introduces a flag bit
(`FLAG_LENGTH_COMMITTED`, bit 0). When set, an authenticated `u64 LE`
plaintext length is appended to the header after the nonce prefix. v1
files still decrypt unchanged. v2 readers reject unknown flag bits.

The flag is set automatically when the input is a regular file (we
stat the open FD to avoid TOCTOU). Stdin/pipes/FIFOs encrypt as before
with the flag clear. Sequential decrypt cross-checks the produced byte
count against the committed length as defense in depth (the AEAD
already authenticates the value via header AAD, but failing before we
rename the temp file into place is preferable to failing after).
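
A parsing sketch for the v1/v2 distinction. The byte layout is an assumption made for illustration: v1 is `version | nonce prefix`; v2 is `version | flags | nonce prefix | [u64 LE length]`, with a 19-byte prefix (24-byte XChaCha nonce minus a 4-byte counter and a 1-byte last-chunk flag). The real header carries more fields (magic, chunk size, KDF parameters) that are omitted here. `parse_header` and `Header` are hypothetical. The returned byte count marks the span that would be fed to the AEAD as associated data, which is how the committed length gets authenticated.

```rust
const FLAG_LENGTH_COMMITTED: u8 = 0b0000_0001;
const NONCE_PREFIX_LEN: usize = 19; // assumption, see lead-in

#[derive(Debug, PartialEq)]
struct Header {
    version: u8,
    flags: u8,
    nonce_prefix: [u8; NONCE_PREFIX_LEN],
    plaintext_len: Option<u64>,
}

/// Returns the header and how many bytes it occupied (the AAD span).
fn parse_header(buf: &[u8]) -> Result<(Header, usize), String> {
    let truncated = || "truncated header".to_string();
    let version = *buf.first().ok_or_else(truncated)?;
    let (flags, mut pos) = match version {
        1 => (0u8, 1usize), // v1: no flags byte, so no flag bits can be set
        2 => (*buf.get(1).ok_or_else(truncated)?, 2usize),
        v => return Err(format!("unsupported header version {}", v)),
    };
    // v2 readers reject any flag bit they do not understand.
    if flags & !FLAG_LENGTH_COMMITTED != 0 {
        return Err(format!("unknown flag bits {:#04x}", flags));
    }
    let mut nonce_prefix = [0u8; NONCE_PREFIX_LEN];
    nonce_prefix.copy_from_slice(buf.get(pos..pos + NONCE_PREFIX_LEN).ok_or_else(truncated)?);
    pos += NONCE_PREFIX_LEN;
    let plaintext_len = if flags & FLAG_LENGTH_COMMITTED != 0 {
        let mut le = [0u8; 8];
        le.copy_from_slice(buf.get(pos..pos + 8).ok_or_else(truncated)?);
        pos += 8;
        Some(u64::from_le_bytes(le))
    } else {
        None
    };
    Ok((Header { version, flags, nonce_prefix, plaintext_len }, pos))
}
```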

# Random-access decrypt

`fcry -d -i FILE --offset N --length L` seeks directly to the chunk(s)
covering `[N, N+L)` and decrypts only those, without scanning the
predecessors. Requires a seekable file whose header has the
length-committed flag — stdin/pipe-encrypted files cannot use this
path and the CLI rejects it with a clear error.

The chunk layout is fully determined by `chunk_size` and the committed
total length: the last chunk's plaintext length is
`last_pt = total - (n_chunks-1)*chunk_size`, and its ciphertext
length is `last_pt + 16` (the Poly1305 tag). Each chunk's nonce is
`make_nonce(prefix, chunk_index, is_last_chunk)` which matches what
sequential encrypt produced, so plaintext slices come out
bit-identical to a full sequential decrypt.
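
The layout arithmetic can be written out as a few pure functions. All names are hypothetical. Two details are assumptions: an empty input still produces one empty final chunk, and `make_nonce` follows the STREAM construction (prefix, big-endian u32 counter, last-chunk flag byte). `offset + length` is assumed to have been overflow-checked upstream.

```rust
const TAG_LEN: u64 = 16; // Poly1305 tag appended to every chunk

/// Chunk count for a committed plaintext length (assumption: empty input = one empty chunk).
fn n_chunks(total: u64, chunk_size: u64) -> u64 {
    if total == 0 {
        1
    } else {
        total / chunk_size + (total % chunk_size != 0) as u64
    }
}

/// Plaintext length of chunk `i`: full chunks, except the last holds the remainder.
fn chunk_pt_len(i: u64, total: u64, chunk_size: u64) -> u64 {
    let n = n_chunks(total, chunk_size);
    if i + 1 < n { chunk_size } else { total - (n - 1) * chunk_size }
}

/// Ciphertext offset of chunk `i`, relative to the end of the header
/// (every earlier chunk is full-sized plus its tag).
fn chunk_ct_offset(i: u64, chunk_size: u64) -> u64 {
    i * (chunk_size + TAG_LEN)
}

/// Inclusive range of chunk indices covering [offset, offset + length); None if empty.
fn covering_chunks(offset: u64, length: u64, chunk_size: u64) -> Option<(u64, u64)> {
    if length == 0 {
        return None;
    }
    Some((offset / chunk_size, (offset + length - 1) / chunk_size))
}

/// STREAM-style 24-byte nonce: 19-byte prefix || BE u32 counter || last-chunk flag.
fn make_nonce(prefix: &[u8; 19], chunk_index: u32, is_last: bool) -> [u8; 24] {
    let mut n = [0u8; 24];
    n[..19].copy_from_slice(prefix);
    n[19..23].copy_from_slice(&chunk_index.to_be_bytes());
    n[23] = is_last as u8;
    n
}
```

For example, with `chunk_size = 1024` and `total = 3000` there are 3 chunks, and the last holds 952 plaintext bytes (968 with its tag). A read of `[1000, 1100)` touches chunks 0 and 1 only.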

# Multi-threaded pipeline

New `src/pipeline.rs` implements:

  reader thread → bounded jobs channel → N AEAD workers
                → bounded results channel → writer thread

The reader stays serial (it owns the input handle and uses lookahead
to detect the last chunk). Workers parallelize the AEAD step (each
chunk is independent under STREAM). The writer holds a
`BTreeMap<u32, Vec<u8>>` reorder buffer and only flushes in counter
order. Commit is deferred to the main thread, so a failure anywhere —
reader I/O, AEAD auth, writer I/O — drops `OutSink` without renaming
the temp file into place. The
`atomic_output_no_stale_tmp_on_failure` integration test still
passes.
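
The "commit is deferred, failure just drops the sink" behaviour can be sketched as a drop guard. This `OutSink` is a hypothetical reconstruction, not the real type. Writes go to a temp file, `commit` fsyncs and renames, and dropping an uncommitted sink closes and deletes the temp file, so no stale temp file or partial output is left behind.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical output sink: data only becomes visible at `dest` on commit.
struct OutSink {
    file: Option<File>,
    tmp: PathBuf,
    dest: PathBuf,
    committed: bool,
}

impl OutSink {
    fn create(tmp: PathBuf, dest: PathBuf) -> io::Result<Self> {
        let file = OpenOptions::new().write(true).create_new(true).open(&tmp)?;
        Ok(OutSink { file: Some(file), tmp, dest, committed: false })
    }

    /// Called by the main thread only after reader, workers, and writer all succeeded.
    fn commit(mut self) -> io::Result<()> {
        if let Some(f) = self.file.take() {
            f.sync_all()?; // on failure, Drop still removes the temp file
        }
        fs::rename(&self.tmp, &self.dest)?;
        self.committed = true;
        Ok(())
    }
}

impl Write for OutSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.file.as_mut().expect("write after commit").write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.file.as_mut().expect("flush after commit").flush()
    }
}

impl Drop for OutSink {
    fn drop(&mut self) {
        if !self.committed {
            self.file.take(); // close the handle before unlinking (needed on Windows)
            let _ = fs::remove_file(&self.tmp);
        }
    }
}
```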

Channel and reorder capacities scale with worker count (`2*threads`);
peak memory is roughly `chunk_size * 4 * threads`. With 1 MiB chunks
and 8 cores that's ~32 MiB, which we accept.

Default thread count is `std::thread::available_parallelism()`;
override with `-j/--threads N`. `-j 1` keeps the original serial path.
Stdin/stdout streaming works under the parallel path because `Stdin`
(unlocked) is `Send` — only `StdinLock` isn't, so the boxed reader
wraps `Stdin` directly in a `BufReader`.
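
The thread-count default and the `Send` input handle can be sketched as follows. `default_threads` and `open_input` are hypothetical helpers, and the fallback to 1 when `available_parallelism` fails is an assumption. `io::stdin()` returns the unlocked `Stdin`, which is `Send`, so the boxed reader can move into the reader thread. `Stdin::lock()` would return a `StdinLock`, which cannot.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::num::NonZeroUsize;
use std::path::Path;
use std::thread;

/// Default `-j`: available parallelism, falling back to 1 (fallback is an assumption).
fn default_threads() -> usize {
    thread::available_parallelism().map(NonZeroUsize::get).unwrap_or(1)
}

/// Boxed, `Send` input: a file, or the unlocked `Stdin` handle wrapped in a BufReader.
fn open_input(path: Option<&Path>) -> io::Result<Box<dyn Read + Send>> {
    Ok(match path {
        Some(p) => Box::new(BufReader::new(File::open(p)?)),
        None => Box::new(BufReader::new(io::stdin())),
    })
}
```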

Adds `crossbeam-channel = "0.5"` for bounded MPMC. The cipher
(`XChaCha20Poly1305`) and the header AAD are shared across workers via
`Arc`; the AEAD's internal key copy is zeroized on drop as before.

# CLI surface

  -j, --threads <N>     worker thread count (default: cores)
      --offset <BYTES>  random-access decrypt: slice start
      --length <BYTES>  random-access decrypt: slice length

`--offset`/`--length` require `--decrypt` and `--input-file` (clap
enforces; we also surface a clean runtime error if only one is
supplied).
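
The runtime half of that check might look like the function below. `range_args` is a hypothetical name, and it deliberately skips clap. It accepts both flags or neither, and accepts both only together with `--decrypt` and an input file. Zero-length slices pass, matching the "accepts zero-length" test case.

```rust
/// Hypothetical runtime mirror of the clap constraints on --offset/--length.
fn range_args(
    decrypt: bool,
    has_input_file: bool,
    offset: Option<u64>,
    length: Option<u64>,
) -> Result<Option<(u64, u64)>, String> {
    match (offset, length) {
        (None, None) => Ok(None), // normal full decrypt/encrypt
        (Some(o), Some(l)) => {
            if !decrypt || !has_input_file {
                return Err("--offset/--length require --decrypt and --input-file".into());
            }
            Ok(Some((o, l)))
        }
        _ => Err("--offset and --length must be given together".into()),
    }
}
```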

# Test plan

* `cargo test` — 5 unit + 27 integration, all green.
* New integration coverage:
  - parallel roundtrip on multi-chunk inputs (`-j 4`)
  - parallel-encrypted ciphertext decrypted serially, and vice-versa
    (output bit-identical regardless of worker count)
  - parallel pipe stdin↔stdout (asserts flag byte is 0 for stdin
    inputs — no length committed without a known size)
  - file inputs auto-commit length (asserts version=2 and flags bit 0
    set in the raw header bytes)
  - random-access slices spanning chunk-aligned, mid-chunk,
    last-chunk, and full-file ranges
  - random-access rejects out-of-range and stdin-encrypted inputs,
    accepts zero-length
  - tampering the committed length byte fails AEAD authentication
  - hand-crafted v1 header still decodes (no flag bit set)
* `cargo clippy --all-targets -- -D warnings` clean.
* `cargo +nightly fmt` clean.

Removes `TODO.md` since both deferred items are now implemented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 20:33:00 +02:00
