feat!: multi-threaded pipeline + length-committed/random-access decrypt

Completes the two follow-ups deferred from the v0.10 format/secrets
work: multi-threaded AEAD encrypt/decrypt and a length-committed file
format that enables random-access decryption.

# Format change (file format v2)

Bumps the on-disk header version to 2 and introduces a flag bit
(`FLAG_LENGTH_COMMITTED`, bit 0). When set, an authenticated `u64 LE`
plaintext length is appended to the header after the nonce prefix. v1
files still decrypt unchanged. v2 readers reject unknown flag bits.
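
A minimal sketch of the reader-side flag handling (illustrative names and stubbed plumbing; only `FLAG_LENGTH_COMMITTED` matches the real header module):

```rust
use std::convert::TryInto;

const FLAG_LENGTH_COMMITTED: u8 = 1 << 0;
const KNOWN_FLAGS: u8 = FLAG_LENGTH_COMMITTED;

/// Interpret the flags byte plus the header bytes after the nonce prefix.
/// v2 readers reject unknown flag bits; when bit 0 is set, the next
/// 8 header bytes are the committed plaintext length as u64 LE.
fn committed_length(flags: u8, after_nonce: &[u8]) -> Result<Option<u64>, String> {
    if flags & !KNOWN_FLAGS != 0 {
        return Err(format!("unknown flag bits {:#010b}", flags & !KNOWN_FLAGS));
    }
    if flags & FLAG_LENGTH_COMMITTED == 0 {
        return Ok(None);
    }
    let bytes: [u8; 8] = after_nonce
        .get(..8)
        .ok_or_else(|| "truncated header: missing committed length".to_string())?
        .try_into()
        .expect("slice is exactly 8 bytes");
    Ok(Some(u64::from_le_bytes(bytes)))
}

fn main() {
    assert_eq!(committed_length(0b01, &7u64.to_le_bytes()), Ok(Some(7)));
    assert_eq!(committed_length(0, &[]), Ok(None));
    assert!(committed_length(0b10, &[]).is_err()); // unknown bit rejected
}
```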

The flag is set automatically when the input is a regular file (we
stat the open FD to avoid TOCTOU). Stdin/pipes/FIFOs encrypt as before
with the flag clear. Sequential decrypt cross-checks the produced byte
count against the committed length as defense in depth (the AEAD
already authenticates the value via header AAD, but failing before we
rename the temp file into place is preferable to failing after).
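
The stat-the-open-FD decision can be sketched as follows (hypothetical helper name, not the real `open_input` internals):

```rust
use std::fs::File;
use std::io;

/// Only commit a length for regular files, and stat the handle we will
/// actually read from (fstat), not the path, so a path swap between
/// open and stat cannot fool us (TOCTOU).
fn length_to_commit(file: &File) -> io::Result<Option<u64>> {
    let meta = file.metadata()?; // fstat on the open descriptor
    Ok(meta.is_file().then(|| meta.len()))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("fcry_length_commit_demo");
    std::fs::write(&path, b"hello")?;
    let f = File::open(&path)?;
    assert_eq!(length_to_commit(&f)?, Some(5));
    std::fs::remove_file(&path)?;
    Ok(())
}
```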

# Random-access decrypt

`fcry -d -i FILE --offset N --length L` seeks directly to the chunk(s)
covering `[N, N+L)` and decrypts only those, without scanning the
predecessors. Requires a seekable file whose header has the
length-committed flag — stdin/pipe-encrypted files cannot use this
path and the CLI rejects it with a clear error.

The chunk layout is fully determined by `chunk_size` and the committed
total length (the last chunk's plaintext length `last_pt` is
`total - (n_chunks-1)*chunk_size`; its ciphertext length is
`last_pt + 16`). Each chunk's nonce is
`make_nonce(prefix, chunk_index, is_last_chunk)` which matches what
sequential encrypt produced, so plaintext slices come out
bit-identical to a full sequential decrypt.
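
The layout math above, as a checkable sketch (names illustrative; `TAG_LEN` is the 16-byte Poly1305 tag):

```rust
const TAG_LEN: u64 = 16;

/// Returns (n_chunks, last_chunk_plaintext_len); an empty file still
/// authenticates a single empty "last" chunk.
fn layout(total: u64, chunk_size: u64) -> (u64, u64) {
    if total == 0 {
        return (1, 0);
    }
    let n = 1 + (total - 1) / chunk_size; // == total.div_ceil(chunk_size)
    (n, total - (n - 1) * chunk_size)
}

/// Ciphertext offset of chunk `i`, counted from the end of the header.
fn chunk_offset(i: u64, chunk_size: u64) -> u64 {
    i * (chunk_size + TAG_LEN)
}

fn main() {
    assert_eq!(layout(2560, 1024), (3, 512)); // 2.5 chunks: last holds 512 B
    assert_eq!(layout(2048, 1024), (2, 1024)); // exact multiple: full last chunk
    assert_eq!(layout(0, 1024), (1, 0));
    assert_eq!(chunk_offset(2, 1024), 2080); // two full chunks of 1024 + 16
}
```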

# Multi-threaded pipeline

New `src/pipeline.rs` implements:

  reader thread → bounded jobs channel → N AEAD workers
                → bounded results channel → writer thread

The reader stays serial (it owns the input handle and uses lookahead
to detect the last chunk). Workers parallelize the AEAD step (each
chunk is independent under STREAM). The writer holds a
`BTreeMap<u32, Vec<u8>>` reorder buffer and only flushes in counter
order. Commit is deferred to the main thread, so a failure anywhere —
reader I/O, AEAD auth, writer I/O — drops `OutSink` without renaming
the temp file into place. The
`atomic_output_no_stale_tmp_on_failure` integration test still
passes.
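
The reorder step can be sketched like this (std's `sync_channel` stands in for crossbeam's bounded channel, and the workers just produce bytes instead of running the AEAD):

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Writer-side reorder buffer: accept results in any order, flush
/// strictly in counter order.
fn flush_in_order(results: impl IntoIterator<Item = (u32, Vec<u8>)>) -> Vec<Vec<u8>> {
    let mut pending: BTreeMap<u32, Vec<u8>> = BTreeMap::new();
    let mut next = 0u32;
    let mut out = Vec::new();
    for (idx, chunk) in results {
        pending.insert(idx, chunk);
        while let Some(chunk) = pending.remove(&next) {
            out.push(chunk);
            next += 1;
        }
    }
    out
}

fn main() {
    let (tx, rx) = sync_channel::<(u32, Vec<u8>)>(8); // bounded, like 2*threads
    let workers: Vec<_> = (0u32..4)
        .map(|idx| {
            let tx = tx.clone();
            thread::spawn(move || tx.send((idx, vec![idx as u8; 3])).unwrap())
        })
        .collect();
    drop(tx); // rx iteration ends once the last worker hangs up
    let out = flush_in_order(rx);
    for w in workers {
        w.join().unwrap();
    }
    assert_eq!(out, (0u8..4).map(|i| vec![i; 3]).collect::<Vec<_>>());
}
```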

Channel and reorder capacities scale with worker count (`2*threads`);
peak memory is roughly `chunk_size * 4 * threads`. With 1 MiB chunks
and 8 cores that's ~32 MiB, which we accept.

Default thread count is `std::thread::available_parallelism()`;
override with `-j/--threads N`. `-j 1` keeps the original serial path.
Stdin/stdout streaming works under the parallel path because `Stdin`
(unlocked) is `Send` — only `StdinLock` isn't, so the boxed reader
wraps `Stdin` directly in a `BufReader`.
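
A sketch of that `Send` constraint (illustrative, not the real reader setup):

```rust
use std::io::{self, BufReader, Read};

/// The boxed input reader crosses into the reader thread, so the trait
/// object needs a `Send` bound; unlocked `Stdin` satisfies it.
fn boxed_stdin() -> Box<dyn Read + Send> {
    Box::new(BufReader::new(io::stdin()))
}

// Compile-time check: this call only builds because `Stdin` is `Send`.
fn assert_send<T: Send + ?Sized>() {}

fn main() {
    assert_send::<io::Stdin>();
    let reader = boxed_stdin();
    // A trait-object box is a fat pointer: data pointer + vtable pointer.
    assert_eq!(std::mem::size_of_val(&reader), 2 * std::mem::size_of::<usize>());
}
```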

Adds `crossbeam-channel = "0.5"` for bounded MPMC. The cipher
(`XChaCha20Poly1305`) and the header AAD are shared across workers via
`Arc`; the AEAD's internal key copy is zeroized on drop as before.

# CLI surface

  -j, --threads <N>     worker thread count (default: cores)
      --offset <BYTES>  random-access decrypt: slice start
      --length <BYTES>  random-access decrypt: slice length

`--offset`/`--length` require `--decrypt` and `--input-file` (clap
enforces this; we also surface a clean runtime error if only one of
the pair is supplied).
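
The runtime cross-check, sketched (hypothetical function; the real CLI module may differ):

```rust
/// Pair up --offset/--length after clap parsing: both present means a
/// range request, both absent means a full decrypt, and a lone flag is
/// a clear error instead of a panic later on.
fn validate_range(offset: Option<u64>, length: Option<u64>) -> Result<Option<(u64, u64)>, String> {
    match (offset, length) {
        (Some(o), Some(l)) => Ok(Some((o, l))),
        (None, None) => Ok(None),
        (Some(_), None) => Err("--offset requires --length".into()),
        (None, Some(_)) => Err("--length requires --offset".into()),
    }
}

fn main() {
    assert_eq!(validate_range(Some(4), Some(8)), Ok(Some((4, 8))));
    assert_eq!(validate_range(None, None), Ok(None));
    assert!(validate_range(Some(4), None).is_err());
}
```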

# Test plan

* `cargo test` — 5 unit + 27 integration, all green.
* New integration coverage:
  - parallel roundtrip on multi-chunk inputs (`-j 4`)
  - parallel-encrypted ciphertext decrypted serially, and vice-versa
    (output bit-identical regardless of worker count)
  - parallel pipe stdin↔stdout (asserts flag byte is 0 for stdin
    inputs — no length committed without a known size)
  - file inputs auto-commit length (asserts version=2 and flags bit 0
    set in the raw header bytes)
  - random-access slices spanning chunk-aligned, mid-chunk,
    last-chunk, and full-file ranges
  - random-access rejects out-of-range and stdin-encrypted inputs,
    accepts zero-length
  - tampering the committed length byte fails AEAD authentication
  - hand-crafted v1 header still decodes (no flag bit set)
* `cargo clippy --all-targets -- -D warnings` clean.
* `cargo +nightly fmt` clean.

Removes `TODO.md` since both deferred items are now implemented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:33:00 +02:00
parent f72f9034f3
commit 75afadb1ec
10 changed files with 1095 additions and 51 deletions
@@ -1,21 +1,24 @@
// SPDX-License-Identifier: GPL-3.0-only
use chacha20poly1305::{KeyInit, XChaCha20Poly1305, XNonce, aead::AeadInPlace};
-use std::io::Write;
+use std::fs::File;
+use std::io::{BufReader, Read, Seek, SeekFrom, Write};
+use std::sync::Arc;
use crate::error::*;
-use crate::header::{AlgId, Header, KdfParams, NONCE_PREFIX_LEN, TAG_LEN};
+use crate::header::{AlgId, FLAG_LENGTH_COMMITTED, Header, KdfParams, NONCE_PREFIX_LEN, TAG_LEN};
+use crate::pipeline;
use crate::reader::{AheadReader, ReadInfoChunk};
use crate::secrets::{SecretBytes32, SecretVec};
use crate::utils::*;
/// XChaCha20Poly1305 nonce: 24 bytes total. STREAM splits the trailing 5 bytes
/// into a 4-byte big-endian counter and a 1-byte "last block" flag.
-const NONCE_LEN: usize = 24;
-const COUNTER_LEN: usize = 4;
+pub(crate) const NONCE_LEN: usize = 24;
+pub(crate) const COUNTER_LEN: usize = 4;
const _: () = assert!(NONCE_PREFIX_LEN + COUNTER_LEN + 1 == NONCE_LEN);
-fn make_nonce(prefix: &[u8; NONCE_PREFIX_LEN], counter: u32, last: bool) -> XNonce {
+pub(crate) fn make_nonce(prefix: &[u8; NONCE_PREFIX_LEN], counter: u32, last: bool) -> XNonce {
let mut n = [0u8; NONCE_LEN];
n[..NONCE_PREFIX_LEN].copy_from_slice(prefix);
n[NONCE_PREFIX_LEN..NONCE_PREFIX_LEN + COUNTER_LEN].copy_from_slice(&counter.to_be_bytes());
@@ -55,36 +58,72 @@ pub fn derive_key(
Ok(out)
}
/// Build the AEAD cipher from the protected key. The cipher holds an
/// unprotected copy of the key while alive; `chacha20poly1305` zeroizes that
/// copy on drop. Wrapping in `Arc` lets us share it across worker threads.
fn build_aead(key: &SecretBytes32) -> Arc<XChaCha20Poly1305> {
Arc::new(key.with_array(|key| XChaCha20Poly1305::new(key.into())))
}
/// Bump the per-chunk counter; surface a domain error on overflow rather than
/// panicking on debug or wrapping in release.
pub(crate) fn bump_counter(counter: u32) -> Result<u32, FcryError> {
counter
.checked_add(1)
.ok_or_else(|| FcryError::Format("STREAM counter overflow (input too large)".into()))
}
pub fn encrypt<S: AsRef<str>>(
input_file: Option<S>,
output_file: Option<S>,
key: &SecretBytes32,
chunk_size: u32,
kdf: KdfParams,
threads: usize,
) -> Result<(), FcryError> {
let chunk_sz = chunk_size as usize;
-let mut f_plain = AheadReader::from(open_input(input_file)?, chunk_sz);
+let input = open_input(input_file)?;
+let plaintext_length = input.length;
+let mut f_plain = AheadReader::from(input.reader, chunk_sz);
let mut f_encrypted = OutSink::open(output_file)?;
let mut nonce_prefix = [0u8; NONCE_PREFIX_LEN];
getrandom::fill(&mut nonce_prefix)?;
let flags = if plaintext_length.is_some() {
FLAG_LENGTH_COMMITTED
} else {
0
};
let header = Header {
alg: AlgId::XChaCha20Poly1305,
-flags: 0,
+flags,
chunk_size,
kdf,
nonce_prefix,
plaintext_length,
};
-let aad = header.encode();
+let aad = Arc::new(header.encode());
f_encrypted.write_all(&aad)?;
-// The AEAD keeps its own unprotected key copy while the loop runs.
-// chacha20poly1305 zeroizes that copy on drop.
-let aead = key.with_array(|key| XChaCha20Poly1305::new(key.into()));
+let aead = build_aead(key);
if threads > 1 {
return pipeline::encrypt_parallel(
f_plain,
f_encrypted,
aead,
aad,
nonce_prefix,
chunk_sz,
threads,
plaintext_length,
);
}
let mut buf = vec![0u8; chunk_sz];
let mut counter: u32 = 0;
let mut bytes_seen: u64 = 0;
loop {
match f_plain.read_ahead(&mut buf)? {
@@ -93,15 +132,15 @@ pub fn encrypt<S: AsRef<str>>(
aead.encrypt_in_place(&nonce, &aad, &mut buf)?;
f_encrypted.write_all(&buf)?;
buf.truncate(chunk_sz);
-counter = counter.checked_add(1).ok_or_else(|| {
-FcryError::Format("STREAM counter overflow (input too large)".into())
-})?;
+bytes_seen = bytes_seen.saturating_add(chunk_sz as u64);
+counter = bump_counter(counter)?;
}
ReadInfoChunk::Last(n) => {
buf.truncate(n);
let nonce = make_nonce(&nonce_prefix, counter, true);
aead.encrypt_in_place(&nonce, &aad, &mut buf)?;
f_encrypted.write_all(&buf)?;
bytes_seen = bytes_seen.saturating_add(n as u64);
break;
}
ReadInfoChunk::Empty => {
@@ -116,6 +155,17 @@ pub fn encrypt<S: AsRef<str>>(
}
}
if let Some(committed) = plaintext_length
&& committed != bytes_seen
{
// Defense in depth: the input changed between stat and EOF. The
// committed length is part of the AEAD AAD, so any decrypter would
// also surface this, but we prefer to fail before publishing the file.
return Err(FcryError::Format(format!(
"input length changed during encryption: committed {committed}, read {bytes_seen}"
)));
}
f_encrypted.commit()?;
Ok(())
}
@@ -125,10 +175,11 @@ pub fn decrypt<S: AsRef<str>>(
output_file: Option<S>,
raw_key: Option<&SecretBytes32>,
passphrase: Option<&SecretVec>,
threads: usize,
) -> Result<(), FcryError> {
-let mut reader = open_input(input_file)?;
+let mut reader = open_input(input_file)?.reader;
let header = Header::read(&mut reader)?;
-let aad = header.encode();
+let aad = Arc::new(header.encode());
let key = derive_key(&header.kdf, raw_key, passphrase)?;
@@ -138,12 +189,24 @@ pub fn decrypt<S: AsRef<str>>(
let mut f_encrypted = AheadReader::from(reader, cipher_chunk);
let mut f_plain = OutSink::open(output_file)?;
-// The AEAD keeps its own unprotected key copy while the loop runs.
-// chacha20poly1305 zeroizes that copy on drop.
-let aead = key.with_array(|key| XChaCha20Poly1305::new(key.into()));
+let aead = build_aead(&key);
if threads > 1 {
return pipeline::decrypt_parallel(
f_encrypted,
f_plain,
aead,
aad,
header.nonce_prefix,
cipher_chunk,
threads,
header.plaintext_length,
);
}
let mut buf = vec![0u8; cipher_chunk];
let mut counter: u32 = 0;
let mut bytes_written: u64 = 0;
loop {
match f_encrypted.read_ahead(&mut buf)? {
@@ -151,16 +214,16 @@ pub fn decrypt<S: AsRef<str>>(
let nonce = make_nonce(&header.nonce_prefix, counter, false);
aead.decrypt_in_place(&nonce, &aad, &mut buf)?;
f_plain.write_all(&buf)?;
bytes_written = bytes_written.saturating_add(buf.len() as u64);
buf.resize(cipher_chunk, 0);
-counter = counter
-.checked_add(1)
-.ok_or_else(|| FcryError::Format("STREAM counter overflow".into()))?;
+counter = bump_counter(counter)?;
}
ReadInfoChunk::Last(n) => {
buf.truncate(n);
let nonce = make_nonce(&header.nonce_prefix, counter, true);
aead.decrypt_in_place(&nonce, &aad, &mut buf)?;
f_plain.write_all(&buf)?;
bytes_written = bytes_written.saturating_add(buf.len() as u64);
break;
}
ReadInfoChunk::Empty => {
@@ -171,6 +234,116 @@ pub fn decrypt<S: AsRef<str>>(
}
}
if let Some(committed) = header.plaintext_length
&& committed != bytes_written
{
return Err(FcryError::Format(format!(
"decrypted length {bytes_written} disagrees with committed {committed}"
)));
}
f_plain.commit()?;
Ok(())
}
/// Random-access decrypt of a byte range. Requires a seekable input file
/// whose header has `FLAG_LENGTH_COMMITTED` set, so we know exactly where
/// each ciphertext chunk lives and which chunk is the last (its nonce uses
/// the STREAM last-block flag).
pub fn decrypt_range<S: AsRef<str>>(
input_file: &str,
output_file: Option<S>,
raw_key: Option<&SecretBytes32>,
passphrase: Option<&SecretVec>,
offset: u64,
length: u64,
) -> Result<(), FcryError> {
let file = File::open(input_file)?;
let mut reader = BufReader::new(file);
let header = Header::read(&mut reader)?;
let aad = header.encode();
let header_len = aad.len() as u64;
let total = header.plaintext_length.ok_or_else(|| {
FcryError::Format(
"random-access decrypt requires a length-committed header (encrypt from a regular file)".into(),
)
})?;
let end = offset
.checked_add(length)
.ok_or_else(|| FcryError::Format("offset + length overflows u64".into()))?;
if end > total {
return Err(FcryError::Format(format!(
"range [{offset}, {end}) exceeds plaintext length {total}"
)));
}
let key = derive_key(&header.kdf, raw_key, passphrase)?;
let aead = build_aead(&key);
let chunk_sz = header.chunk_size as u64;
let cipher_chunk = chunk_sz + TAG_LEN as u64;
// Layout invariants:
// n_chunks = ceil(total / chunk_sz), but always ≥ 1 (the empty file
// still authenticates a single empty "last" chunk).
// last_idx = n_chunks - 1
// last_pt = total - last_idx * chunk_sz (in [0, chunk_sz])
let (n_chunks, last_pt) = if total == 0 {
(1u64, 0u64)
} else {
let n = total.div_ceil(chunk_sz);
let last = total - (n - 1) * chunk_sz;
(n, last)
};
let last_idx = n_chunks - 1;
let mut out = OutSink::open(output_file)?;
if length == 0 {
out.commit()?;
return Ok(());
}
let start_chunk = offset / chunk_sz;
let end_chunk = (end - 1) / chunk_sz;
// Reusable buffer sized to a full chunk + tag.
let mut buf = Vec::with_capacity(cipher_chunk as usize);
let mut file = reader.into_inner();
for i in start_chunk..=end_chunk {
let i_u32 =
u32::try_from(i).map_err(|_| FcryError::Format("chunk index exceeds u32".into()))?;
let is_last = i == last_idx;
let cipher_len = if is_last {
last_pt + TAG_LEN as u64
} else {
cipher_chunk
};
let cipher_len_usz =
usize::try_from(cipher_len).map_err(|_| FcryError::Format("chunk too big".into()))?;
let chunk_offset = header_len + i * cipher_chunk;
file.seek(SeekFrom::Start(chunk_offset))?;
buf.clear();
buf.resize(cipher_len_usz, 0);
file.read_exact(&mut buf)?;
let nonce = make_nonce(&header.nonce_prefix, i_u32, is_last);
aead.decrypt_in_place(&nonce, &aad, &mut buf)?;
// `buf` is now plaintext for this chunk. Compute the chunk's plaintext
// window in absolute bytes and intersect with the requested range.
let chunk_start = i * chunk_sz;
let chunk_end = chunk_start + buf.len() as u64;
let lo = offset.max(chunk_start) - chunk_start;
let hi = end.min(chunk_end) - chunk_start;
out.write_all(&buf[lo as usize..hi as usize])?;
}
out.commit()?;
Ok(())
}