diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..bc61e68
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,355 @@
+# Resumable Upload Plan
+
+## Goal
+
+Build a small personal web app for uploading large files without losing
+progress when the network drops, the tab closes, or the Rust server restarts.
+
+The final deployment is:
+
+```text
+browser -> nginx -> upl Rust server -> local filesystem
+```
+
+The program should stay simple:
+
+- one Rust server binary
+- one static browser UI
+- no database server
+- no frontend framework
+- no Tus/Uppy/Resumable.js for the first version
+- local filesystem metadata as the source of truth
+
+## Top-Level Design
+
+### Browser
+
+The browser owns file selection and chunk scheduling.
+
+- Let the user pick one file.
+- Slice it into fixed-size chunks with `Blob.slice()`.
+- Upload a few chunks concurrently.
+- Retry failed chunks with exponential backoff.
+- Persist pending upload state in IndexedDB.
+- Use the File System Access API when available so the same local file can be
+  reopened after a browser restart without making the user browse to it again.
+
+### nginx
+
+nginx owns TLS, external access control, and reverse proxying.
+
+- Bind the Rust server to localhost only.
+- Terminate HTTPS in nginx.
+- Protect the app because it is a personal upload tool.
+- Forward upload API requests to the Rust server without buffering whole request
+  bodies before they reach Rust.
+
+### Rust Server
+
+The Rust server owns upload identity, chunk validation, progress reporting, and
+final assembly.
+
+- Serve the static page.
+- Create upload records.
+- Accept raw binary chunk bodies.
+- Store chunks on disk as they arrive.
+- Report which chunks already exist.
+- Assemble chunks into the final file once all chunks are present.
+
+## Storage Layout
+
+```text
+data/
+  staging/
+    <upload_id>/
+      meta.json
+      chunks/
+        000000.part
+        000001.part
+        000002.part
+  complete/
+
+```
+
+`meta.json` is the durable upload record:
+
+```json
+{
+  "id": "random-server-id",
+  "original_name": "movie.mkv",
+  "safe_name": "movie.mkv",
+  "size": 1234567890,
+  "last_modified": 1760000000000,
+  "chunk_size": 16777216,
+  "total_chunks": 74,
+  "created_at": "2026-05-30T16:00:00Z"
+}
+```
+
+The server should generate `upload_id` (stored as `id` in `meta.json`). The
+browser should not invent the primary upload identity from file metadata. File
+name, size, and modified time are useful for display and sanity checks, but they
+are not unique enough to be the durable server identity.
+
+## HTTP API
+
+Keep the API small and boring.
+
+```text
+GET  /
+POST /api/uploads
+GET  /api/uploads/:id
+PUT  /api/uploads/:id/chunks/:index
+POST /api/uploads/:id/complete
+```
+
+### Create Upload
+
+`POST /api/uploads`
+
+Request:
+
+```json
+{
+  "name": "movie.mkv",
+  "size": 1234567890,
+  "last_modified": 1760000000000
+}
+```
+
+Response:
+
+```json
+{
+  "upload_id": "random-server-id",
+  "chunk_size": 16777216,
+  "total_chunks": 74,
+  "completed_chunks": []
+}
+```
+
+Start with a fixed chunk size of 16 MiB. This keeps request count reasonable
+while making failed chunks cheap enough to retry.
+
+### Query Progress
+
+`GET /api/uploads/:id`
+
+Response:
+
+```json
+{
+  "upload_id": "random-server-id",
+  "name": "movie.mkv",
+  "size": 1234567890,
+  "chunk_size": 16777216,
+  "total_chunks": 74,
+  "completed_chunks": [0, 1, 2, 5]
+}
+```
+
+The server can compute `completed_chunks` by scanning the chunk directory and
+checking file lengths. This avoids needing a database.
+
+### Upload Chunk
+
+`PUT /api/uploads/:id/chunks/:index`
+
+Use a raw request body:
+
+```http
+Content-Type: application/octet-stream
+```
+
+Do not use multipart form uploads for chunks in the minimal version. Raw bytes
+make the Rust handler simpler and avoid multipart parsing.
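
The `total_chunks` value returned at creation and the length each raw chunk body must have both follow from `size` and `chunk_size`. A minimal sketch of that arithmetic, assuming the hypothetical helper names `total_chunks` and `expected_chunk_len` (neither is defined by this plan):

```rust
/// Number of chunks needed to cover `size` bytes (ceiling division).
/// Assumes `chunk_size > 0`; an empty file yields zero chunks.
pub fn total_chunks(size: u64, chunk_size: u64) -> u64 {
    size / chunk_size + u64::from(size % chunk_size != 0)
}

/// Length in bytes that chunk `index` must have, or `None` when the index
/// is out of range. Only the final chunk may be shorter than `chunk_size`.
pub fn expected_chunk_len(size: u64, chunk_size: u64, index: u64) -> Option<u64> {
    let total = total_chunks(size, chunk_size);
    if index >= total {
        return None;
    }
    if index + 1 == total {
        // Final chunk: whatever remains after the full-size chunks.
        Some(size - index * chunk_size)
    } else {
        Some(chunk_size)
    }
}
```

With the example record (`size` 1234567890, `chunk_size` 16777216) this gives 74 chunks, and the final chunk at index 73 is 9831122 bytes long. A handler could compare the received body length against `expected_chunk_len` before writing anything to disk.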
+
+Server rules:
+
+- reject unknown upload IDs
+- reject out-of-range chunk indexes
+- reject chunks with the wrong length
+- allow the final chunk to be shorter than `chunk_size`
+- write to `000123.part.tmp` first
+- rename the temp file to `000123.part` only after the write succeeds
+- make duplicate chunk uploads idempotent when the existing chunk has the
+  expected length
+
+### Complete Upload
+
+`POST /api/uploads/:id/complete`
+
+The server should:
+
+1. Load `meta.json`.
+2. Verify every expected chunk exists.
+3. Verify every chunk has the expected length.
+4. Concatenate chunks in order into a temp final file.
+5. Rename the temp final file into `data/complete/`.
+6. Return the final file path or download URL.
+
+The server should not delete staging data until assembly succeeds.
+
+## Resume Flow
+
+### First Upload
+
+1. User selects a file.
+2. Browser calls `POST /api/uploads`.
+3. Browser stores the returned `upload_id` and file handle in IndexedDB.
+4. Browser uploads missing chunks with a small concurrency pool.
+5. Browser calls `/complete` when all chunks are uploaded.
+
+### After Interruption
+
+1. Browser loads pending upload records from IndexedDB.
+2. Browser calls `GET /api/uploads/:id`.
+3. Browser asks for read permission on the saved file handle.
+4. Browser compares server `completed_chunks` with total chunks.
+5. Browser uploads only missing chunks.
+6. Browser calls `/complete`.
+
+The server is authoritative. Browser state helps find the file again, but
+server state decides what has actually been uploaded.
+
+## Browser State
+
+IndexedDB record:
+
+```json
+{
+  "upload_id": "random-server-id",
+  "name": "movie.mkv",
+  "size": 1234567890,
+  "last_modified": 1760000000000,
+  "chunk_size": 16777216,
+  "total_chunks": 74,
+  "file_handle": "",
+  "updated_at": "2026-05-30T16:00:00Z"
+}
+```
+
+If `showOpenFilePicker()` is unavailable, fall back to a normal
+`<input type="file">`.
+That fallback can still resume server-side progress, but
+the user must reselect the same file after a page reload.
+
+## Upload Scheduler
+
+Start with these defaults:
+
+```text
+chunk size: 16 MiB
+concurrency: 3
+max retries per chunk: 5
+```
+
+The scheduler should support:
+
+- pause with `AbortController`
+- resume by rebuilding the missing chunk list
+- retry with exponential backoff
+- visible progress based on verified completed chunks
+
+Progress should be based on chunks the server has accepted, not bytes merely
+sent by the browser.
+
+## nginx Requirements
+
+Example shape:
+
+```nginx
+server {
+    listen 443 ssl;
+    server_name uploads.example.com;
+
+    client_max_body_size 64m;
+
+    location / {
+        proxy_pass http://127.0.0.1:3000;
+        proxy_http_version 1.1;
+        proxy_request_buffering off;
+        proxy_read_timeout 3600s;
+        proxy_send_timeout 3600s;
+    }
+}
+```
+
+Notes:
+
+- `client_max_body_size` only needs to exceed the maximum single chunk size, not
+  the full file size.
+- `proxy_request_buffering off` lets the Rust server receive upload bodies
+  directly instead of waiting for nginx to buffer the whole chunk first.
+- Long timeouts are useful for slow links and large chunks.
+- Add HTTP basic auth, an IP allowlist, VPN-only access, or another protection
+  layer before exposing this publicly.
+
+## Rust Implementation Shape
+
+Suggested crates:
+
+- `axum` for HTTP routing
+- `tokio` for async runtime and filesystem operations
+- `serde` and `serde_json` for metadata
+- `uuid` or `nanoid` for upload IDs
+- `tower-http` for static file serving
+
+Suggested modules:
+
+```text
+src/
+  main.rs
+  api.rs
+  storage.rs
+  model.rs
+  static_files.rs
+```
+
+`storage.rs` should be the only module that knows the on-disk layout.
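
To make the `storage.rs` boundary concrete, here is a minimal sketch using only the standard library's blocking `std::fs` rather than `tokio`. The `Storage` type and its method names are hypothetical. It also assumes that the assembled file is named after `safe_name`, and that the caller has already validated `upload_id` and the chunk length.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical handle on the `data/` directory; the only place that builds paths.
pub struct Storage {
    root: PathBuf,
}

impl Storage {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    // Assumes `upload_id` is a server-generated ID already checked by the caller.
    fn chunk_dir(&self, upload_id: &str) -> PathBuf {
        self.root.join("staging").join(upload_id).join("chunks")
    }

    fn chunk_path(&self, upload_id: &str, index: u64) -> PathBuf {
        self.chunk_dir(upload_id).join(format!("{index:06}.part"))
    }

    /// Write the body to `NNNNNN.part.tmp`, then rename it to `NNNNNN.part`.
    pub fn write_chunk(&self, upload_id: &str, index: u64, body: &[u8]) -> io::Result<()> {
        let dir = self.chunk_dir(upload_id);
        fs::create_dir_all(&dir)?;
        let tmp_path = dir.join(format!("{index:06}.part.tmp"));
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(body)?;
        tmp.sync_all()?;
        fs::rename(&tmp_path, self.chunk_path(upload_id, index))
    }

    /// Indexes whose `.part` file exists with the expected length.
    /// Probes each expected path instead of listing the directory.
    pub fn completed_chunks(&self, upload_id: &str, size: u64, chunk_size: u64) -> Vec<u64> {
        let total = size / chunk_size + u64::from(size % chunk_size != 0);
        (0..total)
            .filter(|&i| {
                let expected = if i + 1 == total { size - i * chunk_size } else { chunk_size };
                fs::metadata(self.chunk_path(upload_id, i))
                    .map(|m| m.is_file() && m.len() == expected)
                    .unwrap_or(false)
            })
            .collect()
    }

    /// Concatenate chunks in order into a temp file, then rename it into `complete/`.
    /// Staging data is left in place; the caller deletes it after success.
    pub fn assemble(&self, upload_id: &str, total: u64, safe_name: &str) -> io::Result<PathBuf> {
        let complete_dir = self.root.join("complete");
        fs::create_dir_all(&complete_dir)?;
        let tmp_path = complete_dir.join(format!("{safe_name}.tmp"));
        let final_path = complete_dir.join(safe_name);
        let mut out = File::create(&tmp_path)?;
        for index in 0..total {
            let mut chunk = File::open(self.chunk_path(upload_id, index))?;
            io::copy(&mut chunk, &mut out)?;
        }
        out.sync_all()?;
        fs::rename(&tmp_path, &final_path)?;
        Ok(final_path)
    }
}
```

Keeping the temp files in the same directory as their final names keeps each `rename` on one filesystem. An `axum` handler would likely call these from a blocking task, or use the `tokio::fs` equivalents.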
+
+## Validation
+
+Manual checks for the MVP:
+
+- upload a small file in one pass
+- upload a file larger than one chunk
+- kill the browser tab mid-upload and resume
+- restart the Rust server mid-upload and resume
+- interrupt the network and resume
+- retry a duplicate chunk and confirm it is accepted idempotently
+- attempt an invalid chunk index and confirm it is rejected
+- attempt a wrong-size non-final chunk and confirm it is rejected
+- complete an upload and compare the final file with the source file
+
+Useful checksum command:
+
+```sh
+sha256sum source-file data/complete/uploaded-file
+```
+
+## Milestones
+
+1. Serve a static page from Rust.
+2. Add upload creation and on-disk metadata.
+3. Add raw chunk upload and chunk validation.
+4. Add progress query from existing chunk files.
+5. Add browser chunk slicing and concurrency.
+6. Add IndexedDB state.
+7. Add File System Access API resume.
+8. Add completion assembly.
+9. Put the server behind nginx and verify resume still works.
+
+## Explicit Non-Goals For The First Version
+
+- multi-user accounts
+- cloud object storage
+- encryption at rest
+- background service worker upload
+- content-addressed deduplication
+- full-file hashing before upload
+- Tus protocol compatibility
+- drag-and-drop polish
+- mobile browser support
+
+These can be added later if they become useful, but they are unnecessary for a
+correct personal uploader.