upl/PLAN.md
ddidderr 4527e23b8b docs: add resumable upload plan
Document the minimal design for a personal large-file upload app that can
resume after browser, network, or server interruptions. The plan keeps the
first version intentionally small: one Rust server, one static browser UI,
filesystem-backed upload metadata, raw chunk uploads, and no database or
third-party resumable upload protocol.

The deployment notes include nginx as the external TLS and access-control
layer, with the Rust server bound behind it and upload-specific proxy settings
called out.

Test Plan:
- git diff --cached --check

Refs: user request
2026-05-30 16:46:27 +02:00


# Resumable Upload Plan
## Goal
Build a small personal web app for uploading large files without losing
progress when the network drops, the tab closes, or the Rust server restarts.
The final deployment is:
```text
browser -> nginx -> upl Rust server -> local filesystem
```
The program should stay simple:
- one Rust server binary
- one static browser UI
- no database server
- no frontend framework
- no Tus/Uppy/Resumable.js for the first version
- local filesystem metadata as the source of truth
## Top-Level Design
### Browser
The browser owns file selection and chunk scheduling.
- Let the user pick one file.
- Slice it into fixed-size chunks with `Blob.slice()`.
- Upload a few chunks concurrently.
- Retry failed chunks with exponential backoff.
- Persist pending upload state in IndexedDB.
- Use the File System Access API when available so the same local file can be
reopened after a browser restart without making the user browse to it again.
### nginx
nginx owns TLS, external access control, and reverse proxying.
- Bind the Rust server to localhost only.
- Terminate HTTPS in nginx.
- Require authentication or restrict network access, since this is a personal upload tool.
- Forward upload API requests to the Rust server without buffering whole request
bodies before they reach Rust.
### Rust Server
The Rust server owns upload identity, chunk validation, progress reporting, and
final assembly.
- Serve the static page.
- Create upload records.
- Accept raw binary chunk bodies.
- Store chunks on disk as they arrive.
- Report which chunks already exist.
- Assemble chunks into the final file once all chunks are present.
## Storage Layout
```text
data/
  staging/
    <upload_id>/
      meta.json
      chunks/
        000000.part
        000001.part
        000002.part
  complete/
    <safe_file_name>
```
`meta.json` is the durable upload record:
```json
{
  "id": "random-server-id",
  "original_name": "movie.mkv",
  "safe_name": "movie.mkv",
  "size": 1234567890,
  "last_modified": 1760000000000,
  "chunk_size": 16777216,
  "total_chunks": 74,
  "created_at": "2026-05-30T16:00:00Z"
}
```
The server should generate `upload_id`. The browser should not invent the
primary upload identity from file metadata. File name, size, and modified time
are useful for display and sanity checks, but they are not unique enough to be
the durable server identity.
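
The plan does not say how `safe_name` is derived from `original_name`. As a minimal sketch, assuming a conservative allowlist (the function name and the exact rules are hypothetical, not part of the plan), the server could keep only the last path component and replace anything unusual:

```rust
/// Derive a filesystem-safe name from a client-supplied file name.
/// Assumed rules (not specified by the plan): keep only the last path
/// component, replace anything outside [A-Za-z0-9._-] with '_', and fall
/// back to "upload" when nothing usable is left.
fn safe_name(original: &str) -> String {
    // Treat both '/' and '\' as separators so Windows-style paths are handled.
    let last = original
        .rsplit(|c: char| c == '/' || c == '\\')
        .next()
        .unwrap_or("");
    let cleaned: String = last
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-') {
                c
            } else {
                '_'
            }
        })
        .collect();
    // "", "." and ".." must never be used as a file name.
    if cleaned.chars().all(|c| c == '.') {
        "upload".to_string()
    } else {
        cleaned
    }
}
```

This sketch does not handle name collisions in `data/complete/`; a real version would need a policy for that, such as appending part of the upload ID.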
## HTTP API
Keep the API small and boring.
```text
GET /
POST /api/uploads
GET /api/uploads/:id
PUT /api/uploads/:id/chunks/:index
POST /api/uploads/:id/complete
```
### Create Upload
`POST /api/uploads`
Request:
```json
{
  "name": "movie.mkv",
  "size": 1234567890,
  "last_modified": 1760000000000
}
```
Response:
```json
{
  "upload_id": "random-server-id",
  "chunk_size": 16777216,
  "total_chunks": 74,
  "completed_chunks": []
}
```
Start with a fixed chunk size of 16 MiB. This keeps request count reasonable
while making failed chunks cheap enough to retry.
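
The `total_chunks` value in the example responses follows from ceiling division of `size` by `chunk_size`. A small sketch of that arithmetic (the function names are illustrative, and treating an empty file as zero chunks is an assumption the plan does not address):

```rust
/// Number of fixed-size chunks needed to cover `size` bytes.
/// An empty file yields 0 chunks (assumption; the plan does not cover it).
fn total_chunks(size: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // Ceiling division written so that it cannot overflow.
    size / chunk_size + u64::from(size % chunk_size != 0)
}

/// Length of the final chunk, which may be shorter than `chunk_size`.
/// Returns 0 for an empty file.
fn last_chunk_len(size: u64, chunk_size: u64) -> u64 {
    let total = total_chunks(size, chunk_size);
    if total == 0 {
        0
    } else {
        size - (total - 1) * chunk_size
    }
}
```

For the example file, `total_chunks(1234567890, 16777216)` is 74 and the final chunk holds the remaining 9831122 bytes.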
### Query Progress
`GET /api/uploads/:id`
Response:
```json
{
  "upload_id": "random-server-id",
  "name": "movie.mkv",
  "size": 1234567890,
  "chunk_size": 16777216,
  "total_chunks": 74,
  "completed_chunks": [0, 1, 2, 5]
}
```
The server can compute `completed_chunks` by scanning the chunk directory and
checking file lengths. This avoids needing a database.
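
One possible shape for that scan, using only the standard library. The function name and signature are assumptions; `size` and `chunk_size` would come from `meta.json`, and `chunk_size` is assumed to be non-zero:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return the sorted indexes of chunks in `chunks_dir` whose file length
/// matches the expected length. Anything not named `NNNNNN.part` is ignored,
/// which also skips in-progress `.part.tmp` files.
fn completed_chunks(chunks_dir: &Path, size: u64, chunk_size: u64) -> io::Result<Vec<u64>> {
    let total = size / chunk_size + u64::from(size % chunk_size != 0);
    let mut done = Vec::new();
    for entry in fs::read_dir(chunks_dir)? {
        let entry = entry?;
        let name = entry.file_name();
        let Some(name) = name.to_str() else { continue };
        let Some(stem) = name.strip_suffix(".part") else { continue };
        // Accept only the six-digit names used in the storage layout.
        if stem.len() != 6 || !stem.bytes().all(|b| b.is_ascii_digit()) {
            continue;
        }
        let Ok(index) = stem.parse::<u64>() else { continue };
        if index >= total {
            continue;
        }
        // Only the final chunk may be shorter than `chunk_size`.
        let expected = if index + 1 == total {
            size - index * chunk_size
        } else {
            chunk_size
        };
        if entry.metadata()?.len() == expected {
            done.push(index);
        }
    }
    done.sort_unstable();
    Ok(done)
}
```

A chunk file with the wrong length is simply not reported, so the browser will upload it again.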
### Upload Chunk
`PUT /api/uploads/:id/chunks/:index`
Use a raw request body:
```http
Content-Type: application/octet-stream
```
Do not use multipart form uploads for chunks in the minimal version. Raw bytes
make the Rust handler simpler and avoid multipart parsing.
Server rules:
- reject unknown upload IDs
- reject out-of-range chunk indexes
- reject chunks with the wrong length
- allow the final chunk to be shorter than `chunk_size`
- write to `000123.part.tmp` first
- rename the temp file to `000123.part` only after the write succeeds
- make duplicate chunk uploads idempotent when the existing chunk has the
expected length
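
The server rules above can be sketched with the standard library alone. This is a simplified illustration, not the intended handler: it takes the whole chunk as a byte slice, whereas a real handler would likely stream the request body to the temp file, and it assumes the upload ID has already been resolved to a chunk directory. All names are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Outcome of storing one chunk.
#[derive(Debug, PartialEq)]
enum StoreOutcome {
    Written,
    AlreadyPresent,
}

/// Expected byte length of chunk `index`, or None when the index is out of range.
fn expected_chunk_len(size: u64, chunk_size: u64, index: u64) -> Option<u64> {
    let total = size / chunk_size + u64::from(size % chunk_size != 0);
    if index >= total {
        None
    } else if index + 1 == total {
        Some(size - index * chunk_size) // the final chunk may be shorter
    } else {
        Some(chunk_size)
    }
}

/// Validate `body`, then write it to `NNNNNN.part.tmp` and rename it into place.
fn store_chunk(
    chunks_dir: &Path,
    size: u64,
    chunk_size: u64,
    index: u64,
    body: &[u8],
) -> io::Result<StoreOutcome> {
    let expected = expected_chunk_len(size, chunk_size, index)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "chunk index out of range"))?;
    if body.len() as u64 != expected {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "wrong chunk length"));
    }
    let final_path = chunks_dir.join(format!("{index:06}.part"));
    let tmp_path = chunks_dir.join(format!("{index:06}.part.tmp"));
    // Duplicate upload: accept it without rewriting if the stored length matches.
    if let Ok(meta) = fs::metadata(&final_path) {
        if meta.len() == expected {
            return Ok(StoreOutcome::AlreadyPresent);
        }
    }
    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(body)?;
    tmp.sync_all()?; // flush to disk before the rename makes the chunk visible
    fs::rename(&tmp_path, &final_path)?;
    Ok(StoreOutcome::Written)
}
```

Two concurrent uploads of the same chunk would share one temp path in this sketch; a per-request temp name is one way to avoid that.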
### Complete Upload
`POST /api/uploads/:id/complete`
The server should:
1. Load `meta.json`.
2. Verify every expected chunk exists.
3. Verify every chunk has the expected length.
4. Concatenate chunks in order into a temp final file.
5. Rename the temp final file into `data/complete/`.
6. Return the final file path or download URL.
The server should not delete staging data until assembly succeeds.
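
A sketch of steps 2 through 5, assuming the values from `meta.json` are passed in as parameters rather than loaded from disk. The function name, its signature, and the `.tmp` suffix for the temp final file are illustrative choices:

```rust
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};

/// Verify all chunks, concatenate them into a temp file inside `complete_dir`,
/// then rename the temp file to its final name. Staging data is left untouched.
fn assemble(
    chunks_dir: &Path,
    complete_dir: &Path,
    safe_name: &str,
    size: u64,
    chunk_size: u64,
) -> io::Result<PathBuf> {
    let total = size / chunk_size + u64::from(size % chunk_size != 0);
    // Steps 2 and 3: every chunk must exist and have the expected length.
    for index in 0..total {
        let expected = if index + 1 == total {
            size - index * chunk_size
        } else {
            chunk_size
        };
        // A missing chunk surfaces here as a NotFound error.
        let len = fs::metadata(chunks_dir.join(format!("{index:06}.part")))?.len();
        if len != expected {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("chunk {index} has length {len}, expected {expected}"),
            ));
        }
    }
    // Step 4: concatenate in index order into a temp final file.
    fs::create_dir_all(complete_dir)?;
    let final_path = complete_dir.join(safe_name);
    let tmp_path = complete_dir.join(format!("{safe_name}.tmp"));
    let mut out = File::create(&tmp_path)?;
    for index in 0..total {
        let mut chunk = File::open(chunks_dir.join(format!("{index:06}.part")))?;
        io::copy(&mut chunk, &mut out)?;
    }
    out.sync_all()?;
    // Step 5: the temp file lives in the same directory, so the rename stays
    // on one filesystem.
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

Note that `fs::rename` replaces an existing file with the same name on most platforms, so this sketch would overwrite an earlier upload called `safe_name`.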
## Resume Flow
### First Upload
1. User selects a file.
2. Browser calls `POST /api/uploads`.
3. Browser stores the returned `upload_id` and file handle in IndexedDB.
4. Browser uploads missing chunks with a small concurrency pool.
5. Browser calls `/complete` when all chunks are uploaded.
### After Interruption
1. Browser loads pending upload records from IndexedDB.
2. Browser calls `GET /api/uploads/:id`.
3. Browser asks for read permission on the saved file handle.
4. Browser compares server `completed_chunks` with total chunks.
5. Browser uploads only missing chunks.
6. Browser calls `/complete`.
The server is authoritative. Browser state helps find the file again, but
server state decides what has actually been uploaded.
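
Steps 4 and 5 of the resume flow reduce to a set difference. The browser would do this in JavaScript; the logic is shown here in Rust only to keep this plan's examples in one language, and the function name is hypothetical:

```rust
/// Chunk indexes still to upload: every index in `0..total_chunks` that the
/// server did not report in `completed_chunks`. Out-of-range entries in
/// `completed` are ignored.
fn missing_chunks(total_chunks: u64, completed: &[u64]) -> Vec<u64> {
    let mut have = vec![false; total_chunks as usize];
    for &index in completed {
        if let Some(slot) = have.get_mut(index as usize) {
            *slot = true;
        }
    }
    (0..total_chunks)
        .filter(|&index| !have[index as usize])
        .collect()
}
```

With 8 chunks in total and a server response of `[0, 1, 2, 5]`, the missing list is `[3, 4, 6, 7]`.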
## Browser State
IndexedDB record:
```json
{
  "upload_id": "random-server-id",
  "name": "movie.mkv",
  "size": 1234567890,
  "last_modified": 1760000000000,
  "chunk_size": 16777216,
  "total_chunks": 74,
  "file_handle": "<FileSystemFileHandle>",
  "updated_at": "2026-05-30T16:00:00Z"
}
```
If `showOpenFilePicker()` is unavailable, fall back to a normal
`<input type="file">`. That fallback can still resume server-side progress, but
the user must reselect the same file after a page reload.
## Upload Scheduler
Start with these defaults:
```text
chunk size: 16 MiB
concurrency: 3
max retries per chunk: 5
```
The scheduler should support:
- pause with `AbortController`
- resume by rebuilding the missing chunk list
- retry with exponential backoff
- visible progress based on verified completed chunks
Progress should be based on chunks the server has accepted, not bytes merely
sent by the browser.
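
The retry delay can be computed as a capped doubling. The scheduler itself runs in the browser, so this Rust version only illustrates the arithmetic; the function name, the 500 ms base, and the 30 s cap used in the note below are assumptions, and the sketch omits random jitter:

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// never more than `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 500 ms base and a 30 s cap, the five retries allowed by the defaults above would wait 0.5 s, 1 s, 2 s, 4 s, and 8 s.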
## nginx Requirements
Example shape:
```nginx
server {
    listen 443 ssl;
    server_name uploads.example.com;
    client_max_body_size 64m;
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_request_buffering off;
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
```
Notes:
- `client_max_body_size` only needs to exceed the maximum single chunk size, not
the full file size.
- `proxy_request_buffering off` lets the Rust server receive upload bodies
directly instead of waiting for nginx to buffer the whole chunk first.
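- Long proxy timeouts give the Rust server time to respond, for example while
`/complete` assembles a large file. Stalls on a slow client link are governed
by nginx's separate `client_body_timeout` directive, which this example leaves
at its default.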
- Add HTTP basic auth, an IP allowlist, VPN-only access, or another protection
layer before exposing this publicly.
## Rust Implementation Shape
Suggested crates:
- `axum` for HTTP routing
- `tokio` for async runtime and filesystem operations
- `serde` and `serde_json` for metadata
- `uuid` or `nanoid` for upload IDs
- `tower-http` for static file serving
Suggested modules:
```text
src/
  main.rs
  api.rs
  storage.rs
  model.rs
  static_files.rs
```
`storage.rs` should be the only module that knows the on-disk layout.
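
One way to keep that knowledge in one place is a small path-building type. The `Layout` name and its methods are hypothetical; the paths mirror the storage layout described earlier in this plan:

```rust
use std::path::PathBuf;

/// Builds every on-disk path from the data root so that no other module
/// hard-codes directory or file names.
struct Layout {
    root: PathBuf,
}

impl Layout {
    fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    /// data/staging/<upload_id>/
    fn upload_dir(&self, upload_id: &str) -> PathBuf {
        self.root.join("staging").join(upload_id)
    }

    /// data/staging/<upload_id>/meta.json
    fn meta_path(&self, upload_id: &str) -> PathBuf {
        self.upload_dir(upload_id).join("meta.json")
    }

    /// data/staging/<upload_id>/chunks/NNNNNN.part
    fn chunk_path(&self, upload_id: &str, index: u64) -> PathBuf {
        self.upload_dir(upload_id)
            .join("chunks")
            .join(format!("{index:06}.part"))
    }

    /// data/complete/<safe_file_name>
    fn complete_path(&self, safe_name: &str) -> PathBuf {
        self.root.join("complete").join(safe_name)
    }
}
```

Because `upload_id` arrives in the URL, callers should check that it is a known server-generated ID before passing it here; this sketch does no validation of its own.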
## Validation
Manual checks for the MVP:
- upload a small file in one pass
- upload a file larger than one chunk
- kill the browser tab mid-upload and resume
- restart the Rust server mid-upload and resume
- interrupt the network and resume
- retry a duplicate chunk and confirm it is accepted idempotently
- attempt an invalid chunk index and confirm it is rejected
- attempt a wrong-size non-final chunk and confirm it is rejected
- complete an upload and compare the final file with the source file
Useful checksum command:
```sh
sha256sum source-file data/complete/uploaded-file
```
## Milestones
1. Serve a static page from Rust.
2. Add upload creation and on-disk metadata.
3. Add raw chunk upload and chunk validation.
4. Add progress query from existing chunk files.
5. Add browser chunk slicing and concurrency.
6. Add IndexedDB state.
7. Add File System Access API resume.
8. Add completion assembly.
9. Put the server behind nginx and verify resume still works.
## Explicit Non-Goals For The First Version
- multi-user accounts
- cloud object storage
- encryption at rest
- background service worker upload
- content-addressed deduplication
- full-file hashing before upload
- Tus protocol compatibility
- drag-and-drop polish
- mobile browser support
These can be added later if they become useful, but they are unnecessary for a
correct personal uploader.