Toju/agents-docs/features/attachments.md
brogeby 47beed01ca docs: add cross-context feature docs for auth, presence, access-control, messaging, attachments
Fills the five highest-value gaps under agents-docs/features/ so the index covers
the system's main cross-context contracts. Each doc follows the feature-template
structure and the AGENTS_FEATURES.md contract, with honest TODOs where coverage
or behavior couldn't be confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 22:33:41 +02:00


# Attachments
> **Area:** attachments
> **Status:** Active
> **Last updated:** 2026-05-25
## Overview
Attachments are pure peer-to-peer in Toju. The signaling server never sees a file byte. A sender announces an attachment on the WebRTC chat data channel; a receiver requests it; the sender streams base64-encoded 64 KiB chunks back; the receiver reassembles and (on Electron) writes the result to disk under a per-conversation folder. If the original sender goes offline mid-transfer, the receiver can re-request from another peer that previously announced the same attachment. There is no inventory protocol, no integrity signature, and no server-side fallback — attachments live entirely on the participants' machines.
This area is the closest sibling of [voice-signaling](./voice-signaling.md): both are P2P protocols that ride the same RTCPeerConnection. The chat events that drive attachments are members of the `ChatEvent` union; they share the data channel with chat messages but are conceptually distinct.
## Responsibilities
- Define the file-transfer envelope set (announce / request / chunk / cancel / not-found) and its sequencing rules.
- Maintain per-transfer state on both sides — chunk index, in-flight chunk, retry/failover bookkeeping.
- Decide whether to auto-download (size + media-type heuristic).
- Decide where to persist (Electron disk vs browser memory).
- Estimate transfer speed via EWMA so the UI can render a progress bar that doesn't jitter.
- Pick a failover peer when the current sender disappears.
This area does **not** own:
- The chat message that references the attachment → [messaging](./messaging.md).
- The peer connection or data channel itself → [voice-signaling](./voice-signaling.md).
- The IPC channels used to read / write the file on Electron → [ipc-bridge](./ipc-bridge.md).
- Permission to upload — there is no formal upload gate today; access-control's `writeMessages` is the proxy. See [access-control](./access-control.md).
## Key concepts
- **Attachment** — a file announced and referenced by a chat message. Persisted independently of the message body.
- **Transfer** — the per-receiver state for a single in-flight attachment.
- **Bucket** — storage subfolder: `image | video | audio | files`. Determined by MIME type.
- **Tried-peer set** — the set of peers a receiver has already attempted for a given `${messageId}:${fileId}`; used to drive failover without re-trying the same peer in a loop.
- **`uploaderPeerId`** — the original announcer; when (re-)issuing a `file-request`, the receiver prefers it over other candidate peers unless it is already in the tried-peer set.
---
## Protocol
The five events live in the `ChatEvent` union (`toju-app/src/app/shared-kernel/chat-events.ts`) and ride the WebRTC `chat` data channel. They do **not** flow through the WebSocket signaling server.
- `file-announce` — sender announces an attachment alongside a chat message. Carries `messageId`, `fileId`, `name`, `size`, `mimeType`, optional preview metadata.
- `file-request` — receiver requests the attachment from a specific peer.
- `file-chunk` — sender streams `index`, base64-encoded chunk payload, and `total` chunk count.
- `file-cancel` — either side aborts the in-flight transfer.
- `file-not-found` — sender responds when asked for an unknown `fileId`.
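The envelope set above can be pictured as a discriminated union. The sketch below is illustrative only: the field names `messageId`, `fileId`, `name`, `size`, `mimeType`, `index`, and `total` come from this document, while the `type` discriminator, the `data` field, and the assumption that every event carries both `messageId` and `fileId` are hypothetical and may not match `chat-events.ts`.

```typescript
// Illustrative shapes only. The `type` discriminator and the `data` field
// are assumptions, not copied from chat-events.ts.
interface FileAnnounceEvent {
  type: "file-announce";
  messageId: string;
  fileId: string;
  name: string;
  size: number; // bytes
  mimeType: string;
}

interface FileRequestEvent {
  type: "file-request";
  messageId: string;
  fileId: string;
}

interface FileChunkEvent {
  type: "file-chunk";
  messageId: string;
  fileId: string;
  index: number; // position of this chunk, assumed zero-based
  total: number; // total chunk count
  data: string; // base64-encoded payload (hypothetical field name)
}

interface FileCancelEvent {
  type: "file-cancel";
  messageId: string;
  fileId: string;
}

interface FileNotFoundEvent {
  type: "file-not-found";
  messageId: string;
  fileId: string;
}

type FileTransferEvent =
  | FileAnnounceEvent
  | FileRequestEvent
  | FileChunkEvent
  | FileCancelEvent
  | FileNotFoundEvent;

// Exhaustive switch: the compiler flags any event kind left unhandled.
function describeFileEvent(event: FileTransferEvent): string {
  switch (event.type) {
    case "file-announce":
      return `announce ${event.name} (${event.size} bytes, ${event.mimeType})`;
    case "file-request":
      return `request ${event.fileId}`;
    case "file-chunk":
      return `chunk ${event.index + 1}/${event.total} of ${event.fileId}`;
    case "file-cancel":
      return `cancel ${event.fileId}`;
    case "file-not-found":
      return `not found ${event.fileId}`;
  }
}
```

Narrowing on a single literal field keeps each handler type-safe without casts, which is the usual reason to model a protocol like this as a union.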
### Constants
Defined in the attachment domain (`toju-app/src/app/domains/attachment/`):
- `P2P_BASE64_CHUNK_SIZE_BYTES = 64 * 1024` — re-exported as `FILE_CHUNK_SIZE_BYTES`. Shared with the avatar P2P sync path.
- `MAX_AUTO_SAVE_SIZE_BYTES = 10 * 1024 * 1024` — media files at or under 10 MiB are auto-downloaded on receipt (see the auto-download heuristic).
- `MAX_BROWSER_INLINE_MEDIA_SIZE_BYTES = 50 * 1024 * 1024` — browser-mode cap on inlined media.
- **EWMA weights** — previous-weight `0.7`, current-weight `0.3` for transfer-rate smoothing.
- **Data-channel water marks** — `highWaterMark = 4 MiB`, `lowWaterMark = 1 MiB` for backpressure pacing.
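As a worked example of how the chunk-size constant translates into event counts and wire size, the sketch below assumes that 64 KiB is the raw (pre-encoding) chunk size and that payloads use standard padded base64. The helper names `chunkCount` and `base64Length` are hypothetical.

```typescript
const FILE_CHUNK_SIZE_BYTES = 64 * 1024;

// Number of `file-chunk` events needed for a file of `sizeBytes`.
function chunkCount(sizeBytes: number): number {
  return Math.ceil(sizeBytes / FILE_CHUNK_SIZE_BYTES);
}

// Length of the padded base64 text that encodes `byteLength` raw bytes:
// every group of 3 bytes (rounded up) becomes 4 characters.
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

const tenMiB = 10 * 1024 * 1024;
console.log(chunkCount(tenMiB)); // 160
console.log(base64Length(FILE_CHUNK_SIZE_BYTES)); // 87384
```

Under these assumptions, a file exactly at the auto-save cap takes 160 chunks, and each full chunk carries 87,384 base64 characters before any envelope overhead.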
### Flow
1. Sender computes attachment metadata and emits `file-announce` referencing the chat message.
2. Receiver opens a transfer state. Auto-download triggers if `size ≤ MAX_AUTO_SAVE_SIZE_BYTES` and the MIME type maps to a media bucket (`image`, `video`, or `audio`). Larger files and files in the `files` bucket require an explicit user click.
3. Receiver sends `file-request` to `uploaderPeerId`.
4. Sender streams `file-chunk` events sequentially. **Exactly one chunk is in flight per receiver at a time** — the sender awaits the per-chunk write/ack before queueing the next one. On Electron the receiver writes each chunk to disk; the protocol requires `index === receivedCount` for the next chunk or the transfer aborts.
5. Receiver reassembles. Browser mode keeps the file as a Blob in memory (lost on reload). On Electron, the file lands at:
   - `{appData}/server/{room}/{bucket}/{id}{.ext}` for server-channel attachments.
   - `{appData}/direct-messages/{conv}/{bucket}/{id}{.ext}` for DM attachments.
6. Either side may `file-cancel`; the sender returns `file-not-found` if the requested `fileId` is unknown.
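The sequencing rule in step 4 (`index === receivedCount`) can be sketched with a small in-memory assembler. This is a stand-in under stated assumptions: the class name `ChunkAssembler` is hypothetical, chunk indices are assumed to be zero-based, base64 decoding is assumed to have already happened, and the real Electron path appends to disk rather than buffering in memory.

```typescript
// Hypothetical in-memory receiver. The real transfer state lives in the
// attachment domain and may differ.
class ChunkAssembler {
  private readonly total: number;
  private readonly chunks: Uint8Array[] = [];

  constructor(total: number) {
    this.total = total;
  }

  get receivedCount(): number {
    return this.chunks.length;
  }

  get complete(): boolean {
    return this.chunks.length === this.total;
  }

  // Accepts the next chunk. Throws when the index is not the next expected
  // one, mirroring the "mismatch aborts the transfer" rule.
  accept(index: number, bytes: Uint8Array): void {
    if (index !== this.receivedCount || index >= this.total) {
      throw new Error(`unexpected chunk ${index}, expected ${this.receivedCount}`);
    }
    this.chunks.push(bytes);
  }

  // Concatenates all chunks in arrival order once every chunk is present.
  assemble(): Uint8Array {
    if (!this.complete) {
      throw new Error("transfer incomplete");
    }
    const size = this.chunks.reduce((sum, chunk) => sum + chunk.length, 0);
    const out = new Uint8Array(size);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

Because only one chunk is in flight at a time, arrival order and index order coincide, so the receiver needs no reordering buffer; a gap or duplicate is treated as an error.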
### Failover
- Receiver-driven. No inventory protocol.
- Sequential — tries one peer at a time.
- The tried-peer set is keyed by `${messageId}:${fileId}`.
- `uploaderPeerId` is always preferred when reachable; the tried-peer set ensures it isn't re-attempted in a busy loop after a failure.
- If every available peer is in the tried set, the transfer ends in a not-found state and surfaces a UI prompt.
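The failover rules above can be sketched as a pure selection function. The names `transferKey` and `pickNextPeer` are hypothetical, and the sketch assumes the caller already has a list of connected peers that previously announced the attachment; how that list is built is not specified here.

```typescript
// Key that scopes a tried-peer set to one attachment.
function transferKey(messageId: string, fileId: string): string {
  return `${messageId}:${fileId}`;
}

// Returns the next peer to send `file-request` to, or undefined when every
// candidate has already been tried (the not-found outcome).
function pickNextPeer(
  uploaderPeerId: string,
  candidatePeers: readonly string[],
  tried: ReadonlySet<string>,
): string | undefined {
  // The original uploader wins when reachable and not yet attempted.
  if (candidatePeers.includes(uploaderPeerId) && !tried.has(uploaderPeerId)) {
    return uploaderPeerId;
  }
  // Otherwise fall back to the first untried candidate, one at a time.
  return candidatePeers.find((peer) => !tried.has(peer));
}

// Usage: one tried-peer set per attachment, grown after each failed attempt.
const triedByTransfer = new Map<string, Set<string>>();
const key = transferKey("m1", "f1");
const tried = new Set<string>();
triedByTransfer.set(key, tried);

const firstChoice = pickNextPeer("alice", ["bob", "alice", "carol"], tried);
console.log(firstChoice); // "alice"
```

Keeping the function pure leaves the bookkeeping (adding a peer to the set after a failure or a `file-not-found`) to the caller, which makes the busy-loop guard easy to test in isolation.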
---
## Storage
### Electron
- `AttachmentEntity` — TypeORM row in the per-user local database. Carries `id`, `messageId`, `roomId` / `conversationId`, `name`, `size`, `mimeType`, `bucket`, `relativePath`, `createdAt`.
- CQRS commands — `save-attachment`, `delete-attachments-for-message`.
- CQRS queries — `get-all-attachments`, `get-attachments-for-message`.
- Filesystem IPC — `read-file-chunk`, `get-file-size`, `write-file`, `append-file`, `get-file-url`, `file-exists`, `delete-file`, `ensure-dir`, `get-app-data-path`.
The renderer never touches Node.js filesystem APIs directly; every read/write is brokered through [ipc-bridge](./ipc-bridge.md).
### Browser
When the desktop shell is not present, attachments stay in-memory as Blob URLs. Reloading the renderer loses them; this is documented behavior, not a bug.
---
## Auto-download heuristic
- Any file with `size ≤ 10 MiB` and a media MIME type (`image/*`, `video/*`, `audio/*`) is auto-downloaded on receipt so the chat UI can render it inline.
- Files above the cap or in the `files` bucket require an explicit click. The chat UI shows a "Download" affordance with the file size.
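A minimal sketch of the heuristic, assuming the bucket is derived from the top-level MIME type and that anything outside `image/*`, `video/*`, and `audio/*` falls into `files`. The function names `bucketFor` and `shouldAutoDownload` are hypothetical.

```typescript
type Bucket = "image" | "video" | "audio" | "files";

const MAX_AUTO_SAVE_SIZE_BYTES = 10 * 1024 * 1024;

// Maps an announced MIME type to its storage bucket.
function bucketFor(mimeType: string): Bucket {
  const major = mimeType.toLowerCase().split("/")[0];
  if (major === "image" || major === "video" || major === "audio") {
    return major;
  }
  return "files";
}

// True when the attachment is small enough and is a media type.
function shouldAutoDownload(sizeBytes: number, mimeType: string): boolean {
  return sizeBytes <= MAX_AUTO_SAVE_SIZE_BYTES && bucketFor(mimeType) !== "files";
}

console.log(shouldAutoDownload(2_000_000, "image/png")); // true
console.log(shouldAutoDownload(2_000_000, "application/pdf")); // false
```

Note that the decision relies on the announced `mimeType`, which the receiver does not verify, so it governs convenience only and is not a security boundary.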
---
## Speed estimation (EWMA)
Transfer rate is exposed to the UI via an exponentially-weighted moving average:
```
rate_t = 0.7 · rate_{t-1} + 0.3 · instantaneous_t
```
Smooth enough for a stable progress display; responsive enough to surface a stalled transfer within a few seconds.
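The formula can be sketched as a small estimator. The class name `TransferRateEstimator` is hypothetical, and seeding the average with the first sample is an assumption; the document does not say how the initial value is chosen or how often samples are taken.

```typescript
const PREVIOUS_WEIGHT = 0.7;
const CURRENT_WEIGHT = 0.3;

class TransferRateEstimator {
  private rate: number | undefined;

  // Feeds one instantaneous sample (bytes per second) and returns the
  // smoothed rate. The first sample seeds the average (assumption).
  update(instantaneous: number): number {
    this.rate =
      this.rate === undefined
        ? instantaneous
        : PREVIOUS_WEIGHT * this.rate + CURRENT_WEIGHT * instantaneous;
    return this.rate;
  }
}

const estimator = new TransferRateEstimator();
estimator.update(1000); // seeds the average at 1000
estimator.update(0); // about 700
estimator.update(0); // about 490
```

With these weights, each zero-rate sample multiplies the displayed rate by 0.7, so five consecutive stalled samples bring it down to roughly 17 % of its previous value. If samples arrive about once per second, that matches the "few seconds" responsiveness described above.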
---
## Business rules and invariants
- Attachments are **pure P2P** — the signaling server never sees an attachment byte.
- **One chunk in flight per sender → receiver** (`await` per chunk). No parallelism within a single transfer.
- **Sequential chunk indices on Electron disk receive** — `index === receivedCount` is enforced; mismatches abort.
- **`PeerDeliveryService` is not on the attachment path.** Attachments use `RealtimeSessionFacade.broadcastMessage` / `sendToPeer` / `sendToPeerBuffered` directly.
- **Browser mode loses everything on reload** — no IndexedDB persistence today for attachments.
- **No integrity / signature check** on chunks; no encryption at rest beyond OS file permissions.
- **Failover is receiver-driven** and tried-peer-set deduplicated.
---
## Technical implementation
### Product client
- Domain — `toju-app/src/app/domains/attachment/`: manager, transfer state, persistence selection.
- Contracts — `toju-app/src/app/shared-kernel/attachment-contracts.ts`, `chat-events.ts` (the five envelope types).
- Realtime send paths — `RealtimeSessionFacade.broadcastMessage` / `sendToPeer` / `sendToPeerBuffered` in the realtime infrastructure tree.
### Electron
- Entity — `AttachmentEntity` in `electron/entities/`.
- CQRS handlers — under `electron/src/cqrs/` (or equivalent) for `save-attachment`, `delete-attachments-for-message`, `get-all-attachments`, `get-attachments-for-message`.
- Filesystem IPC handlers — `electron/ipc/`: `read-file-chunk`, `get-file-size`, `write-file`, `append-file`, `get-file-url`, `file-exists`, `delete-file`, `ensure-dir`, `get-app-data-path`.
### Key types
- `AttachmentEntity` — local persistence row.
- `FileChunkEvent`, `FileAnnounceEvent`, `FileRequestEvent`, `FileCancelEvent`, `FileNotFoundEvent` — member shapes of the `ChatEvent` union.
---
## Testing
- TODO: no dedicated `*.spec.ts` files under `toju-app/src/app/domains/attachment/` at time of writing.
- E2E: `e2e/tests/chat/chat-message-features.spec.ts` includes `test('syncs image and file attachments between users', ...)` which covers happy-path attachment sync.
- TODO: no E2E coverage for multi-peer failover.
- TODO: no E2E coverage for `file-cancel`.
---
## Security considerations
- **No integrity signature.** A malicious sender can corrupt a chunk; the receiver assembles whatever arrives.
- **No encryption at rest** beyond OS-level file permissions on the per-user app-data folder.
- **No MIME-type sanitation.** The receiver trusts the announced `mimeType` for bucket routing; a misleading MIME does not change the on-disk contents but does affect inline rendering. Browser-side renderers must defend against this.
- **No size cap server-side.** Caps are receiver-side and advisory: `MAX_AUTO_SAVE_SIZE_BYTES` for auto-download, `MAX_BROWSER_INLINE_MEDIA_SIZE_BYTES` for in-memory media. A sender can announce arbitrarily large files; the receiver simply declines to auto-download or inline them.
- **Disk write paths stay receiver-controlled.** A misbehaving peer cannot escape `{appData}/server/...` or `{appData}/direct-messages/...` because the relative path is computed by the receiver, not transmitted by the sender. Any future protocol change must preserve this property.
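The path-containment property can be illustrated with a receiver-side path builder. Everything in this sketch is hypothetical: the function names, the `Scope` type, the sanitisation rule, and the choice to derive the extension from the announced file name are assumptions made for illustration, not a description of the actual Electron code.

```typescript
type Bucket = "image" | "video" | "audio" | "files";

type Scope =
  | { kind: "server"; roomId: string }
  | { kind: "direct-message"; conversationId: string };

// Replaces every character outside [A-Za-z0-9_-], so a segment can never
// contain a path separator or "..".
function safeSegment(value: string): string {
  const cleaned = value.replace(/[^A-Za-z0-9_-]/g, "_");
  return cleaned.length > 0 ? cleaned : "_";
}

// Short alphanumeric extension taken from the announced name, if any.
function extensionOf(name: string): string {
  const ext = /\.([A-Za-z0-9]{1,10})$/.exec(name)?.[1];
  return ext ? `.${ext.toLowerCase()}` : "";
}

// Builds the path relative to {appData} purely from receiver-side values
// plus sanitised identifiers; no sender-supplied path is ever used.
function relativeAttachmentPath(
  scope: Scope,
  bucket: Bucket,
  attachmentId: string,
  name: string,
): string {
  const root =
    scope.kind === "server"
      ? ["server", safeSegment(scope.roomId)]
      : ["direct-messages", safeSegment(scope.conversationId)];
  const fileName = safeSegment(attachmentId) + extensionOf(name);
  return [...root, bucket, fileName].join("/");
}

console.log(relativeAttachmentPath({ kind: "server", roomId: "r1" }, "image", "abc-123", "cat.PNG"));
// server/r1/image/abc-123.png
```

The design point is that identifiers arriving over the wire are treated as opaque labels and never as path fragments, so even a hostile `fileId` or room name cannot introduce `..` or a separator.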
---
## Performance considerations
- **Base64 overhead.** ~33 % inflation on the wire; a 64 KiB binary chunk becomes ~85.3 KiB of base64 text (87,384 characters) before envelope overhead.
- **Single chunk in flight** per (sender, receiver): per-receiver throughput is capped at one chunk per round trip, roughly chunk size divided by round-trip time.
- **Data-channel water marks** (4 MiB high, 1 MiB low) provide back-pressure pacing without tuning per-NIC.
- **No FEC, no parallel chunks, no resumption across browser reloads.**
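The water-mark pacing can be sketched as a hysteresis gate: stop sending once the buffered byte count reaches the high mark, and resume only after it drains to the low mark. The class name `BackpressureGate` and the inclusive comparisons are assumptions. In a browser, the input would typically come from `RTCDataChannel.bufferedAmount`, with `bufferedAmountLowThreshold` and the `bufferedamountlow` event signalling the drain.

```typescript
const HIGH_WATER_MARK_BYTES = 4 * 1024 * 1024;
const LOW_WATER_MARK_BYTES = 1 * 1024 * 1024;

class BackpressureGate {
  private paused = false;

  // `bufferedAmount` is the channel's current send-buffer size in bytes.
  // Returns true when it is safe to queue another chunk.
  canSend(bufferedAmount: number): boolean {
    if (this.paused) {
      // Resume only once the buffer has drained to the low mark.
      if (bufferedAmount <= LOW_WATER_MARK_BYTES) {
        this.paused = false;
      }
    } else if (bufferedAmount >= HIGH_WATER_MARK_BYTES) {
      // Pause as soon as the buffer reaches the high mark.
      this.paused = true;
    }
    return !this.paused;
  }
}

const gate = new BackpressureGate();
console.log(gate.canSend(0)); // true
console.log(gate.canSend(4 * 1024 * 1024)); // false
console.log(gate.canSend(2 * 1024 * 1024)); // false, still draining
console.log(gate.canSend(1 * 1024 * 1024)); // true
```

The gap between the two marks is what prevents rapid pause/resume flapping: a buffer hovering around 2 MiB stays in whichever state it was already in.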
---
## Known issues and limitations
- **No dedicated unit specs** for the attachment domain.
- **No resume across browser reloads** (Electron writes to disk and survives; browser does not).
- **No checksum / signed integrity** on chunks.
- **No encryption at rest** beyond OS file permissions.
- **No server-side fallback** if every peer is offline — attachments are unreachable until at least one peer with the file returns.
---
## Related features
- **[messaging](./messaging.md)** — chat messages reference attachments; attachments are persisted separately from message bodies.
- **[voice-signaling](./voice-signaling.md)** — establishes the data channel that attachments ride on.
- **[ipc-bridge](./ipc-bridge.md)** — exposes the filesystem and CQRS APIs the Electron persistence path uses.
- **[websocket-envelopes](./websocket-envelopes.md)** — for context only; attachments do not flow through the signaling server.
## Changelog
| Date | Change |
|------|--------|
| 2026-05-25 | Initial documentation |