Move toju-app into its own folder

2026-03-29 23:30:37 +02:00
parent 0467a7b612
commit 8162e0444a
287 changed files with 42 additions and 34 deletions

# Realtime Infrastructure
Low-level WebRTC and WebSocket plumbing that the rest of the app sits on top of. Nothing in here knows about Angular components, NgRx, or domain logic. It exposes observables, signals, and callbacks that higher layers (facades, effects, components) consume.
## Module map
```
realtime/
├── realtime-session.service.ts Composition root (WebRTCService)
├── realtime.types.ts PeerData, credentials, tracker types
├── realtime.constants.ts ICE servers, signal types, bitrates, intervals
├── signaling/ WebSocket layer
│ ├── signaling.manager.ts One WebSocket per signaling URL
│ ├── signaling-transport-handler.ts Routes messages to the right socket
│ ├── server-signaling-coordinator.ts Maps peers/servers to signaling URLs
│ ├── signaling-message-handler.ts Dispatches incoming signaling messages
│ └── server-membership-signaling-handler.ts Join / leave / switch protocol
├── peer-connection-manager/ WebRTC peer connections
│ ├── peer-connection.manager.ts Owns all RTCPeerConnection instances
│ ├── shared.ts PeerData type + state factory
│ ├── connection/
│ │ ├── create-peer-connection.ts RTCPeerConnection factory (ICE, transceivers)
│ │ └── negotiation.ts Offer/answer/ICE with collision handling
│ ├── messaging/
│ │ ├── data-channel.ts Ordered data channel for chat + control
│ │ └── ping.ts Latency measurement (PING/PONG every 5s)
│ ├── recovery/
│ │ └── peer-recovery.ts Disconnect grace period + reconnect loop
│ └── streams/
│ └── remote-streams.ts Classifies incoming tracks (voice vs screen)
├── media/ Local capture and processing
│ ├── media.manager.ts getUserMedia, mute, deafen, gain pipeline
│ ├── noise-reduction.manager.ts RNNoise AudioWorklet graph
│ ├── voice-session-controller.ts Higher-level wrapper over MediaManager
│ ├── screen-share.manager.ts Screen capture + per-peer track distribution
│ └── screen-share-platforms/
│ ├── shared.ts Electron desktopCapturer types
│ ├── browser-screen-share.capture.ts Standard getDisplayMedia
│ ├── desktop-electron-screen-share.capture.ts Electron source picker (Windows)
│ └── linux-electron-screen-share.capture.ts PulseAudio/PipeWire routing (Linux)
├── streams/ Stream facades
│ ├── peer-media-facade.ts Unified API over peers, media, screen share
│ └── remote-screen-share-request-controller.ts On-demand screen share delivery
├── state/
│ └── webrtc-state-controller.ts Angular Signals for all connection state
└── logging/
├── webrtc-logger.ts Conditional [WebRTC] prefixed logging
└── debug-network-metrics.ts Per-peer stats (drops, latency, throughput)
```
## How it all fits together
`WebRTCService` is the composition root. It instantiates every other manager, then wires their callbacks together after construction (to avoid circular references). No manager imports another manager directly.
```mermaid
graph TD
WS[WebRTCService<br/>composition root]
WS --> SC[SignalingTransportHandler]
WS --> PCM[PeerConnectionManager]
WS --> MM[MediaManager]
WS --> SSM[ScreenShareManager]
WS --> State[WebRtcStateController<br/>Angular Signals]
WS --> VSC[VoiceSessionController]
WS --> PMF[PeerMediaFacade]
WS --> RSSRC[RemoteScreenShareRequestController]
SC --> SM1[SignalingManager<br/>socket A]
SC --> SM2[SignalingManager<br/>socket B]
SC --> Coord[ServerSignalingCoordinator]
PCM --> Conn[create-peer-connection]
PCM --> Neg[negotiation]
PCM --> DC[data-channel]
PCM --> Ping[ping]
PCM --> Rec[peer-recovery]
PCM --> RS[remote-streams]
MM --> NR[NoiseReductionManager<br/>RNNoise worklet]
SSM --> BrowserCap[Browser capture]
SSM --> ElectronCap[Electron capture]
SSM --> LinuxCap[Linux audio routing]
click WS "realtime-session.service.ts" "WebRTCService - composition root" _blank
click SC "signaling/signaling-transport-handler.ts" "Routes messages to the right WebSocket" _blank
click PCM "peer-connection-manager/peer-connection.manager.ts" "Owns all RTCPeerConnection instances" _blank
click MM "media/media.manager.ts" "getUserMedia, mute, deafen, gain pipeline" _blank
click SSM "media/screen-share.manager.ts" "Screen capture and per-peer distribution" _blank
click State "state/webrtc-state-controller.ts" "Angular Signals for connection state" _blank
click VSC "media/voice-session-controller.ts" "Higher-level voice session wrapper" _blank
click PMF "streams/peer-media-facade.ts" "Unified API over peers, media, screen share" _blank
click RSSRC "streams/remote-screen-share-request-controller.ts" "On-demand screen share delivery" _blank
click SM1 "signaling/signaling.manager.ts" "One WebSocket per signaling URL" _blank
click SM2 "signaling/signaling.manager.ts" "One WebSocket per signaling URL" _blank
click Coord "signaling/server-signaling-coordinator.ts" "Maps peers/servers to signaling URLs" _blank
click Conn "peer-connection-manager/connection/create-peer-connection.ts" "RTCPeerConnection factory" _blank
click Neg "peer-connection-manager/connection/negotiation.ts" "Offer/answer/ICE with collision handling" _blank
click DC "peer-connection-manager/messaging/data-channel.ts" "Ordered data channel for chat + control" _blank
click Ping "peer-connection-manager/messaging/ping.ts" "Latency measurement via PING/PONG" _blank
click Rec "peer-connection-manager/recovery/peer-recovery.ts" "Disconnect grace period + reconnect loop" _blank
click RS "peer-connection-manager/streams/remote-streams.ts" "Classifies incoming tracks" _blank
click NR "media/noise-reduction.manager.ts" "RNNoise AudioWorklet graph" _blank
click BrowserCap "media/screen-share-platforms/browser-screen-share.capture.ts" "Standard getDisplayMedia" _blank
click ElectronCap "media/screen-share-platforms/desktop-electron-screen-share.capture.ts" "Electron source picker" _blank
click LinuxCap "media/screen-share-platforms/linux-electron-screen-share.capture.ts" "PulseAudio/PipeWire routing" _blank
```
## Signaling (WebSocket)
The signaling layer's only job is getting two peers to exchange SDP offers/answers and ICE candidates so they can establish a direct WebRTC connection. Once the peer connection is up, signaling is only used for presence (user joined/left) and reconnection.
Each signaling URL gets its own `SignalingManager` (one WebSocket each). `SignalingTransportHandler` picks the right socket based on which server the message is for. `ServerSignalingCoordinator` tracks which peers belong to which servers and which signaling URLs, so we know when it is safe to tear down a peer connection after leaving a server.
```mermaid
sequenceDiagram
participant UI as App
participant STH as SignalingTransportHandler
participant SM as SignalingManager
participant WS as WebSocket
participant Srv as Signaling Server
UI->>STH: identify(credentials)
STH->>SM: send(identify message)
SM->>WS: ws.send(JSON)
WS->>Srv: identify
UI->>STH: joinServer(serverId)
STH->>SM: send(join_server)
SM->>WS: ws.send(JSON)
Srv-->>WS: server_users [peerA, peerB]
WS-->>SM: onmessage
SM-->>STH: messageReceived$
STH-->>UI: routes to SignalingMessageHandler
```
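The per-URL socket routing described above can be sketched as a small router. This is an illustrative sketch only: the names (`TransportRouter`, `registerServer`, `SignalingSocket`) are hypothetical stand-ins, not the real `SignalingTransportHandler` API.

```typescript
// Hypothetical sketch: one WebSocket per signaling URL, shared by every
// server that uses that URL; messages are routed by server ID.

interface SignalingSocket {
  url: string;
  send(message: object): void;
}

class TransportRouter {
  private sockets = new Map<string, SignalingSocket>(); // signaling URL -> socket
  private serverToUrl = new Map<string, string>();      // serverId -> signaling URL

  registerServer(
    serverId: string,
    url: string,
    factory: (url: string) => SignalingSocket,
  ): void {
    this.serverToUrl.set(serverId, url);
    if (!this.sockets.has(url)) {
      // One socket per signaling URL, reused across servers.
      this.sockets.set(url, factory(url));
    }
  }

  send(serverId: string, message: object): void {
    const url = this.serverToUrl.get(serverId);
    const socket = url ? this.sockets.get(url) : undefined;
    if (!socket) throw new Error(`No signaling socket for server ${serverId}`);
    socket.send(message);
  }
}
```

Two servers registered against the same signaling URL share a single socket, which is what lets the coordinator decide when tearing a socket down is safe.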
### Reconnection
When the WebSocket drops, `SignalingManager` schedules reconnection with exponential backoff (1s, 2s, 4s, ... up to 30s). On reconnect it replays the cached `identify` and `join_server` messages so presence is restored without the UI doing anything.
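The backoff schedule above can be expressed as a one-liner. A minimal sketch, assuming the 1s base and 30s cap from the prose; the function name and constants are illustrative, not the actual source.

```typescript
// Sketch of the reconnect backoff: 1s, 2s, 4s, ... doubling, capped at 30s.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

function reconnectDelayMs(attempt: number): number {
  // attempt 0 -> 1s, attempt 1 -> 2s, attempt 2 -> 4s, then capped at 30s
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}
```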
## Peer connection lifecycle
Peers connect to each other directly with `RTCPeerConnection`. The "initiator" (whoever was already in the room) creates the data channel and audio/video transceivers, then sends an offer. The other side creates an answer.
```mermaid
sequenceDiagram
participant A as Peer A (initiator)
participant Sig as Signaling Server
participant B as Peer B
Note over A: createPeerConnection(B, initiator=true)
Note over A: Creates data channel + transceivers
A->>Sig: offer (SDP)
Sig->>B: offer (SDP)
Note over B: createPeerConnection(A, initiator=false)
Note over B: setRemoteDescription(offer)
Note over B: Attach local audio tracks
B->>Sig: answer (SDP)
Sig->>A: answer (SDP)
Note over A: setRemoteDescription(answer)
A->>Sig: ICE candidates
Sig->>B: ICE candidates
B->>Sig: ICE candidates
Sig->>A: ICE candidates
Note over A,B: RTCPeerConnection state -> "connected"
Note over A,B: Data channel opens, voice flows
```
### Offer collision
Both peers might send offers at the same time ("glare"). The negotiation module implements the "polite peer" pattern: one side is designated polite (the non-initiator) and will roll back its local offer if it detects a collision, then accept the remote offer instead. The impolite side ignores the incoming offer.
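The glare decision reduces to a small predicate, as in the "perfect negotiation" pattern (RFC 8829). A sketch under the assumption that `makingOffer` means a local offer is in flight and `stable` means `signalingState === "stable"`; the function name is illustrative.

```typescript
// Polite-peer glare check: should this side ignore an incoming offer?
function shouldIgnoreIncomingOffer(
  polite: boolean,
  makingOffer: boolean,
  stable: boolean,
): boolean {
  // A collision: we are mid-offer, or our signaling state is not stable.
  const offerCollision = makingOffer || !stable;
  // The impolite peer ignores a colliding offer; the polite peer
  // rolls back its local description and accepts the remote offer.
  return !polite && offerCollision;
}
```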
### Disconnect recovery
```mermaid
stateDiagram-v2
[*] --> Connected
Connected --> Disconnected: connectionState = "disconnected"
Disconnected --> Connected: recovers within 10s
Disconnected --> Failed: grace period expires
Failed --> Reconnecting: schedule reconnect (every 5s)
Reconnecting --> Connected: new offer accepted
Reconnecting --> GaveUp: 12 attempts failed
Connected --> Closed: leave / cleanup
GaveUp --> [*]
Closed --> [*]
```
When a peer connection enters `disconnected`, a 10-second grace period starts. If it recovers on its own (network blip), nothing happens. If it reaches `failed`, the connection is torn down and a reconnect loop starts: a fresh `RTCPeerConnection` is created and a new offer is sent every 5 seconds, up to 12 attempts.
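The attempt bookkeeping behind that loop can be sketched as follows. Constants mirror the prose (10s grace, 5s interval, 12 attempts); the class name is hypothetical.

```typescript
// Sketch of the reconnect counter driving the state diagram above.
const GRACE_PERIOD_MS = 10_000;       // "disconnected" -> recover or fail
const RECONNECT_INTERVAL_MS = 5_000;  // delay between reconnect offers
const MAX_RECONNECT_ATTEMPTS = 12;    // then give up

class PeerRecoveryTracker {
  attempts = 0;

  /** Called every RECONNECT_INTERVAL_MS after the connection reaches "failed". */
  nextAttempt(): "retry" | "give-up" {
    if (this.attempts >= MAX_RECONNECT_ATTEMPTS) return "give-up";
    this.attempts += 1;
    // Caller creates a fresh RTCPeerConnection and sends a new offer.
    return "retry";
  }
}
```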
## Data channel
A single ordered data channel carries all peer-to-peer messages: chat events, voice/screen state broadcasts, state requests, pings, and screen share control.
Back-pressure is handled with a high-water mark (4 MB) and low-water mark (1 MB). `sendToPeerBuffered()` waits for the buffer to drain before sending, which matters during file transfers.
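The watermark scheme can be sketched against a narrowed channel interface. `sendBuffered` here is an illustrative stand-in for `sendToPeerBuffered()`; only the `bufferedAmount` / `bufferedamountlow` fields of `RTCDataChannel` are assumed.

```typescript
// Sketch of watermark-based backpressure on a data channel.
const HIGH_WATER_MARK = 4 * 1024 * 1024; // pause sending above this
const LOW_WATER_MARK = 1 * 1024 * 1024;  // resume once the buffer drains below this

interface BufferedChannel {
  bufferedAmount: number;
  bufferedAmountLowThreshold: number;
  onbufferedamountlow: (() => void) | null;
  send(data: string): void;
}

function sendBuffered(channel: BufferedChannel, data: string): Promise<void> {
  if (channel.bufferedAmount < HIGH_WATER_MARK) {
    channel.send(data);
    return Promise.resolve();
  }
  // Above the high-water mark: wait for the bufferedamountlow event,
  // which fires once bufferedAmount drops below the low-water threshold.
  channel.bufferedAmountLowThreshold = LOW_WATER_MARK;
  return new Promise<void>((resolve) => {
    channel.onbufferedamountlow = () => {
      channel.onbufferedamountlow = null;
      channel.send(data);
      resolve();
    };
  });
}
```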
Every 5 seconds a PING message is sent to each peer. The peer responds with PONG carrying the original timestamp, and the round-trip latency is stored in a signal.
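The round trip above amounts to echoing a timestamp. A sketch with an assumed message shape; the actual wire format in `ping.ts` may differ.

```typescript
// Sketch of the PING/PONG latency exchange over the data channel.
type PingMessage = { type: "PING"; sentAt: number };
type PongMessage = { type: "PONG"; sentAt: number };

function makePing(now: number): PingMessage {
  return { type: "PING", sentAt: now };
}

function answerPing(ping: PingMessage): PongMessage {
  // The responder echoes the original timestamp untouched.
  return { type: "PONG", sentAt: ping.sentAt };
}

function latencyFromPong(pong: PongMessage, now: number): number {
  // Round-trip time; the caller stores this in a per-peer signal.
  return now - pong.sentAt;
}
```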
## Media pipeline
### Voice
```mermaid
graph LR
Mic[getUserMedia] --> Raw[Raw mic stream]
Raw --> RNN{RNNoise<br/>enabled?}
RNN -- yes --> Worklet[AudioWorklet<br/>NoiseSuppressor]
RNN -- no --> Gain
Worklet --> Gain{Input gain<br/>adjusted?}
Gain -- yes --> GainNode[GainNode pipeline]
Gain -- no --> Out[Local media stream]
GainNode --> Out
Out --> Peers[replaceTrack on<br/>all peer audio senders]
click Mic "media/media.manager.ts" "MediaManager.enableVoice()" _blank
click Worklet "media/noise-reduction.manager.ts" "NoiseReductionManager.enable()" _blank
click GainNode "media/media.manager.ts" "MediaManager.applyInputGainToCurrentStream()" _blank
click Out "media/media.manager.ts" "MediaManager.localMediaStream" _blank
click Peers "media/media.manager.ts" "MediaManager.bindLocalTracksToAllPeers()" _blank
```
`MediaManager` grabs the mic with `getUserMedia`, optionally pipes it through the RNNoise AudioWorklet for noise reduction (48 kHz, loaded from `rnnoise-worklet.js`), optionally runs it through a `GainNode` for input volume control, and then pushes the resulting stream to every connected peer via `replaceTrack`.
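Because both processing stages are optional, the graph wiring is a routing decision. A sketch of that decision as a pure function; the stage labels are illustrative, not the actual node classes.

```typescript
// Sketch of the voice-pipeline routing: which stages sit between the
// raw mic stream and the outgoing track.
type Stage = "rnnoise-worklet" | "gain-node";

function plannedPipeline(noiseReduction: boolean, inputGain: number): Stage[] {
  const stages: Stage[] = [];
  if (noiseReduction) stages.push("rnnoise-worklet"); // RNNoise AudioWorklet at 48 kHz
  if (inputGain !== 1) stages.push("gain-node");      // only when gain is adjusted
  return stages; // empty: raw mic stream goes straight to the peers
}
```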
Mute just disables the audio track (`track.enabled = false`); the connection stays up. Deafen suppresses incoming audio playback on the local side.
### Screen share
Screen capture uses a platform-specific strategy:
| Platform | Capture method |
|---|---|
| Browser | `getDisplayMedia` with quality presets |
| Windows (Electron) | Electron `desktopCapturer.getSources()` with a source picker UI |
| Linux (Electron) | `getDisplayMedia` for video + PulseAudio/PipeWire routing for system audio, keeping voice playback out of the capture |
Screen share tracks are distributed on-demand. A peer sends a `SCREEN_SHARE_REQUEST` message over the data channel, and only then does the sharer attach screen tracks to that peer's connection and renegotiate.
```mermaid
sequenceDiagram
participant V as Viewer
participant S as Sharer
V->>S: SCREEN_SHARE_REQUEST (data channel)
Note over S: Add viewer to requestedViewerPeerIds
Note over S: Attach screen video + audio senders
S->>V: renegotiate (new offer with screen tracks)
V->>S: answer
Note over V: ontrack fires with screen video
Note over V: Classified as screen share stream
Note over V: UI renders video
V->>S: SCREEN_SHARE_STOP (data channel)
Note over S: Remove screen senders
S->>V: renegotiate (offer without screen tracks)
```
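On the sharer's side, the flow above reduces to set bookkeeping: screen senders are attached only for peers that asked. A sketch; `ScreenShareViewers` and its method names mirror the diagram, not the real `ScreenShareManager` API.

```typescript
// Sketch of on-demand viewer tracking for screen share delivery.
class ScreenShareViewers {
  private requestedViewerPeerIds = new Set<string>();

  /** SCREEN_SHARE_REQUEST: returns true when this peer newly needs
   *  screen senders attached and a renegotiation. */
  handleRequest(peerId: string): boolean {
    if (this.requestedViewerPeerIds.has(peerId)) return false;
    this.requestedViewerPeerIds.add(peerId);
    return true;
  }

  /** SCREEN_SHARE_STOP: returns true when screen senders should be
   *  removed from this peer and the connection renegotiated. */
  handleStop(peerId: string): boolean {
    return this.requestedViewerPeerIds.delete(peerId);
  }
}
```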
## State
`WebRtcStateController` holds all connection state as Angular Signals: `isConnected`, `isMuted`, `isDeafened`, `isScreenSharing`, `connectedPeers`, `peerLatencies`, etc. Managers call update methods on the controller after state changes. Components and facades read these signals reactively.
## Logging
`WebRTCLogger` wraps `console.*` with a `[WebRTC]` prefix and a debug flag so logging can be toggled at runtime. `DebugNetworkMetrics` tracks per-peer stats (connection drops, handshake counts, message counts, download rates) for the debug console UI.
## ICE and STUN
WebRTC connections require a way for two peers to discover how to reach each other across different networks (NATs, firewalls, etc.). This is handled by ICE, with help from STUN.
### ICE (Interactive Connectivity Establishment)
ICE is the mechanism WebRTC uses to establish a connection between peers. Instead of relying on a single network path, it:
- Gathers multiple possible connection candidates (IP address + port pairs)
- Exchanges those candidates via the signaling layer
- Attempts connectivity checks between all candidate pairs
- Selects the first working path
Typical candidate types include:
- **Host candidates** - local network interfaces (e.g. LAN IPs)
- **Server reflexive candidates** - public-facing address discovered via STUN
- **Relay candidates** - provided by TURN servers (fallback)
ICE runs automatically as part of `RTCPeerConnection`. As candidates are discovered, they are emitted via `onicecandidate` and must be forwarded to the remote peer through signaling.
Connection state transitions (e.g. `checking` → `connected` → `failed`) reflect ICE progress.
### STUN (Session Traversal Utilities for NAT)
STUN is used to determine a peer's public-facing IP address and port when behind a NAT.
A STUN server responds with the external address it observes for a request. This allows a peer to generate a **server reflexive candidate**, which can be used by other peers to attempt a direct connection.
Without STUN, only local (host) candidates would be available, which typically do not work across different networks.
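For illustration, this is the shape of an ICE configuration handed to `RTCPeerConnection`; the STUN URL here is an example, and the project's actual server list lives in `realtime.constants.ts`.

```typescript
// Illustrative ICE configuration with a public STUN server (config fragment).
const config = {
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
};
// const pc = new RTCPeerConnection(config);
```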
### TURN
TURN (Traversal Using Relays around NAT) is a fallback mechanism used in some WebRTC systems when direct peer-to-peer connectivity cannot be established.
Instead of connecting peers directly:
- Each peer establishes a connection to a TURN server
- The TURN server relays all media and data between peers
This approach is more reliable in restrictive network environments but introduces additional latency and bandwidth overhead, since all traffic flows through the relay instead of directly between peers.
Toju/Zoracord does not use TURN and does not have code written to support it.
### Summary
- **ICE** coordinates connection establishment by trying multiple network paths
- **STUN** provides public-facing address discovery for NAT traversal