Toju/toju-app/src/app/infrastructure/realtime

Realtime Infrastructure

Low-level WebRTC and WebSocket plumbing that the rest of the app sits on top of. Nothing in here knows about Angular components, NgRx, or domain logic. It exposes observables, signals, and callbacks that higher layers (facades, effects, components) consume.

Module map

realtime/
├── realtime-session.service.ts    Composition root (WebRTCService)
├── realtime.types.ts              PeerData, credentials, tracker types
├── realtime.constants.ts          ICE servers, signal types, bitrates, intervals
│
├── signaling/                     WebSocket layer
│   ├── signaling.manager.ts                 One WebSocket per signaling URL
│   ├── signaling-transport-handler.ts       Routes messages to the right socket
│   ├── server-signaling-coordinator.ts      Maps peers/servers to signaling URLs
│   ├── signaling-message-handler.ts         Dispatches incoming signaling messages
│   └── server-membership-signaling-handler.ts  Join / leave / switch protocol
│
├── peer-connection-manager/       WebRTC peer connections
│   ├── peer-connection.manager.ts           Owns all RTCPeerConnection instances
│   ├── shared.ts                            PeerData type + state factory
│   ├── connection/
│   │   ├── create-peer-connection.ts        RTCPeerConnection factory (ICE, transceivers)
│   │   └── negotiation.ts                   Offer/answer/ICE with collision handling
│   ├── messaging/
│   │   ├── data-channel.ts                  Ordered data channel for chat + control
│   │   └── ping.ts                          Latency measurement (PING/PONG every 5s)
│   ├── recovery/
│   │   └── peer-recovery.ts                 Disconnect grace period + reconnect loop
│   └── streams/
│       └── remote-streams.ts                Classifies incoming tracks (voice vs camera vs screen)
│
├── media/                         Local capture and processing
│   ├── media.manager.ts                     getUserMedia, mute, deafen, camera capture, same-room routing, gain pipeline
│   ├── noise-reduction.manager.ts           RNNoise AudioWorklet graph
│   ├── voice-session-controller.ts          Higher-level wrapper over MediaManager
│   ├── screen-share.manager.ts              Screen capture + per-peer track distribution
│   └── screen-share-platforms/
│       ├── shared.ts                        Electron desktopCapturer types
│       ├── browser-screen-share.capture.ts  Standard getDisplayMedia
│       ├── desktop-electron-screen-share.capture.ts  Electron source picker (Windows)
│       └── linux-electron-screen-share.capture.ts    PulseAudio/PipeWire routing (Linux)
│
├── streams/                       Stream facades
│   ├── peer-media-facade.ts                 Unified API over peers, media, screen share
│   └── remote-screen-share-request-controller.ts  On-demand screen share delivery
│
├── state/
│   └── webrtc-state-controller.ts           Angular Signals for all connection state
│
└── logging/
    ├── webrtc-logger.ts                     Conditional [WebRTC] prefixed logging
    └── debug-network-metrics.ts             Per-peer stats (drops, latency, throughput)

How it all fits together

WebRTCService is the composition root. It instantiates every other manager, then wires their callbacks together after construction (to avoid circular references). No manager imports another manager directly.
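A minimal sketch of that post-construction wiring pattern (class and method names here are illustrative, not the real managers): each manager exposes a callback field, and the composition root assigns the callbacks after everything is constructed, so no manager ever imports another.

```typescript
// Hypothetical sketch of the composition-root wiring described above.
class PeerManager {
  // Wired in by the composition root after construction.
  onPeerConnected: (peerId: string) => void = () => {};

  connect(peerId: string): void {
    // ...establish the connection, then notify whoever was wired in.
    this.onPeerConnected(peerId);
  }
}

class StateController {
  readonly connected = new Set<string>();

  markConnected(peerId: string): void {
    this.connected.add(peerId);
  }
}

// Composition root: construct first, wire callbacks afterwards, so the two
// classes never reference each other directly (no circular imports).
function createSession() {
  const peers = new PeerManager();
  const state = new StateController();
  peers.onPeerConnected = (id) => state.markConnected(id);
  return { peers, state };
}
```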

graph TD
    WS[WebRTCService<br/>composition root]

    WS --> SC[SignalingTransportHandler]
    WS --> PCM[PeerConnectionManager]
    WS --> MM[MediaManager]
    WS --> SSM[ScreenShareManager]
    WS --> State[WebRtcStateController<br/>Angular Signals]
    WS --> VSC[VoiceSessionController]
    WS --> PMF[PeerMediaFacade]
    WS --> RSSRC[RemoteScreenShareRequestController]

    SC --> SM1[SignalingManager<br/>socket A]
    SC --> SM2[SignalingManager<br/>socket B]
    SC --> Coord[ServerSignalingCoordinator]

    PCM --> Conn[create-peer-connection]
    PCM --> Neg[negotiation]
    PCM --> DC[data-channel]
    PCM --> Ping[ping]
    PCM --> Rec[peer-recovery]
    PCM --> RS[remote-streams]

    MM --> NR[NoiseReductionManager<br/>RNNoise worklet]
    SSM --> BrowserCap[Browser capture]
    SSM --> ElectronCap[Electron capture]
    SSM --> LinuxCap[Linux audio routing]

    click WS "realtime-session.service.ts" "WebRTCService - composition root" _blank
    click SC "signaling/signaling-transport-handler.ts" "Routes messages to the right WebSocket" _blank
    click PCM "peer-connection-manager/peer-connection.manager.ts" "Owns all RTCPeerConnection instances" _blank
    click MM "media/media.manager.ts" "getUserMedia, mute, deafen, gain pipeline" _blank
    click SSM "media/screen-share.manager.ts" "Screen capture and per-peer distribution" _blank
    click State "state/webrtc-state-controller.ts" "Angular Signals for connection state" _blank
    click VSC "media/voice-session-controller.ts" "Higher-level voice session wrapper" _blank
    click PMF "streams/peer-media-facade.ts" "Unified API over peers, media, screen share" _blank
    click RSSRC "streams/remote-screen-share-request-controller.ts" "On-demand screen share delivery" _blank
    click SM1 "signaling/signaling.manager.ts" "One WebSocket per signaling URL" _blank
    click SM2 "signaling/signaling.manager.ts" "One WebSocket per signaling URL" _blank
    click Coord "signaling/server-signaling-coordinator.ts" "Maps peers/servers to signaling URLs" _blank
    click Conn "peer-connection-manager/connection/create-peer-connection.ts" "RTCPeerConnection factory" _blank
    click Neg "peer-connection-manager/connection/negotiation.ts" "Offer/answer/ICE with collision handling" _blank
    click DC "peer-connection-manager/messaging/data-channel.ts" "Ordered data channel for chat + control" _blank
    click Ping "peer-connection-manager/messaging/ping.ts" "Latency measurement via PING/PONG" _blank
    click Rec "peer-connection-manager/recovery/peer-recovery.ts" "Disconnect grace period + reconnect loop" _blank
    click RS "peer-connection-manager/streams/remote-streams.ts" "Classifies incoming tracks" _blank
    click NR "media/noise-reduction.manager.ts" "RNNoise AudioWorklet graph" _blank
    click BrowserCap "media/screen-share-platforms/browser-screen-share.capture.ts" "Standard getDisplayMedia" _blank
    click ElectronCap "media/screen-share-platforms/desktop-electron-screen-share.capture.ts" "Electron source picker" _blank
    click LinuxCap "media/screen-share-platforms/linux-electron-screen-share.capture.ts" "PulseAudio/PipeWire routing" _blank

Signaling (WebSocket)

The signaling layer's only job is getting two peers to exchange SDP offers/answers and ICE candidates so they can establish a direct WebRTC connection. Once the peer connection is up, signaling is only used for presence (user joined/left) and reconnection.

Each signaling URL gets its own SignalingManager (one WebSocket each). SignalingTransportHandler picks the right socket based on which server the message is for. ServerSignalingCoordinator tracks which peers belong to which servers and which signaling URLs, so we know when it is safe to tear down a peer connection after leaving a server.

Room affinity is authoritative at this layer as well. The renderer repairs each room's saved sourceId / sourceUrl from server-directory responses and routes join_server, view_server, and room-scoped signaling traffic to that room's signaling URL first. If that route fails, alternate endpoints can be tried temporarily, but server-scoped raw messages are no longer broadcast to every connected signaling manager when the route is unknown.

Cold-start routing now waits for the initial server-directory health probes so same-backend aliases can collapse to one canonical signaling endpoint before any saved rooms reconnect. When a room is reconnected on a chosen socket, its background rooms are re-joined on that same socket as well, so stale per-signal memberships do not keep orphan managers alive. Reconnect replay only sends view_server for rooms that manager still has joined.

This is still a non-federated model. Different signaling servers do not share peer registries or relay WebRTC offers for each other, so users in the same room must converge on the same signaling endpoint to discover one another reliably.

sequenceDiagram
    participant UI as App
    participant STH as SignalingTransportHandler
    participant SM as SignalingManager
    participant WS as WebSocket
    participant Srv as Signaling Server

    UI->>STH: identify(credentials)
    STH->>SM: send(identify message)
    SM->>WS: ws.send(JSON)
    WS->>Srv: identify

    UI->>STH: joinServer(serverId)
    STH->>SM: send(join_server)
    SM->>WS: ws.send(JSON)

    Srv-->>WS: server_users [peerA, peerB]
    WS-->>SM: onmessage
    SM-->>STH: messageReceived$
    STH-->>UI: routes to SignalingMessageHandler

Reconnection

When the WebSocket drops, SignalingManager schedules reconnection with exponential backoff (1s, 2s, 4s, ... up to 30s). On reconnect it replays the cached identify and join_server messages so presence is restored without the UI doing anything.
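The backoff schedule can be expressed as a small pure function. This is a sketch of the schedule as described (1s doubling up to a 30s cap), not the actual SignalingManager code:

```typescript
// Illustrative exponential backoff: attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ...
// capped at 30s, matching the schedule described above.
function reconnectDelayMs(attempt: number): number {
  const base = 1000 * 2 ** attempt;
  return Math.min(base, 30_000);
}
```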

Server-side connection hygiene

Browsers do not reliably fire WebSocket close events during page refresh or navigation (especially Chromium). The server's handleIdentify now closes any existing connection that shares the same oderId but a different connectionId. This guarantees findUserByOderId always routes offers and presence events to the freshest socket, eliminating a class of bugs where signaling messages landed on a dead tab's socket and were silently lost.

Join and leave broadcasts are also identity-aware: handleJoinServer only broadcasts user_joined when the identity is genuinely new to that server (not just a second WebSocket connection for the same user), and handleLeaveServer / dead-connection cleanup only broadcast user_left when no other open connection for that identity remains in the server. The user_left payload includes serverIds listing the rooms the identity still belongs to, so the client can subtract correctly without over-removing.

Multi-room presence

server_users, user_joined, and user_left are room-scoped presence messages, but the renderer must treat them as updates into a global multi-room presence view. The users store tracks presenceServerIds per user instead of clearing the whole slice when a new server_users snapshot arrives, so startup/search background rooms keep their server-rail voice badges and active voice peers do not disappear when the user views a different server.

Peer routing also has to stay scoped to the signaling server that reported the membership. A user_left from one signaling cluster must only subtract that cluster's shared servers; otherwise a leave on signal.toju.app can incorrectly tear down a peer that is still shared through signal-sweden.toju.app or a local signaling server. Route metadata is therefore kept across peer recreation and only cleared once the renderer no longer shares any servers with that peer.

Peer connection lifecycle

Peers connect to each other directly with RTCPeerConnection. The initiator is chosen deterministically from the identified logical peer IDs so only one side creates the offer and primary data channel for a given pair. The other side creates an answer. If identity or negotiation is still settling, the retry timer defers instead of comparing against the ephemeral local transport ID or reusing a half-open peer forever.
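The deterministic election only needs both sides to apply the same comparison to the same pair of logical IDs. A minimal sketch (the real election in the codebase may use a different comparison; lexicographic order is an assumption here):

```typescript
// Both peers run this with their own ID first; exactly one side gets true,
// so only one side creates the offer and the primary data channel.
function isInitiator(localPeerId: string, remotePeerId: string): boolean {
  return localPeerId < remotePeerId;
}
```

Because the comparison is over stable logical peer IDs rather than ephemeral transport IDs, both sides always agree on who initiates, with no extra coordination round-trip.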

sequenceDiagram
    participant A as Peer A (elected initiator)
    participant Sig as Signaling Server
    participant B as Peer B

    Note over A: createPeerConnection(B, initiator=true)
    Note over A: Creates data channel + transceivers

    A->>Sig: offer (SDP)
    Sig->>B: offer (SDP)

    Note over B: createPeerConnection(A, initiator=false)
    Note over B: setRemoteDescription(offer)
    Note over B: Attach local audio tracks
    B->>Sig: answer (SDP)
    Sig->>A: answer (SDP)
    Note over A: setRemoteDescription(answer)

    A->>Sig: ICE candidates
    Sig->>B: ICE candidates
    B->>Sig: ICE candidates
    Sig->>A: ICE candidates

    Note over A,B: RTCPeerConnection state -> "connected"
    Note over A,B: Data channel opens, voice flows

Offer collision

Both peers might send offers at the same time ("glare"). The negotiation module implements the "polite peer" pattern: one side is designated polite (the non-initiator) and will roll back its local offer if it detects a collision, then accept the remote offer instead. The impolite side ignores the incoming offer.
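The glare decision reduces to a small pure function, in the shape of the standard "perfect negotiation" pattern. This is a sketch of the decision logic only, with the RTCPeerConnection state passed in as plain values:

```typescript
// Glare handling from the polite-peer pattern: a collision exists when we are
// mid-offer ourselves. The polite side rolls back and accepts; the impolite
// side ignores the incoming offer.
type SignalingState = "stable" | "have-local-offer";

function handleIncomingOffer(
  polite: boolean,
  state: SignalingState,
  makingOffer: boolean
): "accept" | "rollback-then-accept" | "ignore" {
  const collision = makingOffer || state !== "stable";
  if (!collision) return "accept";
  return polite ? "rollback-then-accept" : "ignore";
}
```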

Existing members also schedule a short user_joined fallback offer, and the server_users path now re-arms the same retry when an initial attempt stalls. The joiner still tries first via its server_users snapshot, but the fallback heals late-join races or half-open peers where that initial offer never arrives or never finishes. The retry uses the same deterministic initiator election as the main server_users path so the pair cannot regress into dual initiators.

Non-initiator takeover

If the elected initiator's offer never arrives (stale socket, network issue, page still loading), the non-initiator does not wait forever. It tracks the start of each waiting period in nonInitiatorWaitStart. For the first NON_INITIATOR_GIVE_UP_MS (5 s) it reschedules and logs. Once that window expires it takes over: removes any stale peer, creates a fresh RTCPeerConnection as initiator, and sends its own offer. This ensures every peer pair eventually establishes a connection regardless of which side was originally elected.
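The takeover decision itself is just a clock comparison against the waiting-period start. A sketch of that check (the constant mirrors the prose; the surrounding retry scheduling is omitted):

```typescript
// Non-initiator takeover window: keep rescheduling for the first 5 seconds,
// then take over as initiator, per NON_INITIATOR_GIVE_UP_MS above.
const NON_INITIATOR_GIVE_UP_MS = 5000;

function nonInitiatorStep(
  waitStartMs: number,
  nowMs: number
): "keep-waiting" | "take-over" {
  return nowMs - waitStartMs >= NON_INITIATOR_GIVE_UP_MS
    ? "take-over"
    : "keep-waiting";
}
```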

Stale peer replacement

Offers or ICE candidates can arrive while the existing RTCPeerConnection for that peer is in failed or closed state (the browser's connectionstatechange event hasn't fired yet to clean it up). replaceUnusablePeer() in negotiation.ts detects this, closes the dead connection, removes it from the active map, and lets the caller proceed with a fresh peer. The connectionstatechange handler in create-peer-connection.ts also guards against stale events: if the connection object no longer matches the current map entry for that peer, the event is ignored so it cannot accidentally remove a replacement peer.

Disconnect recovery

stateDiagram-v2
    [*] --> Connected
    Connected --> Disconnected: connectionState = "disconnected"
    Disconnected --> Connected: recovers within 10s
    Disconnected --> Failed: grace period expires
    Failed --> Reconnecting: schedule reconnect (every 5s)
    Reconnecting --> Connected: new offer accepted
    Reconnecting --> GaveUp: 12 attempts failed
    Connected --> Closed: leave / cleanup
    GaveUp --> [*]
    Closed --> [*]

When a peer connection enters disconnected, a 10-second grace period starts. If it recovers on its own (network blip), nothing happens. If it reaches failed, the connection is torn down and a reconnect loop starts. A fresh RTCPeerConnection is created every 5 seconds, up to 12 attempts; only the deterministically elected initiator sends a reconnect offer, while the other side waits for that offer.
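The diagram above can be sketched as a tiny state machine. This is an illustration of the transitions, not the real peer-recovery module; the constants mirror the prose (10s grace, 12 reconnect attempts):

```typescript
type PeerState = "connected" | "disconnected" | "failed" | "reconnecting" | "gave-up";

const GRACE_MS = 10_000;
const MAX_RECONNECT_ATTEMPTS = 12;

// One evaluation step: given how long we have been in the current state and
// how many reconnect attempts have been made, compute the next state.
function nextState(state: PeerState, msInState: number, attempt: number): PeerState {
  switch (state) {
    case "disconnected":
      // Either the connection recovers on its own, or the grace period expires.
      return msInState >= GRACE_MS ? "failed" : "disconnected";
    case "failed":
      return "reconnecting";
    case "reconnecting":
      return attempt >= MAX_RECONNECT_ATTEMPTS ? "gave-up" : "reconnecting";
    default:
      return state;
  }
}
```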

Data channel

A single ordered data channel carries all peer-to-peer messages: chat events, voice/screen state broadcasts, voice-channel move control events, state requests, pings, and screen share control.

Back-pressure is handled with a high-water mark (4 MB) and low-water mark (1 MB). sendToPeerBuffered() waits for the buffer to drain before sending, which matters during file transfers.
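The drain-then-send flow can be sketched against an abstracted channel (the interface below stands in for RTCDataChannel so the logic is visible; `sendToPeerBuffered()` in the codebase may differ in detail):

```typescript
// High/low water marks matching the prose: pause above 4 MB of buffered data,
// resume once the browser drains below 1 MB.
const HIGH_WATER = 4 * 1024 * 1024;
const LOW_WATER = 1 * 1024 * 1024;

interface ChannelLike {
  bufferedAmount: number;
  bufferedAmountLowThreshold: number;
  onbufferedamountlow: (() => void) | null;
  send(data: string): void;
}

async function sendBuffered(ch: ChannelLike, data: string): Promise<void> {
  if (ch.bufferedAmount > HIGH_WATER) {
    // Ask the channel to fire bufferedamountlow once it drains below LOW_WATER,
    // and wait for that before sending.
    ch.bufferedAmountLowThreshold = LOW_WATER;
    await new Promise<void>((resolve) => {
      ch.onbufferedamountlow = () => {
        ch.onbufferedamountlow = null;
        resolve();
      };
    });
  }
  ch.send(data);
}
```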

Every 5 seconds a PING message is sent to each peer. The peer responds with PONG carrying the original timestamp, and the round-trip latency is stored in a signal.
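The latency math is simple because the original timestamp round-trips inside the PONG, so no clock synchronization between peers is needed. A sketch of the message shapes and the RTT computation (field names are illustrative):

```typescript
interface PingMessage { type: "PING"; sentAt: number; }
interface PongMessage { type: "PONG"; sentAt: number; }

// Sender stamps the PING with its own clock.
function makePing(nowMs: number): PingMessage {
  return { type: "PING", sentAt: nowMs };
}

// Responder just echoes the timestamp back.
function answerPing(ping: PingMessage): PongMessage {
  return { type: "PONG", sentAt: ping.sentAt };
}

// RTT is measured entirely on the sender's clock.
function latencyFromPong(pong: PongMessage, nowMs: number): number {
  return nowMs - pong.sentAt;
}
```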

Media pipeline

Voice

graph LR
    Mic[getUserMedia] --> Raw[Raw mic stream]
    Raw --> RNN{RNNoise<br/>enabled?}
    RNN -- yes --> Worklet[AudioWorklet<br/>NoiseSuppressor]
    RNN -- no --> Gain
    Worklet --> Gain{Input gain<br/>adjusted?}
    Gain -- yes --> GainNode[GainNode pipeline]
    Gain -- no --> Out[Local media stream]
    GainNode --> Out
    Out --> Peers[replaceTrack on<br/>all peer audio senders]

    click Mic "media/media.manager.ts" "MediaManager.enableVoice()" _blank
    click Worklet "media/noise-reduction.manager.ts" "NoiseReductionManager.enable()" _blank
    click GainNode "media/media.manager.ts" "MediaManager.applyInputGainToCurrentStream()" _blank
    click Out "media/media.manager.ts" "MediaManager.localMediaStream" _blank
    click Peers "media/media.manager.ts" "MediaManager.bindLocalTracksToAllPeers()" _blank

MediaManager grabs the mic with getUserMedia, optionally pipes it through the RNNoise AudioWorklet for noise reduction (48 kHz, loaded from rnnoise-worklet.js), optionally runs it through a GainNode for input volume control, and then routes the resulting audio track only to peers that currently belong to the same active voice channel. The same manager also owns camera capture as a separate video-only stream, attaches it to its own video transceiver, and applies the same voice-channel routing rules so webcam video only reaches peers in the active voice room.

Mute just disables the audio track (track.enabled = false); the connection stays up. Deafen suppresses incoming audio playback on the local side.

Because peers stay connected across the server for shared state and chat, voice-channel isolation is enforced in both transport and playback: outgoing mic audio is only attached to peers whose voice membership matches the local user's current channel, and remote voice audio plus join/leave cues are only active when the remote peer's announced voiceState.roomId and voiceState.serverId match the local user's current voice channel.
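The transport-side half of that isolation boils down to a matching check on the announced voice state. A sketch of the filter (the VoiceState shape here mirrors the voiceState.roomId / voiceState.serverId fields mentioned above; the real routing code may carry more state):

```typescript
interface VoiceState { serverId: string; roomId: string; }

// Outgoing mic audio is only attached to peers whose announced voice channel
// matches the local user's current channel; a missing state on either side
// means no audio is routed.
function shouldSendVoiceTo(
  local: VoiceState | null,
  remote: VoiceState | null
): boolean {
  if (!local || !remote) return false;
  return local.serverId === remote.serverId && local.roomId === remote.roomId;
}
```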

Camera

sequenceDiagram
    participant UI as VoiceControls/UI
    participant MM as MediaManager
    participant Peer as PeerConnectionManager
    participant Remote as Remote peer
    participant RS as remote-streams.ts
    participant Shell as VoiceWorkspaceComponent

    UI->>MM: enableCamera()
    Note over MM: getUserMedia({ video: true, audio: false })
    Note over MM: Store localCameraStream
    MM->>MM: syncCameraRouting()
    Note over MM: Attach video track only to same-room peers
    MM->>Peer: renegotiate(peerId)
    MM->>Remote: broadcast camera-state
    Peer->>Remote: offer/answer with camera video transceiver
    Remote->>RS: ontrack(video)
    Note over RS: Classify as camera, not screen share
    RS->>Shell: getRemoteCameraStream(peerId)
    Shell->>Shell: Render camera tile in voice workspace

    UI->>MM: disableCamera()
    MM->>MM: stopLocalCameraStream()
    MM->>MM: detach camera sender from peers
    MM->>Remote: broadcast camera-state(false)

Camera capture is video-only, uses a dedicated camera sender, and follows the same same-room peer filter as outgoing voice audio. Incoming camera video is classified separately from screen-share tracks so the workspace can show both at the same time.

Screen share

Screen capture uses a platform-specific strategy:

  • Browser - getDisplayMedia with quality presets
  • Windows (Electron) - Electron desktopCapturer.getSources() with a source picker UI
  • Linux (Electron) - getDisplayMedia for video + PulseAudio/PipeWire routing for system audio, keeping voice playback out of the capture

Screen share tracks are distributed on-demand. A peer sends a SCREEN_SHARE_REQUEST message over the data channel, and only then does the sharer attach screen tracks to that peer's connection and renegotiate.

sequenceDiagram
    participant V as Viewer
    participant S as Sharer

    V->>S: SCREEN_SHARE_REQUEST (data channel)
    Note over S: Add viewer to requestedViewerPeerIds
    Note over S: Attach screen video + audio senders
    S->>V: renegotiate (new offer with screen tracks)
    V->>S: answer
    Note over V: ontrack fires with screen video
    Note over V: Classified as screen share stream
    Note over V: UI renders video

    V->>S: SCREEN_SHARE_STOP (data channel)
    Note over S: Remove screen senders
    S->>V: renegotiate (offer without screen tracks)

State

WebRtcStateController holds all connection state as Angular Signals: isConnected, isMuted, isDeafened, isScreenSharing, connectedPeers, peerLatencies, etc. Managers call update methods on the controller after state changes. Components and facades read these signals reactively.

Logging

WebRTCLogger wraps console.* with a [WebRTC] prefix and a debug flag so logging can be toggled at runtime. DebugNetworkMetrics tracks per-peer stats (connection drops, handshake counts, message counts, download rates) for the debug console UI.

ICE and STUN

WebRTC connections require a way for two peers to discover how to reach each other across different networks (NATs, firewalls, etc.). This is handled by ICE, with help from STUN.

ICE (Interactive Connectivity Establishment)

ICE is the mechanism WebRTC uses to establish a connection between peers. Instead of relying on a single network path, it:

  • Gathers multiple possible connection candidates (IP address + port pairs)
  • Exchanges those candidates via the signaling layer
  • Attempts connectivity checks between all candidate pairs
  • Selects the first working path

Typical candidate types include:

  • Host candidates - local network interfaces (e.g. LAN IPs)
  • Server reflexive candidates - public-facing address discovered via STUN
  • Relay candidates - provided by TURN servers (fallback)

ICE runs automatically as part of RTCPeerConnection. As candidates are discovered, they are emitted via onicecandidate and must be forwarded to the remote peer through signaling.
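The forwarding contract can be sketched with the browser objects abstracted away (the interfaces below stand in for the onicecandidate event and the signaling sender; names are illustrative):

```typescript
interface CandidateEvent { candidate: { candidate: string } | null; }
interface SignalingSender { send(msg: { type: string; candidate: string }): void; }

// Subscribe to candidate events and forward each discovered candidate to the
// remote peer through signaling. A null candidate marks end-of-gathering and
// is not forwarded.
function forwardCandidates(
  addListener: (cb: (e: CandidateEvent) => void) => void,
  signaling: SignalingSender
): void {
  addListener((event) => {
    if (event.candidate) {
      signaling.send({ type: "ice-candidate", candidate: event.candidate.candidate });
    }
  });
}
```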

Connection state transitions (e.g. checking → connected → failed) reflect ICE progress.

STUN (Session Traversal Utilities for NAT)

STUN is used to determine a peer's public-facing IP address and port when behind a NAT.

A STUN server responds with the external address it observes for a request. This allows a peer to generate a server reflexive candidate, which can be used by other peers to attempt a direct connection.

Without STUN, only local (host) candidates would be available, which typically do not work across different networks.

TURN

TURN (Traversal Using Relays around NAT) is a fallback mechanism used in some WebRTC systems when direct peer-to-peer connectivity cannot be established.

Instead of connecting peers directly:

  • Each peer establishes a connection to a TURN server
  • The TURN server relays all media and data between peers

This approach is more reliable in restrictive network environments but introduces additional latency and bandwidth overhead, since all traffic flows through the relay instead of directly between peers.

Toju/Zoracord does not use TURN and does not have code written to support it.

Summary

  • ICE coordinates connection establishment by trying multiple network paths
  • STUN provides public-facing address discovery for NAT traversal
  • TURN would relay traffic when direct connectivity fails; Toju/Zoracord does not use it