# Voice Connection Domain

Bridges the application layer to the low-level realtime infrastructure for voice calls and in-channel camera transport. Provides speaking detection via Web Audio analysis and per-peer volume control for playback. The actual WebRTC plumbing lives in `infrastructure/realtime`; this domain wraps it with a clean facade.

## Module map

```
voice-connection/
├── application/
│   ├── facades/
│   │   └── voice-connection.facade.ts    Proxy to RealtimeSessionFacade for voice and camera signals/methods
│   └── services/
│       ├── voice-activity.service.ts     RMS-based speaking detection via AnalyserNode (per-user signals)
│       └── voice-playback.service.ts     Per-peer GainNode chain, 0-200% volume, deafen support
│
├── domain/
│   └── models/
│       └── voice-connection.model.ts     Re-exports LatencyProfile, VoiceStateSnapshot from shared-kernel / realtime
│
└── index.ts                              Barrel exports
```

## Service relationships

```mermaid
graph TD
    VCF[VoiceConnectionFacade]
    VAS[VoiceActivityService]
    VPS[VoicePlaybackService]
    RSF[RealtimeSessionFacade]
    Models[voice-connection.model]

    VCF --> RSF
    VAS --> VCF
    VPS --> VCF

    click VCF "application/facades/voice-connection.facade.ts" "Proxy to RealtimeSessionFacade" _blank
    click VAS "application/services/voice-activity.service.ts" "RMS-based speaking detection" _blank
    click VPS "application/services/voice-playback.service.ts" "Per-peer GainNode volume chain" _blank
    click RSF "../../infrastructure/realtime/realtime-session.service.ts" "Low-level WebRTC composition root" _blank
    click Models "domain/models/voice-connection.model.ts" "Re-exported types" _blank
```

## Voice connection facade

`VoiceConnectionFacade` exposes signals and methods from `RealtimeSessionFacade` without leaking infrastructure details into feature components. It covers (see the usage sketch after this list):

- **Connection state:** `isVoiceConnected`, `isMuted`, `isDeafened`, `isCameraEnabled`, `hasConnectionError`
- **Stream access:** `getRemoteVoiceStream`, `getRemoteCameraStream`, `getLocalStream`, `getLocalCameraStream`, `getRawMicStream`
- **Controls:** `enableVoice`, `disableVoice`, `enableCamera`, `disableCamera`, `toggleMute`, `toggleDeafen`, `toggleNoiseReduction`
- **Audio tuning:** `setOutputVolume`, `setInputVolume`, `setAudioBitrate`, `setLatencyProfile`
- **Peer events:** `onRemoteStream`, `onPeerConnected`, `onPeerDisconnected`
- **Heartbeat:** `startVoiceHeartbeat`, `stopVoiceHeartbeat`
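
A minimal sketch of how a feature component might consume the facade. Only the signal and method names come from the list above; the barrel import path, selector, and template are assumptions:

```ts
import { Component, inject } from '@angular/core';
import { VoiceConnectionFacade } from '@app/domains/voice-connection';

// Hypothetical call-controls component: binds mute/deafen state and
// toggles through the facade, never touching the realtime layer directly.
@Component({
  selector: 'app-call-controls',
  standalone: true,
  template: `
    <button (click)="voice.toggleMute()">
      {{ voice.isMuted() ? 'Unmute' : 'Mute' }}
    </button>
    <button (click)="voice.toggleDeafen()">
      {{ voice.isDeafened() ? 'Undeafen' : 'Deafen' }}
    </button>
  `,
})
export class CallControlsComponent {
  protected readonly voice = inject(VoiceConnectionFacade);
}
```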

## Camera transport

Camera capture is treated as voice-adjacent transport, not screen share. The underlying realtime layer routes webcam video only to peers in the same active voice channel, exposes remote camera streams through `getRemoteCameraStream(peerId)`, and keeps webcam senders separate from screen-share senders so both features can run at the same time.
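
For illustration, a hypothetical tile component that attaches a peer's camera stream to a `<video>` element. It assumes `getRemoteCameraStream` participates in signal tracking; the import path and selector are also assumptions:

```ts
import { Component, ElementRef, effect, inject, input, viewChild } from '@angular/core';
import { VoiceConnectionFacade } from '@app/domains/voice-connection';

// Hypothetical camera tile: re-attaches srcObject whenever the peer's
// remote camera stream (or the video element) changes.
@Component({
  selector: 'app-camera-tile',
  standalone: true,
  template: `<video #video autoplay playsinline muted></video>`,
})
export class CameraTileComponent {
  readonly peerId = input.required<string>();
  private readonly video = viewChild<ElementRef<HTMLVideoElement>>('video');
  private readonly voice = inject(VoiceConnectionFacade);

  constructor() {
    effect(() => {
      const el = this.video()?.nativeElement;
      const stream = this.voice.getRemoteCameraStream(this.peerId());
      if (el && stream) el.srcObject = stream;
    });
  }
}
```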

## Speaking detection

`VoiceActivityService` monitors audio levels for local and remote streams using the Web Audio API. Each tracked stream gets its own `AudioContext` with an `AnalyserNode`. A single `requestAnimationFrame` loop polls all analysers.

```mermaid
graph LR
    Stream[MediaStream] --> Ctx[AudioContext]
    Ctx --> Src[MediaStreamAudioSourceNode]
    Src --> Analyser[AnalyserNode<br/>fftSize = 256]
    Analyser --> Poll[rAF poll loop]
    Poll --> RMS{"RMS >= 0.015?"}
    RMS -- yes --> Speaking[speakingSignal = true]
    RMS -- "no, 8 frames" --> Silent[speakingSignal = false]

    click Stream "application/services/voice-activity.service.ts" "VoiceActivityService.trackStream()" _blank
    click Poll "application/services/voice-activity.service.ts" "VoiceActivityService.poll()" _blank
```

| Parameter | Value |
| --- | --- |
| FFT size | 256 samples |
| Speaking threshold | RMS >= 0.015 |
| Silent grace period | 8 consecutive frames below threshold |
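
A minimal sketch of the detection loop described above, using the documented constants. The real `VoiceActivityService` multiplexes every analyser through one `requestAnimationFrame` loop and publishes results as signals; this sketch tracks a single stream with a plain callback:

```ts
const FFT_SIZE = 256;
const SPEAKING_THRESHOLD = 0.015;  // RMS, per the table above
const SILENT_GRACE_FRAMES = 8;

// Hypothetical standalone tracker; fires onChange on each speaking-state flip.
function trackSpeaking(stream: MediaStream, onChange: (speaking: boolean) => void): void {
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  analyser.fftSize = FFT_SIZE;
  ctx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Float32Array(analyser.fftSize);
  let speaking = false;
  let silentFrames = 0;

  const poll = () => {
    analyser.getFloatTimeDomainData(samples);
    // Root mean square of the time-domain buffer.
    const rms = Math.sqrt(samples.reduce((sum, s) => sum + s * s, 0) / samples.length);

    if (rms >= SPEAKING_THRESHOLD) {
      silentFrames = 0;
      if (!speaking) onChange((speaking = true));
    } else if (speaking && ++silentFrames >= SILENT_GRACE_FRAMES) {
      onChange((speaking = false));  // only after the 8-frame grace period
    }
    requestAnimationFrame(poll);
  };
  requestAnimationFrame(poll);
}
```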

The service exposes `isSpeaking(userId)` and `volume(userId)` as Angular signals. It automatically tracks remote peers via the `onRemoteStream` and `onPeerDisconnected` observables. Local mic tracking is started explicitly by calling `trackLocalMic(userId, stream)`.

A reactive `speakingMap` signal (a `Map<string, boolean>`) is published whenever any user's speaking state changes, so components can bind directly.
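
For example, a component could render a speaking indicator straight off the map. Only `speakingMap` and its type come from the text above; the import path, selector, and CSS class are hypothetical:

```ts
import { Component, inject, input } from '@angular/core';
import { VoiceActivityService } from '@app/domains/voice-connection';

// Hypothetical member row: the class binding re-evaluates whenever
// speakingMap publishes a new Map.
@Component({
  selector: 'app-member-row',
  standalone: true,
  template: `
    <span [class.speaking]="activity.speakingMap().get(userId()) ?? false">
      {{ userId() }}
    </span>
  `,
})
export class MemberRowComponent {
  readonly userId = input.required<string>();
  protected readonly activity = inject(VoiceActivityService);
}
```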

## Voice playback

`VoicePlaybackService` handles audio output for remote peers. Each peer gets an independent Web Audio pipeline:

```mermaid
graph LR
    Remote[Remote stream] --> Src[MediaStreamAudioSourceNode]
    Src --> Gain[GainNode<br/>0 - 200%]
    Gain --> Dest[MediaStreamAudioDestinationNode]
    Dest --> Audio[HTMLAudioElement<br/>.play]

    click Remote "application/services/voice-playback.service.ts" "VoicePlaybackService.setupPeer()" _blank
    click Gain "application/services/voice-playback.service.ts" "VoicePlaybackService.setUserVolume()" _blank
```

Per-peer volume is stored in `localStorage` and restored on reconnect. The range is 0% to 200% (gain values 0.0 to 2.0). When the user deafens, all gain nodes are set to zero; undeafening restores the previous values.
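
A sketch of the per-peer pipeline and volume restore under the ranges above; the `localStorage` key format and helper name are assumptions, not the service's actual code:

```ts
// Hypothetical per-peer setup mirroring the diagram above.
function setupPeerPlayback(ctx: AudioContext, peerId: string, remote: MediaStream): GainNode {
  const source = ctx.createMediaStreamSource(remote);
  const gain = ctx.createGain();
  const dest = ctx.createMediaStreamDestination();
  source.connect(gain).connect(dest);

  // Restore the persisted volume (gain 0.0-2.0, i.e. 0-200%); default to 100%.
  const stored = localStorage.getItem(`voice-volume:${peerId}`);
  gain.gain.value = stored !== null ? Number(stored) : 1.0;

  // Play the post-gain stream through a plain audio element.
  const audio = new Audio();
  audio.srcObject = dest.stream;
  void audio.play();

  return gain;  // kept so deafen can zero it and undeafen can restore it
}
```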

A Chrome workaround attaches a muted `<audio>` element to keep the `AudioContext` from suspending when no audible output is detected.
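
The workaround might look roughly like this; the helper name is hypothetical, and the essential part is the muted element holding the raw remote stream:

```ts
// Hypothetical Chrome keep-alive: a muted, detached <audio> element holding
// the remote stream keeps it "live" so the Web Audio pipeline stays running.
function attachKeepAlive(remote: MediaStream): HTMLAudioElement {
  const el = new Audio();
  el.srcObject = remote;
  el.muted = true;   // no audible output; real playback goes through the gain chain
  void el.play();
  return el;         // retained so it can be cleaned up on peer disconnect
}
```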