Low-Latency Desktop Audio Capture Without Feedback (Linux, Electron) #12

Open
opened 2026-04-16 23:41:19 +00:00 by myxelium · 0 comments

User Story: Low-Latency Desktop Audio Capture Without Feedback (Linux, Electron)

Context

I am building a Discord-like application using Electron on Linux. The app includes screen sharing with system audio capture.

The current implementation uses PulseAudio (pactl, parec) with:

  • Virtual sinks (module-null-sink)
  • Loopbacks (module-loopback)
  • Monitor source capture (parec)

This setup successfully captures desktop audio while excluding the app’s own audio (preventing feedback loops). However, it introduces ~2 seconds of audio latency, causing significant desynchronization with the video stream.


Problem Statement

The current audio pipeline introduces excessive latency due to buffering across:

  • PulseAudio loopbacks
  • Monitor sources
  • parec internal buffering
  • Node.js stream + IPC transfer

This makes the solution unsuitable for real-time communication.
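To make the buffering cost concrete, the sketch below converts buffer sizes into playback time for raw PCM and sums them across pipeline stages. The 48 kHz, stereo, 16-bit format is an assumption (the issue does not state the capture format), and all function names are illustrative.

```typescript
// Illustrative PCM format; the issue does not specify one, so these values are assumptions.
interface PcmFormat {
  sampleRate: number;     // frames per second
  channels: number;       // interleaved channels per frame
  bytesPerSample: number; // 2 for signed 16-bit samples
}

const ASSUMED_FORMAT: PcmFormat = { sampleRate: 48000, channels: 2, bytesPerSample: 2 };

// Milliseconds of audio represented by `bytes` of raw PCM.
function bufferBytesToMs(bytes: number, fmt: PcmFormat): number {
  const bytesPerSecond = fmt.sampleRate * fmt.channels * fmt.bytesPerSample;
  return (bytes / bytesPerSecond) * 1000;
}

// Largest whole-frame buffer (in bytes) that fits within a latency budget.
function msToBufferBytes(ms: number, fmt: PcmFormat): number {
  const frameBytes = fmt.channels * fmt.bytesPerSample;
  const frames = Math.floor((ms / 1000) * fmt.sampleRate);
  return frames * frameBytes;
}

// Pipeline latency is the sum of the audio each stage holds back.
function totalLatencyMs(stageBufferBytes: number[], fmt: PcmFormat): number {
  let total = 0;
  for (const bytes of stageBufferBytes) {
    total += bufferBytesToMs(bytes, fmt);
  }
  return total;
}
```

Under these assumptions, a 2-second delay corresponds to 384,000 bytes held somewhere in the pipeline, while a 100 ms budget leaves 19,200 bytes for all stages combined.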


Goal

Design a low-latency (<100ms target) desktop audio capture system on Linux that:

  1. Captures system audio (what the user hears)
  2. Excludes audio produced by the app itself (no feedback loop)
  3. Stays synchronized with video (WebRTC-friendly latency)
  4. Works reliably across common Linux environments
  5. Integrates with Electron

Current Architecture (Simplified)

App audio → VOICE_SINK
Other apps → SCREEN_SHARE_SINK

SCREEN_SHARE_SINK.monitor → parec → Node → IPC → renderer → WebRTC

Loopbacks:
- SCREEN_SHARE_SINK.monitor → real output sink
- VOICE_SINK.monitor → real output sink
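
For reference, the topology above can be expressed as a list of `pactl` argument arrays. This is only a sketch: the sink names come from the diagram, but the exact module arguments the app uses are not shown in this issue, so the ones below are assumptions, as are the function names.

```typescript
// Sink names come from the diagram; the module arguments are assumptions,
// since the issue does not show the exact commands the app runs.
type PactlArgs = string[];

// Arguments for `pactl` that create a virtual sink.
function nullSinkArgs(sinkName: string): PactlArgs {
  return ["load-module", "module-null-sink", "sink_name=" + sinkName];
}

// Arguments for `pactl` that forward a virtual sink's monitor to a real output sink.
function loopbackArgs(virtualSink: string, outputSink: string): PactlArgs {
  return [
    "load-module",
    "module-loopback",
    "source=" + virtualSink + ".monitor",
    "sink=" + outputSink,
  ];
}

// Full command list for the topology in the diagram: two null sinks, two loopbacks.
function currentTopologyArgs(realOutputSink: string): PactlArgs[] {
  const virtualSinks = ["VOICE_SINK", "SCREEN_SHARE_SINK"];
  const commands: PactlArgs[] = [];
  for (const sink of virtualSinks) {
    commands.push(nullSinkArgs(sink));
  }
  for (const sink of virtualSinks) {
    commands.push(loopbackArgs(sink, realOutputSink));
  }
  return commands;
}
```

Each array would be passed to `pactl` from the Electron main process, for example via `child_process.execFile("pactl", args)`. Every loopback in this list is an extra buffered hop between the application and the speakers.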

Additionally:

  • Sink inputs are dynamically rerouted using pactl
  • App-owned processes are detected and separated
  • A polling + event-based system enforces routing
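
The event-based half of that enforcement can be sketched as a parser for `pactl subscribe` output plus a helper that builds the matching `pactl move-sink-input` arguments. The type and function names are hypothetical, and the sketch assumes `pactl` runs with an English/C locale so the event lines are not translated.

```typescript
type EventKind = "new" | "change" | "remove";

interface PulseEvent {
  kind: EventKind;
  facility: string; // e.g. "sink-input", "sink", "source-output"
  index: number;
}

// Parses one line of `pactl subscribe` output, e.g. "Event 'new' on sink-input #57".
// Returns null for lines that do not look like an event.
function parseSubscribeLine(line: string): PulseEvent | null {
  const match = /^Event '(new|change|remove)' on ([a-z-]+) #(\d+)$/.exec(line.trim());
  if (!match) {
    return null;
  }
  return {
    kind: match[1] as EventKind,
    facility: match[2],
    index: parseInt(match[3], 10),
  };
}

// Builds `pactl` arguments that move a newly created playback stream to
// `targetSink`. Returns null for events that do not need rerouting.
function rerouteArgsFor(event: PulseEvent, targetSink: string): string[] | null {
  if (event.kind !== "new" || event.facility !== "sink-input") {
    return null;
  }
  return ["move-sink-input", String(event.index), targetSink];
}
```

Reacting to `new` events on `sink-input` covers streams as they appear; the polling pass described above would still be needed for anything the subscription misses.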

Key Constraints

  • Must run on Linux (PulseAudio and/or PipeWire environments)
  • Must work inside Electron (Node.js + Chromium)
  • Cannot rely on kernel modules or privileged access
  • Should degrade gracefully if advanced audio routing is unavailable
  • Must prevent audio feedback (strict requirement)
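
One way to handle the PulseAudio/PipeWire split and the graceful-degradation constraint is to inspect the `Server Name` line of `pactl info`, which reads `PulseAudio (on PipeWire ...)` when the PipeWire compatibility server is in use. The sketch below assumes that output was captured with an English/C locale; the names and the fallback policy are illustrative, not part of the current implementation.

```typescript
type AudioBackend = "pipewire" | "pulseaudio" | "unknown";

// Classifies the sound server from the text printed by `pactl info`.
// An empty or unrecognized output (e.g. pactl missing) maps to "unknown".
function detectBackend(pactlInfoOutput: string): AudioBackend {
  const match = /^Server Name:\s*(.+)$/m.exec(pactlInfoOutput);
  if (!match) {
    return "unknown";
  }
  const serverName = match[1].toLowerCase();
  // Check PipeWire first: its compatibility server also mentions "PulseAudio".
  if (serverName.indexOf("pipewire") !== -1) {
    return "pipewire";
  }
  if (serverName.indexOf("pulseaudio") !== -1) {
    return "pulseaudio";
  }
  return "unknown";
}

// Assumed degradation policy: share video without system audio
// when no supported backend is detected.
function canCaptureSystemAudio(backend: AudioBackend): boolean {
  return backend !== "unknown";
}
```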

Non-Goals

  • Windows or macOS support
  • Microphone capture (handled separately)
  • Audio effects or processing (focus is routing + latency)

Requirements

Functional

  • Capture system audio in real time
  • Exclude app’s own playback audio from capture
  • Maintain stable routing even when new audio streams appear
  • Support starting/stopping capture dynamically
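
The exclusion requirement can be illustrated by parsing `pactl list sink-inputs` and partitioning streams by the `application.process.id` property. This is a sketch under assumptions: the output is taken with an English/C locale, the names are hypothetical, and streams that report no PID are excluded from capture as a conservative choice for the no-feedback requirement.

```typescript
interface SinkInput {
  index: number;
  pid: number | null; // null when the stream does not report application.process.id
}

// Extracts stream indices and owning PIDs from `pactl list sink-inputs` output.
function parseSinkInputs(output: string): SinkInput[] {
  const inputs: SinkInput[] = [];
  let current: SinkInput | null = null;
  for (const rawLine of output.split("\n")) {
    const line = rawLine.trim();
    const header = /^Sink Input #(\d+)$/.exec(line);
    if (header) {
      current = { index: parseInt(header[1], 10), pid: null };
      inputs.push(current);
      continue;
    }
    const pid = /^application\.process\.id = "(\d+)"$/.exec(line);
    if (pid && current) {
      current.pid = parseInt(pid[1], 10);
    }
  }
  return inputs;
}

// Splits streams into those safe to capture and those to keep out of the capture sink.
function partitionByOwner(
  inputs: SinkInput[],
  appPids: number[]
): { capture: SinkInput[]; exclude: SinkInput[] } {
  const capture: SinkInput[] = [];
  const exclude: SinkInput[] = [];
  for (const input of inputs) {
    const ownedByApp = input.pid !== null && appPids.indexOf(input.pid) !== -1;
    // Assumption: a stream with no PID might be ours, so it is not captured.
    if (ownedByApp || input.pid === null) {
      exclude.push(input);
    } else {
      capture.push(input);
    }
  }
  return { capture, exclude };
}
```

In Electron, the PID list could plausibly come from `app.getAppMetrics()`, which reports the PIDs of the app's processes; the issue does not say how app-owned processes are currently detected.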

Non-Functional

  • End-to-end latency: <100ms (ideal), <200ms (acceptable)
  • No audible glitches or dropouts
  • Minimal CPU overhead
  • Robust against stream churn (apps opening/closing)

Pain Points in Current Implementation

  • ~2 second delay caused by accumulated buffering
  • module-loopback introduces unpredictable latency
  • parec buffers aggressively by default
  • Multiple audio hops increase delay
  • Complex rerouting logic (polling + subscribe)
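
Two of these pain points have documented tuning knobs: `parec` accepts `--latency-msec`, and `module-loopback` accepts a `latency_msec` argument. The sketch below only builds the argument lists; the option values, type, and function names are assumptions, and the sound server treats the requested latency as a hint rather than a guarantee.

```typescript
interface CaptureOptions {
  monitorSource: string; // e.g. "SCREEN_SHARE_SINK.monitor"
  latencyMs: number;     // requested buffer latency; the server may not honor it exactly
  sampleRate: number;
  channels: number;
}

// Arguments for `parec` that request a small capture buffer instead of the default.
function parecArgs(opts: CaptureOptions): string[] {
  return [
    "--device=" + opts.monitorSource,
    "--format=s16le",
    "--rate=" + opts.sampleRate,
    "--channels=" + opts.channels,
    "--latency-msec=" + opts.latencyMs,
  ];
}

// Arguments for `pactl` that load a loopback with an explicit latency target,
// for setups where the loopback cannot be removed yet.
function loopbackArgsWithLatency(source: string, sink: string, latencyMs: number): string[] {
  return [
    "load-module",
    "module-loopback",
    "source=" + source,
    "sink=" + sink,
    "latency_msec=" + latencyMs,
  ];
}
```

For example, `parecArgs({ monitorSource: "SCREEN_SHARE_SINK.monitor", latencyMs: 20, sampleRate: 48000, channels: 2 })` yields arguments that could be passed to `child_process.spawn("parec", args)`. Whether 20 ms is sustainable without dropouts would need to be measured on the target systems.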

Desired Output from the Model

Provide a detailed technical proposal including:

1. Architecture Options

Compare at least:

  • Improved PulseAudio approach (no loopbacks)
  • PipeWire-native solution
  • Hybrid compatibility approach

2. Recommended Architecture

Include:

  • Audio routing diagram
  • How to exclude app audio cleanly
  • How to minimize buffering

3. Implementation Plan

  • Step-by-step migration from current system
  • Example commands / APIs
  • Electron integration approach

4. Latency Analysis

  • Where latency is introduced
  • Expected latency after improvements

5. Trade-offs

  • Compatibility vs performance
  • PulseAudio vs PipeWire
  • Complexity vs reliability

6. Optional Enhancements

  • Direct PipeWire API usage
  • WebRTC-native capture paths
  • Eliminating parec

Success Criteria

  • Audio latency is reduced from ~2000ms to <100ms (ideal) or <200ms (acceptable)
  • No feedback loop occurs under any condition
  • Audio remains synchronized with video during screen sharing
  • System works reliably across multiple Linux distributions

Notes

  • Current implementation already correctly separates app vs system audio
  • The main issue is latency, not correctness
  • A solution that simplifies the pipeline is strongly preferred

Priority

High — this directly impacts core real-time communication UX

myxelium added this to the Zoracord 1:1 project 2026-04-16 23:41:19 +00:00
myxelium moved this to To Do in Zoracord 1:1 on 2026-04-29 17:42:43 +00:00
Reference: myxelium/Toju#12