Low-Latency Desktop Audio Capture Without Feedback (Linux, Electron) #12


myxelium · 2026-04-16T23:41:19Z


User Story: Low-Latency Desktop Audio Capture Without Feedback (Linux, Electron)

Context

I am building a Discord-like application using Electron on Linux. The app includes screen sharing with system audio capture.

The current implementation uses PulseAudio (pactl, parec) with:

Virtual sinks (module-null-sink)
Loopbacks (module-loopback)
Monitor source capture (parec)

This setup successfully captures desktop audio while excluding the app’s own audio (preventing feedback loops). However, it introduces ~2 seconds of audio latency, causing significant desynchronization with the video stream.
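For reference, the setup described above can be sketched as a list of `pactl` invocations. This is only an illustration of the shape of the current pipeline, not the project's actual code: the helper names and the `REAL_OUTPUT_SINK` placeholder are hypothetical, and the optional `latency_msec` argument is shown as something one might pass to `module-loopback`, not something the current implementation is known to set.

```typescript
// Sketch only: builds argv arrays for `pactl`; nothing is executed here.
// Sink names follow the issue's diagram; REAL_OUTPUT_SINK is a placeholder
// for whatever hardware sink the user actually listens on.

/** Arguments for `pactl` that create a virtual (null) sink. */
function nullSinkArgs(sinkName: string): string[] {
  return ["load-module", "module-null-sink", `sink_name=${sinkName}`];
}

/** Arguments for `pactl` that loop a sink's monitor back to a real output. */
function loopbackArgs(
  fromSink: string,
  realOutputSink: string,
  latencyMsec?: number, // optional module-loopback latency request (assumption)
): string[] {
  const args = [
    "load-module",
    "module-loopback",
    `source=${fromSink}.monitor`,
    `sink=${realOutputSink}`,
  ];
  if (latencyMsec !== undefined) {
    args.push(`latency_msec=${latencyMsec}`);
  }
  return args;
}

// The four module loads implied by the description above.
const setupCommands: string[][] = [
  nullSinkArgs("VOICE_SINK"),
  nullSinkArgs("SCREEN_SHARE_SINK"),
  loopbackArgs("VOICE_SINK", "REAL_OUTPUT_SINK"),
  loopbackArgs("SCREEN_SHARE_SINK", "REAL_OUTPUT_SINK"),
];
```

Each array could be handed to a process spawner as the arguments of `pactl`. Every loopback is an extra buffered hop between the application and the speakers.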

Problem Statement

The current audio pipeline introduces excessive latency due to buffering across:

PulseAudio loopbacks
Monitor sources
parec internal buffering
Node.js stream + IPC transfer

This makes the solution unsuitable for real-time communication.
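To make the buffering problem concrete, the delay added by any one buffer follows directly from its size and the PCM format. The sketch below assumes 16-bit stereo audio at 48 kHz purely for illustration; the issue does not state the format in use, and the function names are hypothetical.

```typescript
// Sketch: convert between buffer size and the delay that buffer represents.
// The PCM format used in the example is an assumption, not taken from the issue.

interface PcmFormat {
  sampleRate: number;     // frames per second
  channels: number;       // samples per frame
  bytesPerSample: number; // 2 for s16le
}

/** Milliseconds of audio held by a completely full buffer of `bufferBytes`. */
function bufferLatencyMs(bufferBytes: number, fmt: PcmFormat): number {
  const bytesPerSecond = fmt.sampleRate * fmt.channels * fmt.bytesPerSample;
  return (bufferBytes / bytesPerSecond) * 1000;
}

/** Buffer size in bytes (whole frames) that holds `latencyMs` of audio. */
function bytesForLatency(latencyMs: number, fmt: PcmFormat): number {
  const frameBytes = fmt.channels * fmt.bytesPerSample;
  const frames = Math.round((latencyMs / 1000) * fmt.sampleRate);
  return frames * frameBytes;
}

const s16Stereo48k: PcmFormat = { sampleRate: 48000, channels: 2, bytesPerSample: 2 };

// A single full 64 KiB buffer already holds about 341 ms of audio,
// while a 20 ms budget corresponds to only 3840 bytes.
const fullBufferMs = bufferLatencyMs(64 * 1024, s16Stereo48k);
const twentyMsBytes = bytesForLatency(20, s16Stereo48k);
```

Because each stage listed above can hold its own buffer, the per-stage delays add up, which is how a chain of individually modest buffers can reach the multi-second range.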

Goal

Design a low-latency (<100ms target) desktop audio capture system on Linux that:

Captures system audio (what the user hears)
Excludes audio produced by the app itself (no feedback loop)
Stays synchronized with video (WebRTC-friendly latency)
Works reliably across common Linux environments
Integrates with Electron

Current Architecture (Simplified)

App audio → VOICE_SINK
Other apps → SCREEN_SHARE_SINK

SCREEN_SHARE_SINK.monitor → parec → Node → IPC → renderer → WebRTC

Loopbacks:
- SCREEN_SHARE_SINK.monitor → real output sink
- VOICE_SINK.monitor → real output sink

Additionally:

Sink inputs are dynamically rerouted using pactl
App-owned processes are detected and separated
A polling + event-based system enforces routing
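The routing rule described above can be sketched as a pure function that plans `pactl move-sink-input` calls. This is a simplified illustration, not the project's implementation: the `SinkInput` shape and function names are hypothetical, and it assumes the owning PID of each stream is already known (for example from the stream's `application.process.id` property).

```typescript
// Sketch: decide where each playback stream should go and emit the
// `pactl move-sink-input <index> <sink>` arguments needed to get it there.

interface SinkInput {
  index: number;       // sink-input index as reported by PulseAudio
  pid: number | null;  // owning process ID, or null if unknown
  currentSink: string; // name of the sink the stream is on now
}

/** App-owned streams go to VOICE_SINK; everything else is captured. */
function targetSinkFor(input: SinkInput, appPids: ReadonlySet<number>): string {
  const ownedByApp = input.pid !== null && appPids.has(input.pid);
  return ownedByApp ? "VOICE_SINK" : "SCREEN_SHARE_SINK";
}

/** Returns pactl argument lists only for streams that are on the wrong sink. */
function planMoves(inputs: SinkInput[], appPids: ReadonlySet<number>): string[][] {
  const moves: string[][] = [];
  for (const input of inputs) {
    const target = targetSinkFor(input, appPids);
    if (input.currentSink !== target) {
      moves.push(["move-sink-input", String(input.index), target]);
    }
  }
  return moves;
}
```

Treating streams with an unknown PID as "other apps" is a choice made here for brevity. Given the strict no-feedback requirement, a real implementation may prefer to keep unidentified streams out of the capture sink until ownership is confirmed.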

Key Constraints

Must run on Linux (PulseAudio and/or PipeWire environments)
Must work inside Electron (Node.js + Chromium)
Cannot rely on kernel modules or privileged access
Should degrade gracefully if advanced audio routing is unavailable
Must prevent audio feedback (strict requirement)
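Since the solution must cope with both PulseAudio and PipeWire, one possible starting point for graceful degradation is to look at the `Server Name` line printed by `pactl info`. The sketch below only parses text that the caller has already obtained; the type and function names are hypothetical, and it assumes the output is not localized (for example by running `pactl` with `LC_ALL=C`).

```typescript
// Sketch: classify the sound server from the text output of `pactl info`.
// PipeWire's PulseAudio compatibility layer typically reports a server name
// such as "PulseAudio (on PipeWire <version>)".

type AudioBackend = "pipewire-pulse" | "pulseaudio" | "unknown";

function detectAudioBackend(pactlInfoOutput: string): AudioBackend {
  const line = pactlInfoOutput
    .split("\n")
    .find((l) => l.trim().startsWith("Server Name:"));
  if (line === undefined) {
    return "unknown"; // pactl missing, failed, or output in another language
  }
  const name = line.slice(line.indexOf(":") + 1).trim().toLowerCase();
  if (name.includes("pipewire")) {
    return "pipewire-pulse";
  }
  if (name.includes("pulseaudio")) {
    return "pulseaudio";
  }
  return "unknown";
}
```

An `"unknown"` result is the natural point to fall back to screen sharing without system audio instead of attempting routing that may not work.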

Non-Goals

Windows or macOS support
Microphone capture (handled separately)
Audio effects or processing (focus is routing + latency)

Requirements

Functional

Capture system audio in real time
Exclude app’s own playback audio from capture
Maintain stable routing even when new audio streams appear
Support starting/stopping capture dynamically

Non-Functional

End-to-end latency: <100ms (ideal), <200ms (acceptable)
No audible glitches or dropouts
Minimal CPU overhead
Robust against stream churn (apps opening/closing)
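The latency targets above can be written down as a small check, which may be handy in an automated test that measures end-to-end delay. The names are hypothetical; only the 100 ms and 200 ms thresholds come from this issue.

```typescript
// Sketch: classify a measured end-to-end latency against the stated targets.

type LatencyVerdict = "ideal" | "acceptable" | "too-high";

function classifyLatency(measuredMs: number): LatencyVerdict {
  if (measuredMs < 100) {
    return "ideal"; // under the 100 ms target
  }
  if (measuredMs < 200) {
    return "acceptable"; // under the 200 ms ceiling
  }
  return "too-high";
}

// The current pipeline's roughly 2000 ms delay falls in the last bucket.
const currentVerdict = classifyLatency(2000);
```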

Pain Points in Current Implementation

~2 second delay caused by accumulated buffering
module-loopback introduces unpredictable latency
parec buffers aggressively by default
Multiple audio hops increase delay
Complex rerouting logic (polling + subscribe)
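On the `parec` point specifically, the tool accepts an explicit latency request, so one low-effort experiment is to stop relying on its defaults. The sketch below builds such an argument list. The option values (20 ms, s16le, 48 kHz stereo) are illustrative assumptions, not measured recommendations, and the server may not honor the requested latency exactly.

```typescript
// Sketch: build a `parec` argument list that asks for a small capture latency.
// Nothing is executed here; the array would be passed to a process spawner.

interface CaptureOptions {
  monitorSource: string; // e.g. "SCREEN_SHARE_SINK.monitor"
  sampleRate: number;
  channels: number;
  latencyMsec: number;   // requested latency; treated as a hint by the server
}

function parecArgs(opts: CaptureOptions): string[] {
  return [
    `--device=${opts.monitorSource}`,
    "--format=s16le",
    `--rate=${opts.sampleRate}`,
    `--channels=${opts.channels}`,
    `--latency-msec=${opts.latencyMsec}`,
  ];
}

const lowLatencyCapture = parecArgs({
  monitorSource: "SCREEN_SHARE_SINK.monitor",
  sampleRate: 48000,
  channels: 2,
  latencyMsec: 20,
});
```

This addresses only one of the hops listed above; the loopbacks and the Node.js/IPC path would still need their own treatment.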

Desired Output from the Model

Provide a detailed technical proposal including:

1. Architecture Options

Compare at least:

Improved PulseAudio approach (no loopbacks)
PipeWire-native solution
Hybrid compatibility approach

2. Recommended Architecture

Include:

Audio routing diagram
How to exclude app audio cleanly
How to minimize buffering

3. Implementation Plan

Step-by-step migration from current system
Example commands / APIs
Electron integration approach

4. Latency Analysis

Where latency is introduced
Expected latency after improvements

5. Trade-offs

Compatibility vs performance
PulseAudio vs PipeWire
Complexity vs reliability

6. Optional Enhancements

Direct PipeWire API usage
WebRTC-native capture paths
Eliminating parec

Success Criteria

Audio latency is reduced from ~2000ms to <100ms (ideal) or <200ms (acceptable)
No feedback loop occurs under any condition
Audio remains synchronized with video during screen sharing
System works reliably across multiple Linux distributions

Notes

Current implementation already correctly separates app vs system audio
The main issue is latency, not correctness
A solution that simplifies the pipeline is strongly preferred

Priority

High — this directly impacts core real-time communication UX


myxelium added this to the Zoracord 1:1 project 2026-04-16 23:41:19 +00:00

myxelium moved this to To Do in Zoracord 1:1 on 2026-04-29 17:42:43 +00:00


Reference: myxelium/Toju#12