
Voice Activity

v1.0.0 (new)

voice_activity processes microphone audio into speech-state signals for voice workflows. Use it when a patch needs echo cancellation, speech start/end detection, and optional Smart Turn end-of-turn classification before sending audio to STT or an agent loop.

The operator loads a pipeline with Silero VAD, optional LiveKit echo cancellation/audio processing, and optional Smart Turn v3. While Active is on, incoming audio chunks are queued, processed, and converted into CHOP-observable state: speaking, speech start, speech end, turn complete, Smart Turn probability, and latency metrics.
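One way to picture how per-frame speech probabilities become speech start and speech end events is the minimal sketch below. It is not the operator's internal code. It assumes fixed-length frames and a simple rule: speech starts when a frame's probability reaches Speech Threshold, ends once frames stay below it for at least Min Silence, and each segment is then widened by Speech Pad on both sides. The function name `segment_speech` is hypothetical, and Silero VAD's own reference utilities apply additional rules that are omitted here.

```python
from typing import List, Tuple


def segment_speech(
    probs: List[float],
    frame_ms: int,
    threshold: float = 0.81,
    min_silence_ms: int = 508,
    speech_pad_ms: int = 242,
) -> List[Tuple[int, int]]:
    """Return padded (start_ms, end_ms) speech segments from per-frame probabilities.

    Hypothetical sketch: defaults mirror the documented parameter defaults.
    Overlapping padded segments are not merged.
    """
    segments: List[Tuple[int, int]] = []
    start = None          # start time (ms) of the open speech segment, if any
    silence_start = None  # time (ms) when the current run of silence began

    for i, p in enumerate(probs):
        t = i * frame_ms
        if p >= threshold:
            if start is None:
                start = t          # speech start event
            silence_start = None   # a speech frame resets the silence run
        elif start is not None:
            if silence_start is None:
                silence_start = t
            # Silence has lasted long enough: emit a speech end event.
            if t + frame_ms - silence_start >= min_silence_ms:
                segments.append((start, silence_start))
                start, silence_start = None, None

    total_ms = len(probs) * frame_ms
    if start is not None:
        # Audio ran out mid-segment: close it at the last silence or the end.
        end = silence_start if silence_start is not None else total_ms
        segments.append((start, end))

    # Apply speech padding, clamped to the audio bounds.
    return [
        (max(0, s - speech_pad_ms), min(total_ms, e + speech_pad_ms))
        for s, e in segments
    ]
```

Under this rule, raising Speech Threshold ignores quieter speech, raising Min Silence tolerates longer pauses inside one utterance, and raising Speech Pad keeps more audio around word onsets and endings.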

Echo Cancellation uses input 2 as the reference (speaker or TTS) audio. Smart Turn checks whether a pause likely means the user has finished speaking, reducing mid-sentence cutoffs in voice interfaces.
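The combination of a VAD pause and a Smart Turn check can be sketched as follows. The exact decision rule is not documented on this page, so this sketch assumes it: the classifier sees at most Turn Max Audio seconds of the most recent audio, a turn counts as complete when the predicted probability reaches Turn Threshold, and Turn Silence acts as a fallback that ends the turn after a long enough pause even when the model is unsure. `predict` is a placeholder for the Smart Turn v3 model, and both function names are hypothetical.

```python
from typing import Callable, Sequence


def trim_to_max_audio(samples: Sequence[float], sample_rate: int, max_sec: float) -> Sequence[float]:
    """Keep only the most recent `max_sec` seconds of audio."""
    max_samples = int(sample_rate * max_sec)
    return samples[-max_samples:] if len(samples) > max_samples else samples


def is_turn_complete(
    samples: Sequence[float],
    sample_rate: int,
    silence_ms: int,
    predict: Callable[[Sequence[float]], float],
    turn_threshold: float = 0.333,
    max_audio_sec: float = 8.0,
    turn_silence_ms: int = 1000,
) -> bool:
    """Hypothetical end-of-turn rule: silence fallback, then Smart Turn probability."""
    # Assumed fallback: a long enough pause ends the turn without asking the model.
    if silence_ms >= turn_silence_ms:
        return True
    # Otherwise classify the trimmed utterance and compare with the threshold.
    prob = predict(trim_to_max_audio(samples, sample_rate, max_audio_sec))
    return prob >= turn_threshold
```

Under this assumed rule, a higher Turn Threshold makes the model stricter, so more pauses wait for the Turn Silence fallback; that is the latency-for-fewer-cutoffs trade-off in concrete form.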

  1. Wire microphone audio to input 1. If Echo Cancellation is enabled, wire speaker or TTS reference audio to input 2.
  2. Pulse Install Dependencies once if the required Python packages are missing.
  3. Pulse Load Pipeline, or leave Auto Load on Init enabled and wait for Pipeline Ready.
  4. Tune Speech Threshold, Min Silence, and Speech Pad for the microphone and room.
  5. Enable Smart Turn when semantic end-of-turn detection is useful, then adjust Turn Threshold and Turn Silence.
  6. Turn Active on and monitor Is Speaking, Turn Probability, and downstream CHOP flags.
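The steps above can be scripted from TouchDesigner's Python. This is a sketch that assumes it runs inside TouchDesigner, where `op('voice_activity')` returns the operator, `Par.eval()` reads a parameter, `Par.pulse()` triggers a pulse parameter, and `OP.inputs` lists the connected inputs. The function name `configure_voice_activity` and its tuning values are illustrative only. Passing the operator in as an argument lets the logic be exercised outside TouchDesigner with a stand-in object.

```python
def configure_voice_activity(va, speech_threshold=0.6, min_silence_ms=500,
                             speech_pad_ms=200, smart_turn=True):
    """Hypothetical setup helper; `va` is op('voice_activity') inside TouchDesigner.

    Returns a list of warnings instead of raising, so it is safe to call from a script.
    """
    warnings = []

    # Echo Cancellation only helps when reference audio is wired to input 2.
    if va.par.Enableaec.eval() and len(va.inputs) < 2:
        warnings.append('Echo Cancellation is on but input 2 has no reference audio.')

    # Step 4: tune the VAD for the microphone and room.
    va.par.Speechthreshold = speech_threshold
    va.par.Minsilenceduration = min_silence_ms
    va.par.Speechpadding = speech_pad_ms

    # Step 5: optional semantic end-of-turn detection.
    va.par.Enablesmartturn = smart_turn

    # Step 6 only does useful work once the pipeline has loaded.
    if va.par.Pipelineready.eval():
        va.par.Active = True
    else:
        va.par.Loadpipeline.pulse()
        warnings.append('Pipeline not ready; Load Pipeline was pulsed. '
                        'Turn Active on once Pipeline Ready is set.')
    return warnings
```

Inside TouchDesigner this would be called as `configure_voice_activity(op('voice_activity'))`, for example from a Text DAT, and run again after Pipeline Ready turns on.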
  • Input 1: Mono microphone audio CHOP, expected at the processing sample rate.
  • Input 2: Optional reference audio CHOP for Echo Cancellation.
  • Output 1: Processed audio CHOP after the enabled audio-processing stages.
  • Output 2: Status and metrics CHOP.
  • Output 3: Speaking and turn-complete flag CHOP.
  • stt: Receives gated/processed microphone audio and speech boundary signals.
  • tts: Supplies reference audio for echo cancellation in speaker playback setups.
  • agent: Uses turn-complete signals to decide when to respond.
  • flow_router: Routes speech start/end and Smart Turn events.
  • Load Pipeline must succeed before Active does useful work.
  • Echo Cancellation needs reference audio on input 2. Without that signal, enabling it cannot remove speaker echo.
  • Smart Turn can add a small amount of end-of-turn latency in exchange for fewer premature cutoffs.
  • The first Smart Turn load may download and cache the ONNX model and feature-extractor assets.
  • The operator replaces older Silero-only VAD workflows; avoid running both on the same microphone path.
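To let an agent respond when a turn ends, a CHOP Execute DAT pointed at output 3 can react to the turn-complete flag. `onOffToOn(channel, sampleIndex, val, prev)` is the standard CHOP Execute DAT callback that fires when a channel changes from 0 to non-zero (with the DAT's Off to On parameter enabled). The channel name `turn_complete`, the `events` list, and `handle_turn_complete` are hypothetical placeholders, because the flag channel names are not listed on this page.

```python
# CHOP Execute DAT callbacks (sketch). Point the DAT at output 3 of voice_activity.
# Assumption: the turn-complete channel is named 'turn_complete'; check the actual output.
TURN_CHANNEL = 'turn_complete'

events = []  # records handled events; replace with a call into your agent


def handle_turn_complete():
    """Hypothetical placeholder, e.g. tell the downstream agent to respond."""
    events.append('turn_complete')


def onOffToOn(channel, sampleIndex, val, prev):
    # Fires when a watched channel goes from 0 to non-zero.
    if channel.name == TURN_CHANNEL:
        handle_turn_complete()
    return
```

Reacting only to the off-to-on edge means one response per completed turn, rather than one per cook while the flag stays high.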
Pipelinestatus (Pipelinestatus) op('voice_activity').par.Pipelinestatus Str
Default: "" (Empty String)
Pipeline Header
Active (Active) op('voice_activity').par.Active Toggle
Default: False
Auto Load on Init (Autoloadoninit) op('voice_activity').par.Autoloadoninit Toggle
Default: True
Load Pipeline (Loadpipeline) op('voice_activity').par.Loadpipeline Pulse
Default: False
Unload Pipeline (Unloadpipeline) op('voice_activity').par.Unloadpipeline Pulse
Default: False
Pipeline Ready (Pipelineready) op('voice_activity').par.Pipelineready Toggle
Default: False
Is Speaking (Isspeaking) op('voice_activity').par.Isspeaking Toggle
Default: False
Install Dependencies (Installdependencies) op('voice_activity').par.Installdependencies Pulse
Default: False
Audio Processing Header
Echo Cancellation (Enableaec) op('voice_activity').par.Enableaec Toggle
Default: True
Noise Suppression (Enablenoisesuppression) op('voice_activity').par.Enablenoisesuppression Toggle
Default: False
Auto Gain Control (Enableautogaincontrol) op('voice_activity').par.Enableautogaincontrol Toggle
Default: False
High-Pass Filter (Enablehighpassfilter) op('voice_activity').par.Enablehighpassfilter Toggle
Default: False
VAD (Silero) Header
Speech Threshold (Speechthreshold) op('voice_activity').par.Speechthreshold Float
Default: 0.81
Range: 0 to 1
Min Silence (ms) (Minsilenceduration) op('voice_activity').par.Minsilenceduration Int
Default: 508
Range: 0 to 2000
Speech Pad (ms) (Speechpadding) op('voice_activity').par.Speechpadding Int
Default: 242
Range: 0 to 500
Smart Turn Header
Smart Turn (Enablesmartturn) op('voice_activity').par.Enablesmartturn Toggle
Default: True
Turn Threshold (Smartturnthreshold) op('voice_activity').par.Smartturnthreshold Float
Default: 0.333
Range: 0 to 1
Turn Max Audio (sec) (Smartturnmaxaudio) op('voice_activity').par.Smartturnmaxaudio Float
Default: 8.0
Range: 1 to 30
Turn Probability (Smartturnprob) op('voice_activity').par.Smartturnprob Float
Default: 0.0
Range: 0 to 1
Smart Turn Ready (Smartturnready) op('voice_activity').par.Smartturnready Toggle
Default: False
Turn Silence (ms) (Smartturnsilence) op('voice_activity').par.Smartturnsilence Int
Default: 1000
Range: 100 to 3000
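When tuning values come from elsewhere (a UI, a preset file, an agent), clamping them to the ranges listed above keeps them within the documented limits. This sketch assumes the listed ranges are meant as hard limits; in TouchDesigner a displayed range may only be a slider range. The `RANGES` table is copied from this page, while the names `RANGES`, `INT_PARAMS`, and `clamp_tuning` are hypothetical.

```python
# Documented ranges for the writable tuning parameters (copied from the list above).
RANGES = {
    'Speechthreshold': (0.0, 1.0),
    'Minsilenceduration': (0, 2000),
    'Speechpadding': (0, 500),
    'Smartturnthreshold': (0.0, 1.0),
    'Smartturnmaxaudio': (1.0, 30.0),
    'Smartturnsilence': (100, 3000),
}

# Parameters documented as Int; everything else in RANGES is Float.
INT_PARAMS = {'Minsilenceduration', 'Speechpadding', 'Smartturnsilence'}


def clamp_tuning(values):
    """Return a copy of `values` with each known parameter clamped to its range."""
    result = {}
    for name, value in values.items():
        if name not in RANGES:
            raise KeyError(f'Unknown tuning parameter: {name}')
        lo, hi = RANGES[name]
        clamped = min(max(value, lo), hi)
        result[name] = int(round(clamped)) if name in INT_PARAMS else float(clamped)
    return result
```

Inside TouchDesigner, the clamped values could then be applied with `setattr(op('voice_activity').par, name, value)` for each item. Rejecting unknown names catches typos instead of silently ignoring them.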
v1.0.0 (2026-05-02)
  • Updated manifest category to the 0.3.0 group taxonomy
  • Initial voice_activity structure