VAD Silero
Overview
The VAD Silero LOP is a lightweight, real-time Voice Activity Detection (VAD) operator. It uses the silero-vad model from Silero AI to detect the presence of speech in a CHOP audio stream. This operator is highly efficient and designed for low-latency applications, making it ideal for triggering events based on whether someone is speaking or not.
The operator processes audio asynchronously and provides a simple “Is Speaking” toggle that reflects the current speech state.
Input/Output
Inputs
- Input 1 (Audio CHOP): Connect a 16 kHz single-channel (mono) audio CHOP here. The operator is specifically tuned for this sample rate.
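To illustrate what the 16 kHz mono requirement means for the incoming audio, here is a minimal sketch in plain Python that downmixes several channels to one and resamples to 16 kHz by linear interpolation. The function names `to_mono` and `resample_linear` are hypothetical and are not part of the operator. Inside TouchDesigner this conversion would more typically be done with CHOPs upstream of the operator. The sketch also applies no anti-aliasing filter, so treat it as an explanation of the idea rather than a production resampler.

```python
def to_mono(channels):
    """Average equally long channels sample by sample (hypothetical helper)."""
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(length)]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample by linear interpolation; no low-pass filter is applied."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    if src_rate == dst_rate:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Example: one millisecond of 48 kHz stereo becomes 16 mono samples at 16 kHz.
stereo = [[float(i) for i in range(48)], [float(i) for i in range(48)]]
mono_16k = resample_linear(to_mono(stereo), src_rate=48000)
```

Going from 48 kHz to 16 kHz keeps one output sample for every three input samples, which is why the example yields 16 values.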
Outputs
This operator has no direct outputs, as its state is exposed through its parameters (primarily the Isspeaking toggle).
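Because the state lives in a parameter rather than an output, a common pattern is to watch the toggle and react only when it changes. The sketch below is a small, framework-independent edge detector. The class `SpeechEdgeDetector` and its callbacks are hypothetical names introduced for illustration. In TouchDesigner the value passed to `update` could come from `op('vad_silero').par.Isspeaking.eval()`, assuming the operator is named `vad_silero` as in the parameter paths on this page.

```python
class SpeechEdgeDetector:
    """Turn a stream of on/off readings into 'start' and 'stop' events."""

    def __init__(self, on_start=None, on_stop=None):
        self._previous = False
        self._on_start = on_start
        self._on_stop = on_stop

    def update(self, is_speaking):
        """Return 'start', 'stop', or None for the latest reading."""
        current = bool(is_speaking)
        event = None
        if current and not self._previous:
            event = "start"
            if self._on_start is not None:
                self._on_start()
        elif self._previous and not current:
            event = "stop"
            if self._on_stop is not None:
                self._on_stop()
        self._previous = current
        return event


# Inside TouchDesigner (assumption, not executed here) one might call:
#   detector.update(op('vad_silero').par.Isspeaking.eval())
log = []
detector = SpeechEdgeDetector(
    on_start=lambda: log.append("speech started"),
    on_stop=lambda: log.append("speech stopped"),
)
events = [detector.update(value) for value in [0, 1, 1, 0, 0, 1]]
```

Reacting to transitions rather than to the raw value avoids re-triggering an event on every frame while someone keeps talking.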
Parameters
Page: VAD Settings
- op('vad_silero').par.Isspeaking (Toggle, default: Off)
- op('vad_silero').par.Active (Toggle, default: Off)
- op('vad_silero').par.Modelready (Toggle, default: Off)
- op('vad_silero').par.Loadmodel (Pulse, default: None)
- op('vad_silero').par.Unloadmodel (Pulse, default: None)
- op('vad_silero').par.Autoloadoninit (Toggle, default: Off)
- op('vad_silero').par.Speechthreshold (Float, default: 0.5)
- op('vad_silero').par.Minsilenceduration (Int, default: 150)
- op('vad_silero').par.Speechpadding (Int, default: 50)
- op('vad_silero').par.Downloadmodel (Pulse, default: None)
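The page does not describe how Speechthreshold, Minsilenceduration, and Speechpadding interact, so the following is only a sketch of one plausible reading, not the operator's actual implementation. It assumes the two Int values are milliseconds, that the model emits one speech probability per audio frame, and that padding simply extends how long the toggle stays on after speech ends. The function `track_speech` and the `frame_ms` argument are hypothetical.

```python
def track_speech(probabilities, frame_ms, threshold=0.5,
                 min_silence_ms=150, speech_pad_ms=50):
    """Return one on/off state per frame using a threshold plus a hold time.

    Assumption: the state switches on as soon as a frame reaches the
    threshold, and switches off only after the probability has stayed
    below the threshold for min_silence_ms + speech_pad_ms.
    """
    hold_ms = min_silence_ms + speech_pad_ms
    speaking = False
    silence_ms = 0
    states = []
    for probability in probabilities:
        if probability >= threshold:
            speaking = True
            silence_ms = 0  # any speech frame resets the silence timer
        elif speaking:
            silence_ms += frame_ms
            if silence_ms >= hold_ms:
                speaking = False
                silence_ms = 0
        states.append(speaking)
    return states


# Example with hypothetical 50 ms frames: a short dip in probability does
# not switch the state off, while 200 ms of continuous silence does.
probs = [0.1, 0.9, 0.2, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
states = track_speech(probs, frame_ms=50)
```

Under these assumptions, raising the threshold makes detection stricter, while larger silence and padding values keep the state on through brief pauses at the cost of a slower switch-off.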
Page: About
- op('vad_silero').par.Bypass (Toggle, default: Off)
- op('vad_silero').par.Showbuiltin (Toggle, default: Off)
- op('vad_silero').par.Version (String, default: 1.0.1)
- op('vad_silero').par.Lastupdated (String, default: 2025-07-01)
- op('vad_silero').par.Creator (String, default: dotsimulate)
- op('vad_silero').par.Website (String, default: https://dotsimulate.com)
- op('vad_silero').par.Chattd (OP, default: None)
Research & Licensing
Silero AI
Silero AI is a technology company specializing in speech recognition and voice processing solutions. They focus on creating enterprise-grade, production-ready speech models that are accessible to developers and researchers through open-source releases.
Silero VAD: Pre-trained Enterprise-grade Voice Activity Detector
Silero VAD is a pre-trained Voice Activity Detector designed for enterprise applications. It provides reliable speech detection capabilities with minimal computational requirements, making it ideal for real-time voice processing systems and applications requiring responsive voice activity detection.
Technical Details
- Lightweight Architecture: Optimized for real-time processing with minimal computational overhead
- 16kHz Audio Processing: Specifically tuned for 16kHz single-channel audio input
- PyTorch Implementation: Built on PyTorch framework with TorchHub integration
Research Impact
- Production-Ready VAD: Reliable voice activity detection for commercial applications
- Open Source Accessibility: Free alternative to commercial VAD solutions
- Real-time Performance: Enables low-latency voice processing applications
Citation
@misc{silero2024vad,
  author = {Silero Team},
  title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD)},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-vad}},
  email = {hello@silero.ai}
}
Key Research Contributions
- Enterprise-grade Voice Activity Detection with high accuracy
- Real-time processing optimized for low-latency applications
- Pre-trained model requiring no additional training or fine-tuning
License
MIT License - This model is freely available for research and commercial use.