VAD Silero

The VAD Silero LOP is a lightweight, real-time Voice Activity Detection (VAD) operator. It uses the silero-vad model from Silero AI to detect speech in a CHOP audio stream. It is designed for low-latency use, which makes it well suited to triggering events based on whether someone is speaking.

The operator processes audio asynchronously and provides a simple “Is Speaking” toggle that reflects the current speech state.

  • Input 1 (Audio CHOP): Connect a 16kHz single-channel (mono) audio CHOP here. The operator is specifically tuned for this sample rate.
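
If the source audio is stereo or runs at another sample rate, it has to be converted before it reaches this input. In TouchDesigner that is normally done with CHOPs upstream (for example a Resample CHOP), so the following is only a plain-Python sketch of what the conversion amounts to. The function names `to_mono` and `resample_linear` are hypothetical, and the linear interpolation used here has no anti-aliasing filter, so it is an illustration rather than a production resampler.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample into one mono channel."""
    if not channels:
        return []
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, `resample_linear(to_mono([left, right]), 48000)` would turn one second of 48 kHz stereo into 16000 mono samples.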

This operator has no direct outputs, as its state is exposed through its parameters (primarily the Isspeaking toggle).
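
Because the state is only a toggle, downstream logic usually needs to notice when it changes. The sketch below shows one way to turn a polled boolean into start and stop events. The class name `SpeechEdgeWatcher` is hypothetical, and the TouchDesigner usage in the trailing comment assumes the operator is named `vad_silero` and that the toggle is polled once per frame from an Execute DAT.

```python
class SpeechEdgeWatcher:
    """Turns a polled boolean speech state into start/stop events."""

    def __init__(self, on_start=None, on_stop=None):
        self.on_start = on_start
        self.on_stop = on_stop
        self.was_speaking = False

    def update(self, is_speaking):
        """Call once per poll; returns 'start', 'stop', or None."""
        is_speaking = bool(is_speaking)
        event = None
        if is_speaking and not self.was_speaking:
            event = "start"
            if self.on_start:
                self.on_start()
        elif self.was_speaking and not is_speaking:
            event = "stop"
            if self.on_stop:
                self.on_stop()
        self.was_speaking = is_speaking
        return event


# Assumed usage inside TouchDesigner, e.g. in an Execute DAT's onFrameStart:
#     watcher.update(op('vad_silero').par.Isspeaking.eval())
```

Polling keeps the example independent of any callback mechanism; a Parameter Execute DAT watching Isspeaking may be a simpler option inside a real project.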

  • Is Speaking (Isspeaking): Toggle, default Off. Access: op('vad_silero').par.Isspeaking
  • Active Monitor CHOPin1 (Active): Toggle, default Off. Access: op('vad_silero').par.Active
  • Model Ready (Modelready): Toggle, default Off. Access: op('vad_silero').par.Modelready
  • Load Model (Loadmodel): Pulse, default None. Access: op('vad_silero').par.Loadmodel
  • Unload Model (Unloadmodel): Pulse, default None. Access: op('vad_silero').par.Unloadmodel
  • Auto Load Model on Start (Autoloadoninit): Toggle, default Off. Access: op('vad_silero').par.Autoloadoninit
  • Speech Threshold (Speechthreshold): Float, default 0.5. Access: op('vad_silero').par.Speechthreshold
  • Min Silence Duration (ms) (Minsilenceduration): Int, default 150. Access: op('vad_silero').par.Minsilenceduration
  • Speech Padding (ms) (Speechpadding): Int, default 50. Access: op('vad_silero').par.Speechpadding
  • Download Model (Downloadmodel): Pulse, default None. Access: op('vad_silero').par.Downloadmodel
  • Bypass (Bypass): Toggle, default Off. Access: op('vad_silero').par.Bypass
  • Show Built-in Parameters (Showbuiltin): Toggle, default Off. Access: op('vad_silero').par.Showbuiltin
  • Version (Version): String, default 1.0.1. Access: op('vad_silero').par.Version
  • Last Updated (Lastupdated): String, default 2025-07-01. Access: op('vad_silero').par.Lastupdated
  • Creator (Creator): String, default dotsimulate. Access: op('vad_silero').par.Creator
  • Website (Website): String, default https://dotsimulate.com. Access: op('vad_silero').par.Website
  • ChatTD Operator (Chattd): OP, default None. Access: op('vad_silero').par.Chattd
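
The Speech Threshold and Min Silence Duration parameters are easiest to understand together. The sketch below shows how such settings are commonly interpreted in a VAD: speech starts as soon as a frame's speech probability reaches the threshold, and ends only after the probability has stayed below it for the minimum silence duration. This is an assumed interpretation for illustration, not the operator's confirmed implementation. The function name `speech_states` and the per-frame probability list are hypothetical, and speech padding is left out.

```python
def speech_states(probs, frame_ms, threshold=0.5, min_silence_ms=150):
    """Return one speaking/not-speaking flag per frame.

    Speech starts when a frame's probability reaches `threshold`.
    It ends only after probabilities stay below `threshold` for at
    least `min_silence_ms` in a row, so short pauses do not flip it.
    """
    speaking = False
    silence_ms = 0
    states = []
    for p in probs:
        if p >= threshold:
            speaking = True
            silence_ms = 0          # any speech frame resets the silence timer
        elif speaking:
            silence_ms += frame_ms  # accumulate consecutive silence
            if silence_ms >= min_silence_ms:
                speaking = False
                silence_ms = 0
        states.append(speaking)
    return states
```

Under this reading, raising the threshold makes detection stricter in noisy rooms, while raising the minimum silence duration keeps the toggle on through brief pauses at the cost of a slower release.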

Research & Licensing

Silero AI

Silero AI is a technology company specializing in speech recognition and voice processing solutions. They focus on creating enterprise-grade, production-ready speech models that are accessible to developers and researchers through open-source releases.

Silero VAD: Pre-trained Enterprise-grade Voice Activity Detector

Silero VAD is a pre-trained Voice Activity Detector designed for enterprise applications. It provides reliable speech detection capabilities with minimal computational requirements, making it ideal for real-time voice processing systems and applications requiring responsive voice activity detection.

Technical Details

  • Lightweight Architecture: Optimized for real-time processing with minimal computational overhead
  • 16kHz Audio Processing: Specifically tuned for 16kHz single-channel audio input
  • PyTorch Implementation: Built on PyTorch framework with TorchHub integration
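
To make the latency implications concrete, the following sketch splits a 16 kHz stream into fixed-size windows and reports how much audio each window covers. The 512-sample window is an assumption based on recent silero-vad releases; whether this operator feeds the model the same way is not stated here. The names `split_into_windows` and `window_duration_ms` are hypothetical.

```python
SAMPLE_RATE = 16000
WINDOW_SAMPLES = 512  # assumption: window size of recent silero-vad releases at 16 kHz


def split_into_windows(samples, window=WINDOW_SAMPLES):
    """Split samples into full fixed-size windows; return (windows, leftover)."""
    full = len(samples) // window
    windows = [samples[i * window:(i + 1) * window] for i in range(full)]
    leftover = samples[full * window:]  # kept for the next incoming buffer
    return windows, leftover


def window_duration_ms(window=WINDOW_SAMPLES, rate=SAMPLE_RATE):
    """Length of one window in milliseconds."""
    return 1000.0 * window / rate
```

With these assumed values each window covers 32 ms of audio, which sets a rough lower bound on how quickly a speech decision can follow the sound itself.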

Research Impact

  • Production-Ready VAD: Reliable voice activity detection for commercial applications
  • Open Source Accessibility: Free alternative to commercial VAD solutions
  • Real-time Performance: Enables low-latency voice processing applications

Citation

@misc{silero2024vad,
  author={Silero Team},
  title={Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD)},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/snakers4/silero-vad}},
  email={hello@silero.ai}
}

Key Research Contributions

  • Enterprise-grade Voice Activity Detection with high accuracy
  • Real-time processing optimized for low-latency applications
  • Pre-trained model requiring no additional training or fine-tuning

License

MIT License - This model is freely available for research and commercial use.