VAD Silero

v1.0.1

Overview

The VAD Silero LOP is a lightweight, real-time Voice Activity Detection (VAD) operator. It uses the silero-vad model from Silero AI to detect speech in a CHOP audio stream. The operator is designed for low-latency use, which makes it well suited to triggering events when someone starts or stops speaking.

The operator processes audio asynchronously and provides a simple “Is Speaking” toggle that reflects the current speech state.

Input/Output

Inputs

Input 1 (Audio CHOP): Connect a 16kHz single-channel (mono) audio CHOP here. The operator is specifically tuned for this sample rate.
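
A quick pre-flight check can catch a mismatched input before it reaches the operator. The sketch below assumes the upstream CHOP exposes `rate` and `numChans` attributes, as TouchDesigner CHOPs do. The helper name `check_vad_input`, the operator name `audiodevin1`, and the stand-in object are hypothetical and exist only so the logic can run outside TouchDesigner.

```python
from types import SimpleNamespace

EXPECTED_RATE = 16000   # Hz, per the input requirement above
EXPECTED_CHANNELS = 1   # mono


def check_vad_input(chop):
    """Return a list of human-readable problems; empty if the CHOP looks fine."""
    problems = []
    if chop.rate != EXPECTED_RATE:
        problems.append(f"sample rate is {chop.rate:g} Hz, expected {EXPECTED_RATE} Hz")
    if chop.numChans != EXPECTED_CHANNELS:
        problems.append(f"{chop.numChans} channels found, expected {EXPECTED_CHANNELS}")
    return problems


# Stand-in for a CHOP. Inside TouchDesigner you would pass a real operator,
# for example op('audiodevin1') (hypothetical name).
fake_chop = SimpleNamespace(rate=44100.0, numChans=2)
for problem in check_vad_input(fake_chop):
    print(problem)
```

An empty list means the stream matches the documented input format. Any reported problem should be fixed upstream of the operator.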

Outputs

This operator has no direct outputs, as its state is exposed through its parameters (primarily the Isspeaking toggle).
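
Because the state lives in a parameter, event-style behavior has to be derived by watching that parameter change. The following is a minimal sketch of one way to do this, not part of the operator itself. The class name `SpeechEdgeDetector` is hypothetical, and the polled values are hard-coded so the example runs outside TouchDesigner.

```python
class SpeechEdgeDetector:
    """Turn a polled boolean into 'start' / 'end' events."""

    def __init__(self):
        self._was_speaking = False

    def update(self, is_speaking):
        """Return 'start', 'end', or None for the latest polled value."""
        is_speaking = bool(is_speaking)
        event = None
        if is_speaking and not self._was_speaking:
            event = "start"
        elif self._was_speaking and not is_speaking:
            event = "end"
        self._was_speaking = is_speaking
        return event


# Inside TouchDesigner, each polled value would come from
# op('vad_silero').par.Isspeaking.eval(), for example once per frame.
detector = SpeechEdgeDetector()
polled = [False, True, True, False, False, True]
events = [detector.update(value) for value in polled]
print(events)  # [None, 'start', None, 'end', None, 'start']
```

Only the transitions produce events, so downstream logic runs once per utterance boundary rather than on every frame.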

Parameters

Page: VAD Settings

Is Speaking (Isspeaking) op('vad_silero').par.Isspeaking Toggle

Default: Off

Active Monitor CHOPin1 (Active) op('vad_silero').par.Active Toggle

Default: Off

Model Ready (Modelready) op('vad_silero').par.Modelready Toggle

Default: Off

Load Model (Loadmodel) op('vad_silero').par.Loadmodel Pulse

Default: None

Unload Model (Unloadmodel) op('vad_silero').par.Unloadmodel Pulse

Default: None

Auto Load Model on Start (Autoloadoninit) op('vad_silero').par.Autoloadoninit Toggle

Default: Off

Speech Threshold (Speechthreshold) op('vad_silero').par.Speechthreshold Float

Default: 0.5

Min Silence Duration (ms) (Minsilenceduration) op('vad_silero').par.Minsilenceduration Int

Default: 150

Speech Padding (ms) (Speechpadding) op('vad_silero').par.Speechpadding Int

Default: 50

Download Model (Downloadmodel) op('vad_silero').par.Downloadmodel Pulse

Default: None
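
How the operator applies Speech Threshold and Min Silence Duration internally is not documented here. As an illustration only, the sketch below shows one common way these two settings combine in a VAD gate: speech starts as soon as a frame's probability reaches the threshold, and ends only after the probability has stayed below it for the minimum silence duration. The function name `gate_speech`, the 32 ms frame length, and the sample probabilities are assumptions. Speech padding is left out to keep the example short.

```python
FRAME_MS = 32  # assumed frame length (512 samples at 16 kHz)


def gate_speech(probabilities, threshold=0.5, min_silence_ms=150, frame_ms=FRAME_MS):
    """Map per-frame speech probabilities to a per-frame speaking flag."""
    speaking = False
    silence_ms = 0
    flags = []
    for prob in probabilities:
        if prob >= threshold:
            # Any frame at or above the threshold (re)starts speech.
            speaking = True
            silence_ms = 0
        elif speaking:
            # Count silence, and only stop once it has lasted long enough.
            silence_ms += frame_ms
            if silence_ms >= min_silence_ms:
                speaking = False
                silence_ms = 0
        flags.append(speaking)
    return flags


# Made-up probabilities: a short dip (0.2) is bridged, a long pause ends speech.
probs = [0.1, 0.9, 0.2, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(gate_speech(probs))
# [False, True, True, True, True, True, True, True, False, False]
```

In this model, raising the threshold makes the gate harder to open, and raising the minimum silence duration keeps it open across longer pauses at the cost of a slower release.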

Page: About

Bypass (Bypass) op('vad_silero').par.Bypass Toggle

Default: Off

Show Built-in Parameters (Showbuiltin) op('vad_silero').par.Showbuiltin Toggle

Default: Off

Version (Version) op('vad_silero').par.Version String

Default: 1.0.1

Last Updated (Lastupdated) op('vad_silero').par.Lastupdated String

Default: 2025-07-01

Creator (Creator) op('vad_silero').par.Creator String

Default: dotsimulate

Website (Website) op('vad_silero').par.Website String

Default: https://dotsimulate.com

ChatTD Operator (Chattd) op('vad_silero').par.Chattd OP

Default: None

Research & Licensing

Silero AI

Silero AI is a technology company specializing in speech recognition and voice processing solutions. They focus on creating enterprise-grade, production-ready speech models that are accessible to developers and researchers through open-source releases.

Silero VAD: Pre-trained Enterprise-grade Voice Activity Detector

Silero VAD is a pre-trained Voice Activity Detector designed for enterprise applications. It provides reliable speech detection with minimal computational requirements, which suits real-time voice processing systems that need to respond quickly to the start and end of speech.

Technical Details

Lightweight Architecture: Optimized for real-time processing with minimal computational overhead
16kHz Audio Processing: Specifically tuned for 16kHz single-channel audio input
PyTorch Implementation: Built on the PyTorch framework with TorchHub integration
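
The fixed 16kHz rate makes it straightforward to relate durations in milliseconds to sample counts. The sketch below works through that arithmetic and splits a buffer into fixed-size frames. The 512-sample frame size is an assumption based on how recent Silero VAD releases are commonly described, not something this page confirms, and the helper names are hypothetical.

```python
SAMPLE_RATE = 16000  # Hz


def ms_to_samples(duration_ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * duration_ms // 1000


def split_into_frames(samples, frame_size=512):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        yield samples[start:start + frame_size]


print(ms_to_samples(150))  # 2400 samples
print(ms_to_samples(50))   # 800 samples

one_second = [0.0] * SAMPLE_RATE
frames = list(split_into_frames(one_second))
print(len(frames))  # 31 full frames, 128 samples left over
```

At this rate a 512-sample frame spans 32 ms, so a model working on such frames can update its decision roughly 31 times per second.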

Research Impact

Production-Ready VAD: Reliable voice activity detection for commercial applications
Open Source Accessibility: Free alternative to commercial VAD solutions
Real-time Performance: Enables low-latency voice processing applications

Citation

@misc{silero2024vad,
  author={Silero Team},
  title={Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD)},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/snakers4/silero-vad}},
  email={hello@silero.ai}
}

Key Research Contributions

Enterprise-grade Voice Activity Detection with high accuracy
Real-time processing optimized for low-latency applications
Pre-trained model requiring no additional training or fine-tuning

License

MIT License - This model is freely available for research and commercial use.