STT Parakeet
The STT Parakeet LOP provides local speech-to-text transcription using NVIDIA’s Parakeet V3 models via a persistent worker-process architecture. It supports real-time streaming, push-to-talk, and file transcription with no cloud API required.
Key Features
- Fully local - No API keys or internet connection needed after model download
- Multilingual - Parakeet V3 supports 25 European languages with automatic detection
- CPU-optimized - Runs at approximately 5x real-time on a modern CPU; GPU provides minimal benefit
- Three operating modes - Stream (live), Push to Talk, and File Processing
- Worker subprocess - Runs inference in a separate process to avoid blocking TouchDesigner
- Reactive state channels - Built-in Script CHOP provides CHOP channels for transcription events, status, and result metadata
Requirements
- Dependencies - onnx-asr, onnxruntime, huggingface-hub (install via the ‘Install Dependencies’ button on the ParakeetSTT page)
- Model download - Approximately 500MB model downloaded from HuggingFace on first use via the ‘Download Model’ button
Input/Output
Inputs
- Input 1 - Audio input CHOP (16kHz float32 mono). A Script CHOP inside the operator reads from this input and forwards audio chunks to the transcription engine.
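Audio that is not already 16kHz mono needs converting before it reaches the input. The following is a minimal standard-library sketch of that conversion, assuming the source audio is available as a sequence of per-frame tuples (one float per channel). The function name `to_mono_16k` is hypothetical, and linear interpolation is used only for brevity: it applies no anti-aliasing filter, so a proper resampler is preferable for real material.

```python
from array import array

TARGET_RATE = 16_000  # sample rate the operator's input expects


def to_mono_16k(frames, source_rate):
    """Downmix multi-channel frames to mono and resample to 16 kHz.

    frames: sequence of tuples, one float per channel (assumed layout).
    Returns an array('f'), i.e. 32-bit floats.
    """
    # Average the channels of each frame to get a mono signal.
    mono = [sum(frame) / len(frame) for frame in frames]
    if source_rate == TARGET_RATE or len(mono) < 2:
        return array('f', mono)

    target_len = round(len(mono) * TARGET_RATE / source_rate)
    out = array('f')
    for i in range(target_len):
        pos = i * source_rate / TARGET_RATE  # fractional index into the source
        lo = min(int(pos), len(mono) - 1)
        hi = min(lo + 1, len(mono) - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring source samples.
        out.append(mono[lo] * (1.0 - frac) + mono[hi] * frac)
    return out
```

For example, one second of 48kHz stereo passed through `to_mono_16k(frames, 48000)` comes back as 16000 mono samples.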
Outputs
- Output 1 - Transcription text DAT containing the running transcript
- Output 2 - Segments table DAT with timestamped segments (Start, End, Text, Confidence, IsFinal, Speaker, Language). Enable ‘Output Segments (out1)’ to activate this output.
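To work with the segments table in Python, its rows can be turned into typed records. The sketch below assumes the table arrives as a list of string rows with the header first, using the seven columns listed above. The cell formats (empty numeric cells, how IsFinal is encoded) are assumptions, and `Segment` and `parse_segments` are hypothetical names.

```python
from dataclasses import dataclass

# Column names as listed for the segments table output.
COLUMNS = ["Start", "End", "Text", "Confidence", "IsFinal", "Speaker", "Language"]


@dataclass
class Segment:
    start: float
    end: float
    text: str
    confidence: float
    is_final: bool
    speaker: str
    language: str


def _to_float(value):
    # Assumption: numeric cells may be empty; treat those as 0.0.
    return float(value) if str(value).strip() else 0.0


def parse_segments(rows):
    """Convert table rows (header row first) into Segment records."""
    if not rows:
        return []
    header, *body = rows
    col = {name: header.index(name) for name in COLUMNS}
    return [
        Segment(
            start=_to_float(row[col["Start"]]),
            end=_to_float(row[col["End"]]),
            text=row[col["Text"]],
            confidence=_to_float(row[col["Confidence"]]),
            # Assumption: IsFinal is stored as "1"/"0" or "True"/"False".
            is_final=str(row[col["IsFinal"]]).strip().lower() in ("1", "true"),
            speaker=row[col["Speaker"]],
            language=row[col["Language"]],
        )
        for row in body
    ]
```

Inside TouchDesigner, the rows could be gathered from a DAT wired to the second output with something like `[[cell.val for cell in row] for row in dat.rows()]`, where `dat` is that DAT.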
Usage Examples
Getting Started
- On the ParakeetSTT page, select a model from the ‘Model’ menu. ‘Parakeet V3 Multilingual (600M) - Recommended’ is the default and best choice for most use cases.
- Set ‘Device’ to ‘CPU (Recommended)’ unless you have a specific need for GPU acceleration.
- Pulse ‘Install Dependencies’ if this is your first time using the operator. A TouchDesigner restart may be required after installation.
- Pulse ‘Download Model’ to download the selected model (approximately 500MB).
- Once downloaded, pulse ‘Initialize Engine’ to start the worker process.
- Wait for the ‘Engine Status’ field to show “Ready”.
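The last two steps can also be driven from a script. This is a minimal sketch, assuming `stt` is the operator (for example `op('stt_parakeet')`) and that the ‘Engine Status’ field reads exactly "Ready" once the worker is up. The helper names are hypothetical; the parameter names come from the Parameters list.

```python
def engine_ready(stt):
    """True when the 'Engine Status' field reports Ready."""
    return stt.par.Enginestatus.eval() == "Ready"


def request_engine_start(stt):
    """Pulse 'Initialize Engine' unless the engine is already Ready.

    Returns True if a pulse was sent. The call does not block: the
    worker loads the model in the background, so check engine_ready()
    again later instead of waiting in a loop.
    """
    if engine_ready(stt):
        return False
    stt.par.Initialize.pulse()
    return True
```

In TouchDesigner this might be called once as `request_engine_start(op('stt_parakeet'))`, with `engine_ready` polled from a timer or checked before toggling ‘Transcription Active’.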
Live Streaming Transcription
- Set ‘Operating Mode’ to ‘Stream (Live)’.
- Ensure the engine is initialized and shows “Ready” in the status field.
- Connect an audio source CHOP (16kHz mono) to the operator’s input.
- Toggle ‘Transcription Active’ to On.
- Audio will be transcribed in real-time as it arrives.
- Adjust ‘Chunk Duration (sec)’ to control how often audio chunks are sent for transcription. Lower values give more responsive results; higher values give more accurate transcription.
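The trade-off behind ‘Chunk Duration (sec)’ can be made concrete with some rough arithmetic. The sketch below assumes a chunk holds duration × 16000 samples, clamps to the 0.8 to 8 second parameter range, and uses the approximate 5x real-time CPU figure to estimate processing time. The real worker may add overhead, so treat the numbers as estimates only; `chunk_budget` is a hypothetical helper.

```python
SAMPLE_RATE = 16_000             # input sample rate in Hz
MIN_CHUNK, MAX_CHUNK = 0.8, 8.0  # range of the Chunkduration parameter
REALTIME_FACTOR = 5.0            # approximate CPU speed-up, not a guarantee


def chunk_budget(duration_sec):
    """Return (samples per chunk, estimated processing seconds)."""
    # Clamp to the range the parameter accepts.
    duration = min(max(duration_sec, MIN_CHUNK), MAX_CHUNK)
    samples = int(round(duration * SAMPLE_RATE))
    est_processing = duration / REALTIME_FACTOR
    return samples, est_processing
```

With these assumptions, a 2 second chunk is 32000 samples and takes roughly 0.4 seconds to transcribe, so text would appear a little over 2 seconds after the words were spoken.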
Push-to-Talk
- Set ‘Operating Mode’ to ‘Push to Talk’.
- Ensure the engine is initialized.
- Toggle ‘Transcription Active’ to On to start recording. Audio accumulates in an internal buffer.
- Speak your message.
- Toggle ‘Transcription Active’ to Off. The buffered audio is sent to the worker for transcription as a single chunk.
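The buffering behaviour described in these steps can be pictured with a small model. This is an illustrative sketch only, not the operator's internal code: `PushToTalkBuffer` is a hypothetical class that accumulates samples while active and hands back everything as one chunk when switched off.

```python
from array import array


class PushToTalkBuffer:
    """Toy model of push-to-talk: record while active, flush on release."""

    def __init__(self):
        self._samples = array('f')  # 32-bit float samples
        self.recording = False

    def feed(self, samples):
        """Append incoming audio; ignored unless recording is active."""
        if self.recording:
            self._samples.extend(samples)

    def set_active(self, active):
        """Mirror the toggle. Returns the buffered chunk on On -> Off, else None."""
        if active and not self.recording:
            self._samples = array('f')  # start from an empty buffer
            self.recording = True
            return None
        if not active and self.recording:
            self.recording = False
            chunk, self._samples = self._samples, array('f')
            return chunk  # the whole utterance as a single chunk
        return None
```

Because the whole utterance is transcribed at once, the model sees the full sentence context, at the cost of no output until the toggle is released.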
File Transcription
- Set ‘Operating Mode’ to ‘File Processing’.
- Set ‘Transcription File’ to an audio or video file (wav, mp3, mp4, mkv, etc.).
- Toggle ‘Transcription Active’ to On to start processing.
- The transcript and segments will populate as the file is processed. The engine status shows the file name being transcribed.
- ‘Transcription Active’ automatically turns Off when file processing completes.
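File transcription can be scripted in the same way. The sketch below assumes `stt` is the operator, that ‘Operating Mode’ has already been set to ‘File Processing’ by hand (its parameter name is not listed on this page), and that the engine is Ready. The helper names are hypothetical; the parameter names come from the Parameters list.

```python
def start_file_transcription(stt, path):
    """Point the operator at a file and switch 'Transcription Active' on."""
    stt.par.Transcriptionfile.val = path
    stt.par.Active.val = True


def file_transcription_done(stt):
    """True once 'Transcription Active' has switched itself back off."""
    return not stt.par.Active.eval()
```

A call such as `start_file_transcription(op('stt_parakeet'), 'interview.wav')` (the file name is a placeholder) would start the job, and `file_transcription_done` could then be checked periodically to detect completion.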
Reactive State Channels
The operator includes a Script CHOP that exposes internal state as CHOP channels for use in TouchDesigner networks. This makes it possible to drive animations, trigger events, or monitor status without polling.
Pulse event channels - transcription_complete, empty_transcription, and sentence_end fire as single-frame pulses when a transcription result arrives, when a chunk produces no text, and when sentence-ending punctuation is detected, respectively.
Status channels - worker_active, model_ready, transcription_active, download_in_progress, mode_stream, mode_pushtotalk, mode_file, active, and ready reflect the current operating state.
Result data channels (optional) - last_has_segments, last_text_length, last_timestamp, last_mode_stream, last_mode_pushtotalk, and last_mode_file provide metadata about the most recent transcription result.
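One way to react to the pulse event channels is a CHOP Execute DAT. The sketch below assumes such a DAT is watching a CHOP that carries these channels and has its Off to On callback enabled. The handler bodies and the `events` list are placeholders for illustration.

```python
events = []  # stand-in for real reactions; collects fired channel names

# Map each pulse channel to the action it should trigger.
HANDLERS = {
    "transcription_complete": lambda: events.append("transcription_complete"),
    "empty_transcription": lambda: events.append("empty_transcription"),
    "sentence_end": lambda: events.append("sentence_end"),
}


def onOffToOn(channel, sampleIndex, val, prev):
    """CHOP Execute callback: runs when a watched channel goes from 0 to non-zero."""
    handler = HANDLERS.get(channel.name)
    if handler:
        handler()
    return
```

Channels without an entry in `HANDLERS`, such as the status channels, are ignored by this callback. They are usually better read directly as values than treated as events.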
Best Practices
- Use CPU mode for Parakeet V3. It is specifically optimized for CPU inference and GPU provides negligible improvement.
- Enable ‘Initialize On Start’ if you want the engine to start automatically when your project loads.
- Use ‘Clear History’ before starting a new transcription session to reset the running transcript and segment table.
- Set ‘Worker Logging Level’ to ‘Info’ or ‘Debug’ when troubleshooting transcription issues. Keep it at ‘Off’ for production to minimize overhead.
- In Stream mode, a ‘Chunk Duration’ of around 2 seconds balances responsiveness and accuracy. Very short durations (under 1 second) may produce fragmented results.
Troubleshooting
- Engine status stays at “Shutdown” - Pulse ‘Initialize Engine’. If the model is not yet downloaded, you will be prompted to download it first.
- “Worker not ready” when activating - The engine is still loading the model. Wait for ‘Engine Status’ to show “Ready” before toggling ‘Transcription Active’.
- No transcription output in Stream mode - Verify audio is connected to the operator’s input, the audio is 16kHz float32 mono, and ‘Transcription Active’ is On.
- Duplicate or repeated text - The operator has built-in segment deduplication. If you still see duplicates, try increasing ‘Chunk Duration’ to send larger audio blocks.
- Dependency installation fails - Ensure the ChatTD Python environment is configured. Check ChatTD logs for detailed installation errors. A TouchDesigner restart may be required after installing dependencies.
Parameters
ParakeetSTT
- op('stt_parakeet').par.Installdependencies Pulse - Default: False
- op('stt_parakeet').par.Initialize Pulse - Default: False
- op('stt_parakeet').par.Shutdown Pulse - Default: False
- op('stt_parakeet').par.Downloadmodel Pulse - Default: False
- op('stt_parakeet').par.Initializeonstart Toggle - Default: False
- op('stt_parakeet').par.Enginestatus Str - Default: "" (Empty String)
- op('stt_parakeet').par.Downloadprogress Float - Default: 0.0 - Range: 0 to 1 - Slider Range: 0 to 1
- op('stt_parakeet').par.Active Toggle - Default: False
- op('stt_parakeet').par.Chunkduration Float - Default: 0.0 - Range: 0.8 to 8 - Slider Range: 0.8 to 8
- op('stt_parakeet').par.Cleartranscript Pulse - Default: False
- op('stt_parakeet').par.Copytranscript Pulse - Default: False
- op('stt_parakeet').par.Segments Toggle - Default: False
- op('stt_parakeet').par.Transcriptionfile File - Audio or video file to transcribe (wav, mp3, mp4, mkv, etc.) - Default: "" (Empty String)
- op('stt_parakeet').par.Processfile Pulse - Default: False
Changelog
v1.0.0 2026-03-26
- Expand segments_out from 3 to 7 columns: add Confidence, IsFinal, Speaker, Language
- Add header enforcement to segments_out on init
- Align LastTranscriptionResult to standard schema: text, confidence, is_final, speaker, language, mode
- Add parakeet_channels_scriptchop.py Script CHOP for dependency channel monitoring
- Initial commit