STT Parakeet

v1.0.0 New

The STT Parakeet LOP provides local speech-to-text transcription using NVIDIA’s Parakeet V3 models via a persistent worker-process architecture. It supports real-time streaming, push-to-talk, and file transcription with no cloud API required.

Key Features

Fully local - No API keys or internet connection needed after model download
Multilingual - Parakeet V3 supports 25 European languages with automatic detection
CPU-optimized - Runs at approximately 5x real-time on a modern CPU; GPU provides minimal benefit
Three operating modes - Stream (live), Push to Talk, and File Processing
Worker subprocess - Runs inference in a separate process to avoid blocking TouchDesigner
Reactive state channels - Built-in Script CHOP provides CHOP channels for transcription events, status, and result metadata

Requirements

Dependencies - onnx-asr, onnxruntime, huggingface-hub (install via the ‘Install Dependencies’ button on the ParakeetSTT page)
Model download - Approximately 500MB model downloaded from HuggingFace on first use via the ‘Download Model’ button
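
To confirm that the dependencies listed above are visible to TouchDesigner's Python, a quick check can be run from the Textport. This is a minimal sketch: the import names (onnx_asr, onnxruntime, huggingface_hub) are assumed from the package names, and missing_dependencies is a hypothetical helper, not part of the operator.

```python
import importlib.util

# Import names assumed from the package names in the Requirements list:
# onnx-asr -> onnx_asr, huggingface-hub -> huggingface_hub.
REQUIRED_MODULES = ("onnx_asr", "onnxruntime", "huggingface_hub")


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_dependencies()
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, pulsing ‘Install Dependencies’ and restarting TouchDesigner is the documented fix.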

Input/Output

Inputs

Input 1 - Audio input CHOP (16kHz float32 mono). A Script CHOP inside the operator reads from this input and forwards audio chunks to the transcription engine.

Outputs

Output 1 - Transcription text DAT containing the running transcript
Output 2 - Segments table DAT with timestamped segments (Start, End, Text, Confidence, IsFinal, Speaker, Language). Enable ‘Output Segments (out1)’ to activate this output.
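
The segments table can be turned into Python dictionaries for downstream scripting. The sketch below assumes the first row is a header with the seven column names listed above and that Start, End, and Confidence hold numeric strings; segments_from_rows is a hypothetical helper, and the sample row is made-up illustration data.

```python
NUMERIC_COLUMNS = ("Start", "End", "Confidence")


def segments_from_rows(rows):
    """Convert a header row plus data rows into a list of dicts keyed by column name."""
    if not rows:
        return []
    header = [str(name) for name in rows[0]]
    segments = []
    for row in rows[1:]:
        segment = dict(zip(header, (str(value) for value in row)))
        for key in NUMERIC_COLUMNS:
            try:
                segment[key] = float(segment[key])
            except (KeyError, ValueError):
                pass  # leave missing or non-numeric cells untouched
        segments.append(segment)
    return segments


# Made-up sample data in the documented column order.
sample_rows = [
    ["Start", "End", "Text", "Confidence", "IsFinal", "Speaker", "Language"],
    ["0.0", "1.6", "hello there", "0.93", "1", "", "en"],
]
for seg in segments_from_rows(sample_rows):
    print(seg["Start"], seg["End"], seg["Text"])

# Inside TouchDesigner, rows could come from a Table DAT wired to Output 2
# (the name 'segments_dat' is hypothetical):
#   rows = [[cell.val for cell in row] for row in op('segments_dat').rows()]
```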

Usage Examples

Getting Started

On the ParakeetSTT page, select a model from the ‘Model’ menu. ‘Parakeet V3 Multilingual (600M) - Recommended’ is the default and best choice for most use cases.
Set ‘Device’ to ‘CPU (Recommended)’ unless you have a specific need for GPU acceleration.
Pulse ‘Install Dependencies’ if this is your first time using the operator. A TouchDesigner restart may be required after installation.
Pulse ‘Download Model’ to download the selected model (approximately 500MB).
Once downloaded, pulse ‘Initialize Engine’ to start the worker process.
Wait for the ‘Engine Status’ field to show “Ready”.
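
The setup steps above can also be scripted. The following sketch uses only parameter names from the Parameters section (Modelsize, Initialize, Enginestatus); it assumes the menu names are the identifiers shown in parentheses in the Model menu and that the status string is exactly "Ready". The helper functions are hypothetical and take the operator as an argument, so they can be called as select_model(op('stt_parakeet')) from the Textport.

```python
# Menu names assumed from the identifiers shown in the Model menu options.
KNOWN_MODELS = ("nemo-parakeet-tdt-0.6b-v3", "nemo-parakeet-tdt-0.6b-v2")


def select_model(stt, model_name="nemo-parakeet-tdt-0.6b-v3"):
    """Set the Model menu on the given operator."""
    if model_name not in KNOWN_MODELS:
        raise ValueError("Unknown model: " + model_name)
    stt.par.Modelsize.val = model_name


def initialize_engine(stt):
    """Pulse 'Initialize Engine' to start the worker process."""
    stt.par.Initialize.pulse()


def engine_ready(stt):
    """Return True once 'Engine Status' reads Ready."""
    return stt.par.Enginestatus.eval() == "Ready"


# Usage inside TouchDesigner:
#   stt = op('stt_parakeet')
#   select_model(stt)
#   initialize_engine(stt)
#   ... later, for example from a timer or Execute DAT:
#   if engine_ready(stt): print('engine ready')
```

Checking engine_ready periodically, rather than looping until it returns True, keeps the TouchDesigner main thread from blocking while the model loads.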

Live Streaming Transcription

Set ‘Operating Mode’ to ‘Stream (Live)’.
Ensure the engine is initialized and shows “Ready” in the status field.
Connect an audio source CHOP (16kHz mono) to the operator’s input.
Toggle ‘Transcription Active’ to On.
Audio will be transcribed in real-time as it arrives.
Adjust ‘Chunk Duration (sec)’ to control how frequently audio chunks are sent for transcription. Lower values give more responsive results; higher values give more accurate transcription.
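
Activation in Stream mode can be guarded by the engine status, as in this sketch. It assumes ‘Operating Mode’ has already been set to ‘Stream (Live)’ by hand (its parameter name is not listed in this document) and that the status string is exactly "Ready"; start_streaming is a hypothetical helper.

```python
def start_streaming(stt, chunk_seconds=2.0):
    """Turn on transcription only if the engine reports Ready.

    Returns True if transcription was activated, False otherwise.
    """
    if stt.par.Enginestatus.eval() != "Ready":
        return False
    # Keep the value inside the documented 0.8 to 8 second range.
    stt.par.Chunkduration.val = min(8.0, max(0.8, float(chunk_seconds)))
    stt.par.Active.val = True
    return True


# Usage inside TouchDesigner:
#   started = start_streaming(op('stt_parakeet'), chunk_seconds=2.0)
```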

Push-to-Talk

Set ‘Operating Mode’ to ‘Push to Talk’.
Ensure the engine is initialized.
Toggle ‘Transcription Active’ to On to start recording. Audio accumulates in an internal buffer.
Speak your message.
Toggle ‘Transcription Active’ to Off. The buffered audio is sent to the worker for transcription as a single chunk.
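
Push to Talk maps naturally onto a button or key: press to start recording, release to transcribe. A possible sketch using CHOP Execute DAT callbacks follows. It assumes a CHOP Execute DAT is watching a button channel and that the operator is named stt_parakeet; set_talk_button is a hypothetical helper, and op is supplied by TouchDesigner at run time.

```python
def set_talk_button(stt, pressed):
    """Mirror a button state onto 'Transcription Active'."""
    stt.par.Active.val = bool(pressed)


# CHOP Execute DAT callbacks. `op` only exists inside TouchDesigner,
# so these run only when TouchDesigner calls them.
def onOffToOn(channel, sampleIndex, val, prev):
    # Button pressed: start buffering audio.
    set_talk_button(op('stt_parakeet'), True)


def onOnToOff(channel, sampleIndex, val, prev):
    # Button released: the buffered audio is sent for transcription.
    set_talk_button(op('stt_parakeet'), False)
```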

File Transcription

Set ‘Operating Mode’ to ‘File Processing’.
Set ‘Transcription File’ to an audio or video file (wav, mp3, mp4, mkv, etc.).
Toggle ‘Transcription Active’ to On to start processing.
The transcript and segments will populate as the file is processed. The engine status shows the file name being transcribed.
‘Transcription Active’ automatically turns Off when file processing completes.
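
File transcription can be started from a script in the same way. This sketch assumes ‘Operating Mode’ is already set to ‘File Processing’ (its parameter name is not listed in this document); transcribe_file and file_finished are hypothetical helpers built on the Transcriptionfile and Active parameters.

```python
def transcribe_file(stt, path):
    """Point the operator at a media file and start processing."""
    stt.par.Transcriptionfile.val = str(path)
    stt.par.Active.val = True


def file_finished(stt):
    """'Transcription Active' turns Off on its own when processing completes."""
    return not stt.par.Active.eval()


# Usage inside TouchDesigner (the file path is a placeholder):
#   stt = op('stt_parakeet')
#   transcribe_file(stt, 'C:/media/interview.wav')
#   ... later: if file_finished(stt): print('done')
```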

Reactive State Channels

The operator includes a Script CHOP that exposes internal state as CHOP channels for use in TouchDesigner networks. This enables driving animations, triggering events, or monitoring status without polling.

Pulse event channels - transcription_complete, empty_transcription, and sentence_end each fire as a single-frame pulse: transcription_complete when a transcription result arrives, empty_transcription when a chunk produces no text, and sentence_end when sentence-ending punctuation is detected.

Status channels - worker_active, model_ready, transcription_active, download_in_progress, mode_stream, mode_pushtotalk, mode_file, active, ready reflect the current operating state.

Result data channels (optional) - last_has_segments, last_text_length, last_timestamp, last_mode_stream, last_mode_pushtotalk, last_mode_file provide metadata about the most recent transcription result.
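
One way to react to the pulse channels is a small dispatcher called from a CHOP Execute DAT. The sketch below assumes the DAT is pointed at the operator's state channels and that channel names match those listed above; make_dispatcher and the handler functions are hypothetical.

```python
def make_dispatcher(handlers):
    """Build a function that runs the handler registered for a channel name.

    `handlers` maps channel names to zero-argument callables.
    The returned function reports whether a handler was found and run.
    """
    def dispatch(channel_name):
        handler = handlers.get(channel_name)
        if handler is None:
            return False
        handler()
        return True
    return dispatch


events = []
dispatch = make_dispatcher({
    "transcription_complete": lambda: events.append("complete"),
    "empty_transcription": lambda: events.append("empty"),
    "sentence_end": lambda: events.append("sentence"),
})

# In a CHOP Execute DAT, each pulse arrives as an off-to-on transition:
#   def onOffToOn(channel, sampleIndex, val, prev):
#       dispatch(channel.name)
```

Status channels such as model_ready hold their value rather than pulsing, so they are usually better read directly or exported to parameters than routed through a pulse dispatcher.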

Best Practices

Use CPU mode for Parakeet V3. It is specifically optimized for CPU inference and GPU provides negligible improvement.
Enable ‘Initialize On Start’ if you want the engine to start automatically when your project loads.
Use ‘Clear History’ before starting a new transcription session to reset the running transcript and segment table.
Set ‘Worker Logging Level’ to ‘Info’ or ‘Debug’ when troubleshooting transcription issues. Keep it at ‘Off’ for production to minimize overhead.
In Stream mode, a ‘Chunk Duration’ of around 2 seconds balances responsiveness and accuracy. Very short durations (under 1 second) may produce fragmented results.
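
To get a feel for what a given ‘Chunk Duration’ means in data terms, the arithmetic for 16kHz float32 mono audio can be sketched as below. This is plain arithmetic on the documented input format; whether the operator sends exactly this many samples per chunk is an assumption.

```python
SAMPLE_RATE = 16000    # Hz, the documented input rate
BYTES_PER_SAMPLE = 4   # float32


def chunk_size(duration_sec):
    """Return (samples, bytes) for one mono chunk of the given duration."""
    samples = int(round(duration_sec * SAMPLE_RATE))
    return samples, samples * BYTES_PER_SAMPLE


for duration in (0.8, 2.0, 8.0):
    samples, size = chunk_size(duration)
    print(duration, "sec ->", samples, "samples,", size, "bytes")
# 2.0 sec -> 32000 samples, 128000 bytes
```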

Troubleshooting

Engine status stays at “Shutdown” - Pulse ‘Initialize Engine’. If the model is not yet downloaded, you will be prompted to download it first.
“Worker not ready” when activating - The engine is still loading the model. Wait for ‘Engine Status’ to show “Ready” before toggling ‘Transcription Active’.
No transcription output in Stream mode - Verify audio is connected to the operator’s input, the audio is 16kHz float32 mono, and ‘Transcription Active’ is On.
Duplicate or repeated text - The operator has built-in segment deduplication. If you still see duplicates, try increasing ‘Chunk Duration’ to send larger audio blocks.
Dependency installation fails - Ensure the ChatTD Python environment is configured. Check ChatTD logs for detailed installation errors. A TouchDesigner restart may be required after installing dependencies.
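
When Stream mode produces no output, the input format is a common culprit. The following sketch checks a source against the documented 16kHz mono requirement; audio_format_problems is a hypothetical helper, and the CHOP name in the usage comment is a placeholder.

```python
def audio_format_problems(sample_rate, num_channels, expected_rate=16000):
    """Return human-readable problems with an audio source (empty list if none)."""
    problems = []
    if int(round(sample_rate)) != expected_rate:
        problems.append(
            "sample rate is %g Hz, expected %d Hz" % (sample_rate, expected_rate)
        )
    if num_channels != 1:
        problems.append("found %d channels, expected mono" % num_channels)
    return problems


# Usage inside TouchDesigner, using the CHOP's rate and numChans members:
#   src = op('audiodevin1')
#   for problem in audio_format_problems(src.rate, src.numChans):
#       print(problem)
```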

Parameters

ParakeetSTT

Model (Modelsize) op('stt_parakeet').par.Modelsize StrMenu

Select Parakeet model. V3 supports 25 European languages with auto-detection.

Default: "" (Empty String)

Menu Options:

Parakeet V3 Multilingual (600M) - Recommended (nemo-parakeet-tdt-0.6b-v3)
Parakeet V2 English-only (600M) (nemo-parakeet-tdt-0.6b-v2)

Install Dependencies (Installdependencies) op('stt_parakeet').par.Installdependencies Pulse

Default: False

Initialize Engine (Initialize) op('stt_parakeet').par.Initialize Pulse

Default: False

Shutdown Engine (Shutdown) op('stt_parakeet').par.Shutdown Pulse

Default: False

Download Model (Downloadmodel) op('stt_parakeet').par.Downloadmodel Pulse

Default: False

Initialize On Start (Initializeonstart) op('stt_parakeet').par.Initializeonstart Toggle

Default: False

Engine Status (Enginestatus) op('stt_parakeet').par.Enginestatus Str

Default: "" (Empty String)

Download Progress (Downloadprogress) op('stt_parakeet').par.Downloadprogress Float

Default: 0.0
Range: 0 to 1
Slider Range: 0 to 1

Transcription Active (Active) op('stt_parakeet').par.Active Toggle

Default: False

Chunk Duration (sec) (Chunkduration) op('stt_parakeet').par.Chunkduration Float

Default: 0.0
Range: 0.8 to 8
Slider Range: 0.8 to 8

Clear History (Cleartranscript) op('stt_parakeet').par.Cleartranscript Pulse

Default: False

Copy Transcript to Clipboard (Copytranscript) op('stt_parakeet').par.Copytranscript Pulse

Default: False

Output Segments (out1) (Segments) op('stt_parakeet').par.Segments Toggle

Default: False

Transcription File (Transcriptionfile) op('stt_parakeet').par.Transcriptionfile File

Audio or video file to transcribe (wav, mp3, mp4, mkv, etc.)

Default: "" (Empty String)

Process File (Processfile) op('stt_parakeet').par.Processfile Pulse

Default: False

Changelog

v1.0.0 2026-03-26

Expand segments_out from 3 to 7 columns: add Confidence, IsFinal, Speaker, Language
Add header enforcement to segments_out on init
Align LastTranscriptionResult to standard schema: text, confidence, is_final, speaker, language, mode
Add parakeet_channels_scriptchop.py Script CHOP for dependency channel monitoring
Initial commit