STT Parakeet

v1.0.0 (New)

The STT Parakeet LOP provides local speech-to-text transcription using NVIDIA’s Parakeet V3 models via a persistent worker-process architecture. It supports real-time streaming, push-to-talk, and file transcription with no cloud API required.

  • Fully local - No API keys or internet connection needed after model download
  • Multilingual - Parakeet V3 supports 25 European languages with automatic detection
  • CPU-optimized - Runs at approximately 5x real-time on a modern CPU; GPU provides minimal benefit
  • Three operating modes - Stream (live), Push to Talk, and File Processing
  • Worker subprocess - Runs inference in a separate process to avoid blocking TouchDesigner
  • Reactive state channels - Built-in Script CHOP provides CHOP channels for transcription events, status, and result metadata
  • Dependencies - onnx-asr, onnxruntime, huggingface-hub (install via the ‘Install Dependencies’ button on the ParakeetSTT page)
  • Model download - Approximately 500MB model downloaded from HuggingFace on first use via the ‘Download Model’ button
  • Input 1 - Audio input CHOP (16kHz float32 mono). A Script CHOP inside the operator reads from this input and forwards audio chunks to the transcription engine.
  • Output 1 - Transcription text DAT containing the running transcript
  • Output 2 - Segments table DAT with timestamped segments (Start, End, Text, Confidence, IsFinal, Speaker, Language). Enable ‘Output Segments (out1)’ to activate this output.
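The segments table described above can be turned into Python dictionaries for downstream logic. This is a minimal sketch, assuming the first row of the table is the header (Start, End, Text, Confidence, IsFinal, Speaker, Language) and that Start and End hold numeric seconds; neither detail is confirmed here, so values that do not parse are left as raw strings. The helper name `segments_to_dicts` is hypothetical.

```python
def segments_to_dicts(rows):
    """Convert table rows (header row first) into one dict per segment.

    `rows` is a list of rows, each a list of strings.
    """
    if not rows:
        return []
    header = [str(cell) for cell in rows[0]]
    segments = []
    for row in rows[1:]:
        segment = dict(zip(header, (str(cell) for cell in row)))
        # Assumption: Start/End are numeric seconds; keep the raw string otherwise.
        for key in ("Start", "End"):
            try:
                segment[key] = float(segment[key])
            except (KeyError, ValueError):
                pass
        segments.append(segment)
    return segments
```

Inside TouchDesigner, the rows could be gathered from a Null DAT wired to the second output (the name `null_segments` is a placeholder): `segments_to_dicts([[c.val for c in r] for r in op('null_segments').rows()])`.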
  1. On the ParakeetSTT page, select a model from the ‘Model’ menu. ‘Parakeet V3 Multilingual (600M) - Recommended’ is the default and best choice for most use cases.
  2. Set ‘Device’ to ‘CPU (Recommended)’ unless you have a specific need for GPU acceleration.
  3. Pulse ‘Install Dependencies’ if this is your first time using the operator. A TouchDesigner restart may be required after installation.
  4. Pulse ‘Download Model’ to download the selected model (approximately 500MB).
  5. Once downloaded, pulse ‘Initialize Engine’ to start the worker process.
  6. Wait for the ‘Engine Status’ field to show “Ready”.
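The setup steps above can also be scripted. The sketch below assumes that dependencies are installed and the model is already downloaded, that the menu values are the names shown in parentheses in the parameter reference (`nemo-parakeet-tdt-0.6b-v3`, `cpu`), and that the status string is exactly "Ready". The function names are hypothetical; the operator is passed in as an argument so the code does not depend on a particular path.

```python
def start_engine(stt, model="nemo-parakeet-tdt-0.6b-v3", device="cpu"):
    """Select model and device, then pulse 'Initialize Engine'."""
    stt.par.Modelsize.val = model   # assumed menu value, see lead-in
    stt.par.Device.val = device
    stt.par.Initialize.pulse()


def engine_ready(stt):
    """True once 'Engine Status' reads "Ready" (exact string is an assumption)."""
    return str(stt.par.Enginestatus.eval()).strip().lower() == "ready"
```

In a project this might be called as `start_engine(op('stt_parakeet'))`, with `engine_ready(op('stt_parakeet'))` checked on a later frame rather than in a blocking loop, since the worker loads the model asynchronously.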
  1. Set ‘Operating Mode’ to ‘Stream (Live)’.
  2. Ensure the engine is initialized and shows “Ready” in the status field.
  3. Connect an audio source CHOP (16kHz mono) to the operator’s input.
  4. Toggle ‘Transcription Active’ to On.
  5. Audio will be transcribed in real-time as it arrives.
  6. Adjust ‘Chunk Duration (sec)’ to control how frequently audio chunks are sent for transcription. Lower values give more responsive results; higher values give more accurate transcription.
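As a rough illustration of the Stream mode steps, the sketch below switches the mode, clamps the chunk duration to the 0.8 to 8 second range listed in the parameter reference, and activates transcription. The sample-count helper assumes a chunk is simply duration times sample rate at 16 kHz, which is an assumption about the internals rather than documented behavior. All function names are hypothetical.

```python
MIN_CHUNK, MAX_CHUNK = 0.8, 8.0  # range listed for 'Chunk Duration (sec)'
SAMPLE_RATE = 16000              # expected input rate in Hz


def clamp_chunk(seconds):
    """Keep a requested chunk duration inside the parameter's range."""
    return max(MIN_CHUNK, min(MAX_CHUNK, float(seconds)))


def samples_per_chunk(seconds, rate=SAMPLE_RATE):
    """Approximate number of audio samples in one chunk."""
    return int(round(clamp_chunk(seconds) * rate))


def start_stream(stt, chunk_seconds=2.0):
    """Switch to Stream mode and start transcribing."""
    stt.par.Mode.val = "Stream"
    stt.par.Chunkduration.val = clamp_chunk(chunk_seconds)
    stt.par.Active.val = True
```

With the default of 2 seconds, each chunk would cover about 32,000 samples of 16 kHz audio.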
  1. Set ‘Operating Mode’ to ‘Push to Talk’.
  2. Ensure the engine is initialized.
  3. Toggle ‘Transcription Active’ to On to start recording. Audio accumulates in an internal buffer.
  4. Speak your message.
  5. Toggle ‘Transcription Active’ to Off. The buffered audio is sent to the worker for transcription as a single chunk.
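The Push to Talk steps map naturally onto a press/release pair. This is a minimal sketch, assuming the menu value for this mode is `Pushtotalk` as listed in the parameter reference; the helper name `push_to_talk` is hypothetical.

```python
def push_to_talk(stt, pressed):
    """Call with True on press (start buffering) and False on release (transcribe)."""
    if stt.par.Mode.eval() != "Pushtotalk":
        stt.par.Mode.val = "Pushtotalk"
    # On starts recording into the internal buffer; Off sends it to the worker.
    stt.par.Active.val = bool(pressed)
```

One way to drive it is from a CHOP Execute DAT watching a key or button channel, calling `push_to_talk(op('stt_parakeet'), True)` in `onOffToOn` and `push_to_talk(op('stt_parakeet'), False)` in `onOnToOff`.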
  1. Set ‘Operating Mode’ to ‘File Processing’.
  2. Set ‘Transcription File’ to an audio or video file (wav, mp3, mp4, mkv, etc.).
  3. Toggle ‘Transcription Active’ to On to start processing.
  4. The transcript and segments will populate as the file is processed. The engine status shows the file name being transcribed.
  5. ‘Transcription Active’ automatically turns Off when file processing completes.
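The File Processing steps can be sketched the same way. The code assumes the menu value `File` from the parameter reference and relies on the documented behavior that ‘Transcription Active’ turns Off when processing completes; the function names are hypothetical.

```python
def transcribe_file(stt, path):
    """Point the operator at a media file and start processing it."""
    stt.par.Mode.val = "File"
    stt.par.Transcriptionfile.val = str(path)
    stt.par.Active.val = True


def file_done(stt):
    """True once 'Transcription Active' has switched back to Off."""
    return not bool(stt.par.Active.eval())
```

`file_done` is meant to be checked on later frames (for example from an Execute DAT or a parameter callback), not in a blocking loop.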

The operator includes a Script CHOP that exposes internal state as CHOP channels for use in TouchDesigner networks. This enables driving animations, triggering events, or monitoring status without polling.

Pulse event channels - transcription_complete, empty_transcription, and sentence_end each fire as a single-frame pulse: transcription_complete when a transcription result arrives, empty_transcription when a chunk produces no text, and sentence_end when sentence-ending punctuation is detected.

Status channels - worker_active, model_ready, transcription_active, download_in_progress, mode_stream, mode_pushtotalk, mode_file, active, ready reflect the current operating state.

Result data channels (optional) - last_has_segments, last_text_length, last_timestamp, last_mode_stream, last_mode_pushtotalk, last_mode_file provide metadata about the most recent transcription result.
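To react to the pulse channels described above, a CHOP Execute DAT can watch them. The sketch below uses the standard `onOffToOn` callback signature and records which event fired; how the state channels are routed to the CHOP Execute DAT (for example through a Select CHOP) is an assumption, and the `events` list is only for illustration.

```python
# Pulse channels named in the documentation above.
PULSE_CHANNELS = {"transcription_complete", "empty_transcription", "sentence_end"}

events = []  # illustrative log of fired events, newest last


def onOffToOn(channel, sampleIndex, val, prev):
    """CHOP Execute DAT callback: runs when a watched channel goes from off to on."""
    if channel.name in PULSE_CHANNELS:
        events.append(channel.name)
    return
```

Because the events arrive as single-frame pulses, reacting in a callback like this avoids having to sample the channels every frame.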

  • Use CPU mode for Parakeet V3. The model is optimized for CPU inference, and a GPU provides negligible improvement.
  • Enable ‘Initialize On Start’ if you want the engine to start automatically when your project loads.
  • Use ‘Clear History’ before starting a new transcription session to reset the running transcript and segment table.
  • Set ‘Worker Logging Level’ to ‘Info’ or ‘Debug’ when troubleshooting transcription issues. Keep it at ‘Off’ for production to minimize overhead.
  • In Stream mode, a ‘Chunk Duration’ of around 2 seconds balances responsiveness and accuracy. Very short durations (under 1 second) may produce fragmented results.
  • Engine status stays at “Shutdown” - Pulse ‘Initialize Engine’. If the model is not yet downloaded, you will be prompted to download it first.
  • “Worker not ready” when activating - The engine is still loading the model. Wait for ‘Engine Status’ to show “Ready” before toggling ‘Transcription Active’.
  • No transcription output in Stream mode - Verify audio is connected to the operator’s input, the audio is 16kHz float32 mono, and ‘Transcription Active’ is On.
  • Duplicate or repeated text - The operator has built-in segment deduplication. If you still see duplicates, try increasing ‘Chunk Duration’ to send larger audio blocks.
  • Dependency installation fails - Ensure the ChatTD Python environment is configured. Check ChatTD logs for detailed installation errors. A TouchDesigner restart may be required after installing dependencies.
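When Stream mode produces no output, a quick check of the input CHOP against the expected format (16kHz mono) can narrow things down. This is a sketch only: it inspects the CHOP's `numChans`, `rate`, and `numSamples` members and does not verify the float32 sample type. The helper name is hypothetical.

```python
def audio_input_problems(chop, expected_rate=16000):
    """Return a list of human-readable problems with an audio CHOP (empty if none)."""
    problems = []
    if chop.numChans != 1:
        problems.append(f"expected 1 channel (mono), found {chop.numChans}")
    if abs(chop.rate - expected_rate) > 1e-6:
        problems.append(f"expected {expected_rate} Hz, found {chop.rate:g} Hz")
    if chop.numSamples == 0:
        problems.append("CHOP contains no samples")
    return problems
```

For example, `audio_input_problems(op('audiodevin1'))` (a placeholder operator name) returning an empty list suggests the format is not the cause.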
Operating Mode (Mode) op('stt_parakeet').par.Mode Menu
Default:
Stream
Options:
Stream, Pushtotalk, File
Model (Modelsize) op('stt_parakeet').par.Modelsize StrMenu

Select Parakeet model. V3 supports 25 European languages with auto-detection.

Default:
"" (Empty String)
Menu Options:
  • Parakeet V3 Multilingual (600M) - Recommended (nemo-parakeet-tdt-0.6b-v3)
  • Parakeet V2 English-only (600M) (nemo-parakeet-tdt-0.6b-v2)
Device (Device) op('stt_parakeet').par.Device Menu

Parakeet V3 is CPU-optimized (~5x real-time on i5). GPU provides minimal benefit.

Default:
cpu
Options:
cpu, cuda
Install Dependencies (Installdependencies) op('stt_parakeet').par.Installdependencies Pulse
Default:
False
Initialize Engine (Initialize) op('stt_parakeet').par.Initialize Pulse
Default:
False
Shutdown Engine (Shutdown) op('stt_parakeet').par.Shutdown Pulse
Default:
False
Download Model (Downloadmodel) op('stt_parakeet').par.Downloadmodel Pulse
Default:
False
Initialize On Start (Initializeonstart) op('stt_parakeet').par.Initializeonstart Toggle
Default:
False
Engine Status (Enginestatus) op('stt_parakeet').par.Enginestatus Str
Default:
"" (Empty String)
Download Progress (Downloadprogress) op('stt_parakeet').par.Downloadprogress Float
Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Worker Logging Level (Workerlogging) op('stt_parakeet').par.Workerlogging Menu
Default:
OFF
Options:
OFF, CRITICAL, ERROR, WARNING, INFO, DEBUG
Transcription Active (Active) op('stt_parakeet').par.Active Toggle
Default:
False
Chunk Duration (sec) (Chunkduration) op('stt_parakeet').par.Chunkduration Float
Default:
0.0
Range:
0.8 to 8
Slider Range:
0.8 to 8
Clear History (Cleartranscript) op('stt_parakeet').par.Cleartranscript Pulse
Default:
False
Copy Transcript to Clipboard (Copytranscript) op('stt_parakeet').par.Copytranscript Pulse
Default:
False
Output Segments (out1) (Segments) op('stt_parakeet').par.Segments Toggle
Default:
False
Transcription File (Transcriptionfile) op('stt_parakeet').par.Transcriptionfile File

Audio or video file to transcribe (wav, mp3, mp4, mkv, etc.)

Default:
"" (Empty String)
Process File (Processfile) op('stt_parakeet').par.Processfile Pulse
Default:
False
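The parameters listed above can be combined into small helpers. As one hedged example, the sketch below resets the transcript and sets the worker logging level before a new session, validating the level against the options listed for ‘Worker Logging Level’. The helper name `new_session` is hypothetical.

```python
LOG_LEVELS = ("OFF", "CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")


def new_session(stt, log_level="OFF"):
    """Set the worker logging level, then clear the transcript and segment table."""
    level = str(log_level).upper()
    if level not in LOG_LEVELS:
        raise ValueError(f"unknown logging level: {log_level!r}")
    stt.par.Workerlogging.val = level
    stt.par.Cleartranscript.pulse()
```

Calling `new_session(op('stt_parakeet'), 'DEBUG')` would suit a troubleshooting run, while the default keeps logging off.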
v1.0.0 (2026-03-26)
  • Expand segments_out from 3 to 7 columns: add Confidence, IsFinal, Speaker, Language
  • Add header enforcement to segments_out on init
  • Align LastTranscriptionResult to standard schema: text, confidence, is_final, speaker, language, mode
  • Add parakeet_channels_scriptchop.py Script CHOP for dependency channel monitoring
  • Initial commit