
STT

v1.0.0 (new)

stt turns incoming audio into transcripts, streaming partials, or wake-word events through swappable speech providers. Use it for push-to-talk transcription, live captions, or wake detection in voice-driven TouchDesigner systems.

The operator discovers built-in and custom STT providers, loads the selected Provider, creates provider-specific parameters, and routes audio to the selected backend. Local providers run through the speech worker template; websocket and API providers use their own connection paths. Results are normalized into final transcriptions, partial text, wake events, and callback invocations.

Mode is filtered by provider. Transcription providers expose Push to Talk and/or Streaming, while wake-word providers expose Wake Word when supported.
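Because the Mode menu is filtered per provider, a script that switches providers may want to check which modes are currently offered before setting one. The following is a minimal sketch, assuming it is called from inside TouchDesigner, where `Par.menuNames` lists a menu parameter's current option names. The helpers `pick_mode` and `set_mode` and the fallback order are hypothetical and are not part of the operator.

```python
def pick_mode(available, preferred,
              fallback_order=("transcribe", "stream", "wake_detect")):
    """Return `preferred` if it is offered, else the first offered fallback."""
    if preferred in available:
        return preferred
    for name in fallback_order:
        if name in available:
            return name
    raise ValueError("no known mode is available for this provider")


def set_mode(stt, preferred):
    """Set Mode on an stt operator to a value its current provider offers.

    `stt` is expected to be the operator, e.g. op('stt') in TouchDesigner.
    """
    available = list(stt.par.Mode.menuNames)
    chosen = pick_mode(available, preferred)
    stt.par.Mode.val = chosen
    return chosen
```

Inside TouchDesigner this might be called as `set_mode(op('stt'), 'stream')` after changing Provider; the return value shows which mode was actually applied.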

  1. Wire a mono Audio CHOP into input 1.
  2. Choose Provider, then pulse Install Dependencies or Download Model if the selected provider needs local setup.
  3. Pulse Initialize Engine and wait for Engine Status to report readiness.
  4. Choose Mode. Use Push to Talk for buffered utterances, Streaming for partial captions, or Wake Word when available.
  5. Turn Active on while audio should be captured. In Push to Talk, turning Active off flushes the buffered utterance for transcription.
  6. Read the transcript output DAT, partials_out, or wake_events, and use Clear Transcript when starting a new session.
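The push-to-talk part of the steps above can be sketched in Python. The parameter names (`Enginestatus`, `Active`, `Initialize`) come from the parameter list on this page. The helper names and the exact "ready" status string are assumptions, since the Engine Status values are not documented here.

```python
def engine_ready(status_text, ready_values=("ready",)):
    """Return True when Engine Status matches a known 'ready' string.

    The exact status strings are an assumption; adjust ready_values to
    whatever the Engine Status parameter actually reports.
    """
    return status_text.strip().lower() in ready_values


def begin_capture(stt):
    """Turn Active on, but only once the engine reports readiness."""
    if not engine_ready(stt.par.Enginestatus.eval()):
        return False
    stt.par.Active.val = True
    return True


def end_capture(stt):
    """Turn Active off; in Push to Talk this flushes the buffered utterance."""
    stt.par.Active.val = False
```

Inside TouchDesigner, one possible flow is `stt = op('stt')`, then `stt.par.Initialize.pulse()`, then `begin_capture(stt)` on a key press and `end_capture(stt)` on release.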

Inputs and outputs

  • Input 1: Audio CHOP, typically from Audio Device In or voice_activity.
  • Output 1: Transcription text DAT with final transcript text, or live text (finals plus partials) when Transcript Output is set to live.
  • Output 2: Audio passthrough CHOP.
  • Output 3: Partials and wake-event output surface for streaming or wake detection workflows.

Related operators

  • voice_activity: Gates microphone audio before transcription.
  • tts: Completes speech-in / speech-out voice loops.
  • voice_realtime: Pairs STT events with realtime conversation flows.
  • flow_router: Routes transcript, partial, or wake events to downstream actions.

Notes

  • Provider changes can replace provider-specific pages and reset Mode if the previous mode is unsupported.
  • Local providers need dependencies and model downloads before first use. Cloud providers need API keys, supplied through the provider page or ChatTD key handling.
  • Active can auto-initialize the engine if it is not ready, but if initialization fails, no transcription is produced; check Engine Status and the logs.
  • In Push to Talk mode, final transcription is emitted when Active turns off.
  • Callback DAT hooks include transcription completion, partials, wake detection, speech start/end, provider/mode changes, and errors.
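A callbacks DAT might look something like the sketch below. The hook names `onPartial` and `onWakeDetected` appear in the changelog on this page. The name `onTranscriptionComplete`, the single `info` dictionary argument, and its `text` and `keyword` keys are all assumptions made for illustration, so check the operator's own callback template for the real signatures.

```python
# Simple module-level state that downstream scripts could read.
state = {"partial": "", "finals": [], "wake_count": {}}


def onPartial(info):
    """Keep only the most recent partial text (assumed key: 'text')."""
    state["partial"] = info.get("text", "")


def onTranscriptionComplete(info):
    """Hypothetical completion hook: store the final text, clear the partial."""
    text = info.get("text", "").strip()
    if text:
        state["finals"].append(text)
    state["partial"] = ""


def onWakeDetected(info):
    """Count detections per keyword (assumed key: 'keyword')."""
    keyword = info.get("keyword", "unknown")
    state["wake_count"][keyword] = state["wake_count"].get(keyword, 0) + 1
```

Keeping the handlers small and writing into one shared structure makes it easy to route results onward, for example through flow_router.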
Dependencies OK (Installdependencies) op('stt').par.Installdependencies Pulse
Default:
False
Initialize Engine (Initialize) op('stt').par.Initialize Pulse
Default:
False
Shutdown Engine (Shutdown) op('stt').par.Shutdown Pulse
Default:
False
Engine Status (Enginestatus) op('stt').par.Enginestatus Str
Default:
"" (Empty String)
Active (Active) op('stt').par.Active Toggle
Default:
False
Clear Transcript (Cleartranscript) op('stt').par.Cleartranscript Pulse
Default:
False
Copy to Clipboard (Copytranscript) op('stt').par.Copytranscript Pulse
Default:
False
Initialize On Start (Initializeonstart) op('stt').par.Initializeonstart Toggle
Default:
False
Download Model (Downloadmodel) op('stt').par.Downloadmodel Pulse
Default:
False
Worker Log Level (Workerlogging) op('stt').par.Workerlogging Menu
Default:
DEBUG
Options:
OFF, CRITICAL, ERROR, WARNING, INFO, DEBUG
IPC Mode (Ipcmode) op('stt').par.Ipcmode Menu
Default:
tcp
Options:
tcp, stdio
Monitor Worker Logs (Monitorworkerlogs) op('stt').par.Monitorworkerlogs Toggle
Default:
True
Auto Reattach (Autoreattachoninit) op('stt').par.Autoreattachoninit Toggle
Default:
True
Force Attach (Skip PID) (Forceattachoninit) op('stt').par.Forceattachoninit Toggle
Default:
False
Speech Venv (Speechvenv) op('stt').par.Speechvenv Folder
Default:
"" (Empty String)
Provider (Provider) op('stt').par.Provider StrMenu
Default:
moonshine
Menu Options:
  • AssemblyAI (assemblyai)
  • Faster-Whisper (faster_whisper)
  • Moonshine (moonshine)
  • Parakeet (parakeet)
  • EfficientWord-Net (efficientword)
  • Porcupine (porcupine)
  • microWakeWord (micro_wakeword)
  • openWakeWord (openwakeword)
Scan Providers (Scanproviders) op('stt').par.Scanproviders Pulse
Default:
False
Custom Providers Folder (Providersfolder) op('stt').par.Providersfolder Folder
Default:
"" (Empty String)
Mode (Mode) op('stt').par.Mode Menu

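transcribe = buffered utterances (Push to Talk), stream = continuous partials (Streaming), wake_detect = keyword spotting (Wake Word). The menu lists only the modes the selected provider supports.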

Default:
transcribe
Options:
transcribe, stream
Chunk Duration (s) (Chunkduration) op('stt').par.Chunkduration Float

Seconds of audio sent to the worker per chunk

Default:
2.0
Range:
0.1 to 10
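To get a feel for chunk sizes, the duration can be converted to a sample count. This is a rough sketch that assumes 16 kHz mono audio, a common rate for speech models. The actual rate depends on the provider and the incoming CHOP, and the helper name is hypothetical.

```python
def samples_per_chunk(chunk_duration_s, sample_rate_hz=16000):
    """Number of audio samples in one chunk of the given duration.

    The 0.1 to 10 second bounds mirror the Chunk Duration parameter range;
    the 16 kHz default sample rate is an assumption.
    """
    if not 0.1 <= chunk_duration_s <= 10:
        raise ValueError("chunk duration must be between 0.1 and 10 seconds")
    return round(chunk_duration_s * sample_rate_hz)


default_chunk = samples_per_chunk(2.0)  # 32000 samples at 16 kHz
```

Shorter chunks reach the worker sooner but carry less context per request; longer chunks do the opposite.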
Smart Chunking (Smartchunking) op('stt').par.Smartchunking Toggle
Default:
True
Max Chunk Duration (s) (Maxchunkduration) op('stt').par.Maxchunkduration Float
Default:
8.0
Range:
1 to 30
Pause Sensitivity (Pausesensitivity) op('stt').par.Pausesensitivity Float
Default:
0.5
Range:
0 to 1
Transcript Output (Transcriptmode) op('stt').par.Transcriptmode Menu
Default:
live
Options:
finals, live
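The difference between the two Transcript Output values can be pictured as follows. This is an illustrative sketch only, not the operator's implementation: it assumes the live text is the committed finals followed by the current partial, and the function name and joining rule are hypothetical.

```python
def render_transcript(committed, partial, mode="live"):
    """Build the transcript text for the 'finals' or 'live' output mode.

    committed: list of finalized utterances, oldest first.
    partial:   the current in-progress text, or an empty string.
    """
    if mode not in ("finals", "live"):
        raise ValueError("mode must be 'finals' or 'live'")
    text = " ".join(committed)
    if mode == "live" and partial:
        # Append the in-progress text after everything already committed.
        text = f"{text} {partial}".strip()
    return text
```

With `committed=["hello world"]` and `partial="how are"`, the finals mode yields only the committed text, while the live mode also shows the words still being recognized.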
Model (Modelarch) op('stt').par.Modelarch Menu
Default:
base
Options:
tiny, base, tiny-streaming, small-streaming, medium-streaming
Language (Language) op('stt').par.Language Menu
Default:
en
Options:
en, ar, ja, ko, zh, es, uk, vi
Callbacks DAT (Callbackdat) op('stt').par.Callbackdat DAT
Default:
./emptyCallbacks
Print Callbacks (Printcallbacks) op('stt').par.Printcallbacks Toggle
Default:
True
v1.0.0 (2026-05-02)
  • added docs/compose.json
  • fixed auto-init race condition (Active toggle no longer resets during pending init)
      - deferred Initialize pulse to run() to avoid re-entrancy on Active toggle
      - updated worker subprocess entry points across 7 providers
      - bumped version to 1.0.0
      - updated category to Pipelines
  • added Transcriptmode parameter (finals/live)
      - added provider menu sorting (transcription first, wake-word last)
      - added Active=off on shutdown for constant-mode pars
      - added committed text tracking for live transcript mode
  • set release_level to prod
  • add EfficientWord-Net wake-word provider and worker
      - add micro_wakeword provider and worker
  • unified stt operator on util-speech-template with provider registry
      - STTEXT with transcribe/stream/wake_detect modes and rolling transcript append
      - 6 providers: faster_whisper, moonshine, parakeet, openwakeword, porcupine, assemblyai
      - 5 subprocess workers targeting speech_sidecar venv
      - assemblyai websocket provider with v3 universal streaming, sub-chunk pacing, speaker labels, keyterms prompt
      - porcupine wake-word provider with 16 built-in keywords and custom .ppn support
      - wake-word output surface: wake_events table, WakeDetected deps, per-keyword debounce
      - per-provider signals surface on signals Script CHOP via SIGNAL_CHANNELS declaration
      - Callbacks page and STTCallbacks class with onPartial, onWakeDetected, onSpeechStart/End, onProviderChange hooks
      - real is_model_available cache detection for moonshine, parakeet, openwakeword
      - websocket branch scaffold ready for additional cloud providers
      - manifest extends util-speech-template and util-chained-callbacks, category speech
  • Initial stt structure
  • Initial commit