STT

v1.0.0 (new)

stt turns incoming audio into transcripts, streaming partials, or wake-word events through swappable speech providers. Use it for push-to-talk transcription, live captions, or wake detection in voice-driven TouchDesigner systems.

What It Does

The operator discovers built-in and custom STT providers, loads the selected Provider, creates its provider-specific parameters, and routes audio to that backend. Local providers run through the speech worker template; websocket and API providers use their own connection paths. Results are normalized into final transcriptions, partial text, and wake events, each of which can also trigger a callback.

Mode is filtered by provider. Transcription providers expose Push to Talk and/or Streaming, while wake-word providers expose Wake Word when supported.
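
The filtering rule above can be pictured as a lookup from provider kind to candidate modes. This is a minimal sketch, not the operator's internal registry: the tokens come from the Provider menu on this page, while the two sets and the `candidate_modes` helper are assumptions made for illustration. A transcription provider may expose only one of the two transcription modes.

```python
# Menu tokens come from the Provider menu on this page; the grouping and the
# helper below are illustrative assumptions, not the operator's own registry.
TRANSCRIPTION = {"assemblyai", "faster_whisper", "moonshine", "parakeet"}
WAKE_WORD = {"efficientword", "porcupine", "micro_wakeword", "openwakeword"}


def candidate_modes(provider: str) -> list[str]:
    """Return the Mode labels a provider of this kind may expose."""
    if provider in TRANSCRIPTION:
        # A given provider may support only one of these two modes.
        return ["Push to Talk", "Streaming"]
    if provider in WAKE_WORD:
        return ["Wake Word"]
    raise ValueError(f"unknown provider token: {provider!r}")
```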

Typical Workflow

Wire a mono Audio CHOP into input 1.
Choose Provider, then pulse Install Dependencies or Download Model if the selected provider needs local setup.
Pulse Initialize Engine and wait for Engine Status to report readiness.
Choose Mode. Use Push to Talk for buffered utterances, Streaming for partial captions, or Wake Word when available.
Turn Active on while audio should be captured. In Push to Talk, turning Active off flushes the buffered utterance for transcription.
Read the transcript output DAT, partials_out, or wake_events, and use Clear Transcript when starting a new session.
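
The steps above could be scripted roughly as follows. This is a sketch under stated assumptions: `start_session`, `stop_session`, and the `is_ready` predicate are hypothetical helpers, and this page does not list the strings Engine Status reports, so the readiness check is left to the caller. The parameter names (Initialize, Enginestatus, Cleartranscript, Active) are the ones listed under Parameters.

```python
def start_session(stt, is_ready):
    """Pulse Initialize if needed, otherwise clear old text and start capture.

    stt      -- the operator, e.g. op('stt') inside TouchDesigner
    is_ready -- caller-supplied predicate over the Engine Status string
                (hypothetical; the status strings are not documented here)
    Returns True if capture was started, False if initialization was requested.
    """
    if not is_ready(stt.par.Enginestatus.eval()):
        stt.par.Initialize.pulse()
        return False  # call again once Engine Status reports readiness
    stt.par.Cleartranscript.pulse()
    stt.par.Active = True
    return True


def stop_session(stt):
    """Turn Active off; in Push to Talk this flushes the buffered utterance."""
    stt.par.Active = False
```

Inside TouchDesigner this might be called as `start_session(op('stt'), lambda s: 'ready' in s.lower())`, where the `'ready'` substring is a guess that should be checked against the actual Engine Status text.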

Inputs And Outputs

Input 1: Audio CHOP, typically from Audio Device In or voice_activity.
Output 1: Transcription text DAT with final transcript text, or live text when Transcript Output includes partials.
Output 2: Audio passthrough CHOP.
Output 3: Partials and wake-event output surface for streaming or wake detection workflows.

Works Well With

voice_activity: Gates microphone audio before transcription.
tts: Completes speech-in / speech-out voice loops.
voice_realtime: Pairs STT events with realtime conversation flows.
flow_router: Routes transcript, partial, or wake events to downstream actions.

Gotchas

Provider changes can replace provider-specific pages and reset Mode if the previous mode is unsupported.
Local providers need their dependencies installed and their model downloaded before first use. Cloud providers need an API key, supplied through the provider page or ChatTD key handling.
Turning Active on can auto-initialize the engine if it is not ready, but if initialization fails nothing is transcribed; check Engine Status and the logs.
In Push to Talk mode, final transcription is emitted when Active turns off.
Callback DAT hooks include transcription completion, partials, wake detection, speech start/end, provider/mode changes, and errors.
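
A Callbacks DAT built on these hooks might look like the sketch below. The hook names `onPartial` and `onWakeDetected` appear in the changelog; the single `info` dict argument and its keys (`text`, `keyword`) are assumptions made for illustration, so check the default callbacks DAT for the real signatures before relying on them.

```python
# Hook names are taken from the changelog. The single `info` dict argument and
# its keys ('text', 'keyword') are assumptions, not the documented signature.
captions = []   # holds at most one entry: the latest partial text
wake_hits = []  # keywords seen so far, in order


def onPartial(info):
    # Keep only the latest partial so a caption display can overwrite itself.
    captions[:] = [info.get("text", "")]


def onWakeDetected(info):
    # Record which keyword fired; fall back to a placeholder if none is given.
    wake_hits.append(info.get("keyword", "unknown"))
```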

Parameters

STT

Dependencies OK (Installdependencies) op('stt').par.Installdependencies Pulse

Default: False

Initialize Engine (Initialize) op('stt').par.Initialize Pulse

Default: False

Shutdown Engine (Shutdown) op('stt').par.Shutdown Pulse

Default: False

Engine Status (Enginestatus) op('stt').par.Enginestatus Str

Default: "" (Empty String)

Active (Active) op('stt').par.Active Toggle

Default: False

Clear Transcript (Cleartranscript) op('stt').par.Cleartranscript Pulse

Default: False

Copy to Clipboard (Copytranscript) op('stt').par.Copytranscript Pulse

Default: False

Initialize On Start (Initializeonstart) op('stt').par.Initializeonstart Toggle

Default: False

Download Model (Downloadmodel) op('stt').par.Downloadmodel Pulse

Default: False

Monitor Worker Logs (Monitorworkerlogs) op('stt').par.Monitorworkerlogs Toggle

Default: True

Auto Reattach (Autoreattachoninit) op('stt').par.Autoreattachoninit Toggle

Default: True

Force Attach (Skip PID) (Forceattachoninit) op('stt').par.Forceattachoninit Toggle

Default: False

Speech Venv (Speechvenv) op('stt').par.Speechvenv Folder

Default: "" (Empty String)

Provider (Provider) op('stt').par.Provider StrMenu

Default: moonshine

Menu Options:

AssemblyAI (assemblyai)
Faster-Whisper (faster_whisper)
Moonshine (moonshine)
Parakeet (parakeet)
EfficientWord-Net (efficientword)
Porcupine (porcupine)
microWakeWord (micro_wakeword)
openWakeWord (openwakeword)
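
Providers are selected by the token in parentheses, not the display label. The sketch below guards a scripted provider change against tokens the menu does not currently list (for example a custom provider that has not been scanned yet). `select_provider` is a hypothetical helper; `menuNames` is the standard TouchDesigner attribute for a menu parameter's tokens.

```python
def select_provider(stt, token):
    """Set Provider by menu token, refusing tokens the menu does not list.

    stt -- the operator, e.g. op('stt') inside TouchDesigner
    """
    names = list(stt.par.Provider.menuNames)
    if token not in names:
        raise ValueError(f"{token!r} not in Provider menu: {names}")
    stt.par.Provider = token
    return token
```

Because a provider change can reset Mode, re-check Mode after calling this rather than assuming the previous setting survived.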

Scan Providers (Scanproviders) op('stt').par.Scanproviders Pulse

Default: False

Custom Providers Folder (Providersfolder) op('stt').par.Providersfolder Folder

Default: "" (Empty String)

Chunk Duration (s) (Chunkduration) op('stt').par.Chunkduration Float

Seconds of audio sent to the worker per chunk

Default: 2.0
Range: 0.1 to 10
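
To get a feel for what a chunk duration means in samples, the arithmetic is simply duration times sample rate. The sketch below assumes a 16 kHz mono stream, which is common for speech models but is not stated on this page; `samples_per_chunk` is a hypothetical helper, and clamping to the documented 0.1 to 10 range is an illustrative choice rather than confirmed operator behavior.

```python
def samples_per_chunk(chunk_seconds: float, sample_rate: int = 16000) -> int:
    """Samples in one chunk, clamping the duration to the documented range.

    sample_rate defaults to 16 kHz, an assumption not taken from this page.
    """
    clamped = min(max(chunk_seconds, 0.1), 10.0)
    return round(clamped * sample_rate)
```

With these assumptions, the default of 2.0 seconds corresponds to 32000 samples per chunk.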

Smart Chunking (Smartchunking) op('stt').par.Smartchunking Toggle

Default: True

Max Chunk Duration (s) (Maxchunkduration) op('stt').par.Maxchunkduration Float

Default: 8.0
Range: 1 to 30

Pause Sensitivity (Pausesensitivity) op('stt').par.Pausesensitivity Float

Default: 0.5
Range: 0 to 1

Moonshine

Callbacks

Callbacks DAT (Callbackdat) op('stt').par.Callbackdat DAT

Default: ./emptyCallbacks

Print Callbacks (Printcallbacks) op('stt').par.Printcallbacks Toggle

Default: True

Changelog

v1.0.0 (2026-05-02)

added docs/compose.json
fixed auto-init race condition (Active toggle no longer resets during pending init)
- deferred Initialize pulse to run() to avoid re-entrancy on Active toggle
- updated worker subprocess entry points across 7 providers
- bumped version to 1.0.0
- updated category to Pipelines
added Transcriptmode parameter (finals/live)
- added provider menu sorting (transcription first, wake-word last)
- added Active=off on shutdown for constant-mode pars
- added committed text tracking for live transcript mode
set release_level to prod
add EfficientWord-Net wake-word provider and worker
- add micro_wakeword provider and worker
unified stt operator on util-speech-template with provider registry
- STTEXT with transcribe/stream/wake_detect modes and rolling transcript append
- 6 providers: faster_whisper, moonshine, parakeet, openwakeword, porcupine, assemblyai
- 5 subprocess workers targeting speech_sidecar venv
- assemblyai websocket provider with v3 universal streaming, sub-chunk pacing, speaker labels, keyterms prompt
- porcupine wake-word provider with 16 built-in keywords and custom .ppn support
- wake-word output surface: wake_events table, WakeDetected deps, per-keyword debounce
- per-provider signals surface on signals Script CHOP via SIGNAL_CHANNELS declaration
- Callbacks page and STTCallbacks class with onPartial, onWakeDetected, onSpeechStart/End, onProviderChange hooks
- real is_model_available cache detection for moonshine, parakeet, openwakeword
- websocket branch scaffold ready for additional cloud providers
- manifest extends util-speech-template and util-chained-callbacks, category speech
Initial stt structure
Initial commit