STT
v1.0.0
stt turns incoming audio into transcripts, streaming partials, or wake-word events through swappable speech providers. Use it for push-to-talk transcription, live captions, or wake detection in voice-driven TouchDesigner systems.
What It Does
The operator discovers built-in and custom STT providers, loads the selected Provider, creates provider-specific parameters, and routes audio to the selected backend. Local providers run through the speech worker template; websocket and API providers use their own connection paths. Results are normalized into final transcriptions, partial text, wake events, and callback fires.
Mode is filtered by provider. Transcription providers expose Push to Talk and/or Streaming, while wake-word providers expose Wake Word when supported.
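The filtering rule can be illustrated with a small sketch. The capability table below is hypothetical: the provider names and their mode lists are placeholders, not the operator's real registry, and falling back to the first supported mode is an assumption about how a reset might behave.

```python
# Hypothetical capability table. Provider names and mode lists are
# illustrative placeholders, not the operator's actual registry.
PROVIDER_MODES = {
    "example_transcriber": ["Push to Talk", "Streaming"],
    "example_wake": ["Wake Word"],
}


def available_modes(provider):
    """Return the modes a provider exposes (empty list if unknown)."""
    return list(PROVIDER_MODES.get(provider, []))


def resolve_mode(provider, current_mode):
    """Keep the current mode if the provider supports it.

    Otherwise fall back to the provider's first mode (ASSUMED policy),
    or None when the provider exposes no modes at all.
    """
    modes = available_modes(provider)
    if current_mode in modes:
        return current_mode
    return modes[0] if modes else None
```

Switching from a transcription provider in Streaming mode to a wake-word provider would, under this sketch, land on Wake Word because Streaming is no longer offered.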
Typical Workflow
- Wire a mono Audio CHOP into input 1.
- Choose Provider, then pulse Install Dependencies or Download Model if the selected provider needs local setup.
- Pulse Initialize Engine and wait for Engine Status to report readiness.
- Choose Mode. Use Push to Talk for buffered utterances, Streaming for partial captions, or Wake Word when available.
- Turn Active on while audio should be captured. In Push to Talk, turning Active off flushes the buffered utterance for transcription.
- Read the transcript output DAT, partials_out, or wake_events, and use Clear Transcript when starting a new session.
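The workflow above could be scripted roughly as follows. This is a minimal sketch, not the operator's own API: the helper names are invented for illustration, and the exact text that Engine Status shows when ready is an assumption (here "Ready", compared case-insensitively). The parameter names Enginestatus, Initialize, Cleartranscript, and Active come from the Parameters list; `stt` is expected to be the operator, for example `op('stt')` inside TouchDesigner.

```python
def engine_ready(status, ready_text="Ready"):
    """True when the status string equals the ASSUMED ready text."""
    return str(status).strip().lower() == ready_text.lower()


def begin_session(stt, ready_text="Ready"):
    """Start a fresh capture session if the engine is ready.

    Returns False after pulsing Initialize when the engine is not ready,
    so the caller can retry later instead of blocking the timeline.
    """
    if not engine_ready(stt.par.Enginestatus.eval(), ready_text):
        stt.par.Initialize.pulse()
        return False
    stt.par.Cleartranscript.pulse()  # start from an empty transcript
    stt.par.Active = True            # begin capturing audio
    return True


def end_utterance(stt):
    """In Push to Talk, turning Active off flushes the buffered utterance."""
    stt.par.Active = False
```

Returning early rather than waiting in a loop is deliberate: a blocking wait inside a TouchDesigner script stalls the whole process, so retrying on a later frame is usually the safer pattern.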
Inputs And Outputs
- Input 1: Audio CHOP, typically from Audio Device In or voice_activity.
- Output 1: Transcription text DAT with final transcript text, or live text when Transcript Output includes partials.
- Output 2: Audio passthrough CHOP.
- Output 3: Partials and wake-event output surface for streaming or wake detection workflows.
Works Well With
- voice_activity: Gates microphone audio before transcription.
- tts: Completes speech-in / speech-out voice loops.
- voice_realtime: Pairs STT events with realtime conversation flows.
- flow_router: Routes transcript, partial, or wake events to downstream actions.
Gotchas
- Provider changes can replace provider-specific pages and reset Mode if the previous mode is unsupported.
- Local providers need dependencies and model downloads before first use. Cloud providers need API keys through the provider page or ChatTD key handling.
- Active can auto-initialize the engine if it is not ready, but if initialization fails, no transcription is produced; check Engine Status and logs.
- In Push to Talk mode, final transcription is emitted when Active turns off.
- Callback DAT hooks include transcription completion, partials, wake detection, speech start/end, provider/mode changes, and errors.
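A Callback DAT might look something like the sketch below. The hook names onPartial, onWakeDetected, onSpeechStart, onSpeechEnd, and onProviderChange appear in the changelog; everything else is an assumption made for illustration, including the single `info` dict argument, the keys read from it, and the `events` list used as a log.

```python
# Sketch of a Callback DAT body. Hook names come from the changelog;
# the `info` argument and its keys are ASSUMPTIONS, not a documented API.
events = []  # simple in-memory log, useful while debugging


def onPartial(info):
    # 'text' is a hypothetical key holding the partial transcript.
    events.append(("partial", info.get("text", "")))


def onWakeDetected(info):
    # 'keyword' is a hypothetical key naming the detected wake word.
    events.append(("wake", info.get("keyword", "")))


def onSpeechStart(info):
    events.append(("speech_start", None))


def onSpeechEnd(info):
    events.append(("speech_end", None))


def onProviderChange(info):
    # 'provider' is a hypothetical key naming the newly selected provider.
    events.append(("provider", info.get("provider", "")))
```

Reading keys with `.get` keeps the hooks from raising if the real payload uses different field names.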
Parameters
op('stt').par.Installdependencies Pulse
- Default: False
op('stt').par.Initialize Pulse
- Default: False
op('stt').par.Shutdown Pulse
- Default: False
op('stt').par.Enginestatus Str
- Default: "" (Empty String)
op('stt').par.Active Toggle
- Default: False
op('stt').par.Cleartranscript Pulse
- Default: False
op('stt').par.Copytranscript Pulse
- Default: False
op('stt').par.Initializeonstart Toggle
- Default: False
op('stt').par.Downloadmodel Pulse
- Default: False
op('stt').par.Monitorworkerlogs Toggle
- Default: True
op('stt').par.Autoreattachoninit Toggle
- Default: True
op('stt').par.Forceattachoninit Toggle
- Default: False
op('stt').par.Speechvenv Folder
- Default: "" (Empty String)
op('stt').par.Scanproviders Pulse
- Default: False
op('stt').par.Providersfolder Folder
- Default: "" (Empty String)
op('stt').par.Chunkduration Float Seconds of audio sent to the worker per chunk
- Default: 2.0
- Range: 0.1 to 10
op('stt').par.Smartchunking Toggle
- Default: True
op('stt').par.Maxchunkduration Float
- Default: 8.0
- Range: 1 to 30
op('stt').par.Pausesensitivity Float
- Default: 0.5
- Range: 0 to 1
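To make the chunking numbers concrete, the sketch below clamps values to the ranges listed above and converts a chunk duration into a sample count. Two things are assumptions: that the listed ranges behave as hard limits, and the 16000 Hz sample rate, which is only a common choice for speech models and is not stated on this page. The helper names are invented for illustration.

```python
# Ranges copied from the parameter list; treating them as hard limits
# is an ASSUMPTION made for this sketch.
RANGES = {
    "Chunkduration": (0.1, 10.0),
    "Maxchunkduration": (1.0, 30.0),
    "Pausesensitivity": (0.0, 1.0),
}


def clamp_par(name, value):
    """Clamp a value into the listed range for the named parameter."""
    low, high = RANGES[name]
    return max(low, min(high, float(value)))


def samples_per_chunk(chunk_seconds, sample_rate=16000):
    """Samples in one chunk; the 16000 Hz default is an ASSUMED rate."""
    seconds = clamp_par("Chunkduration", chunk_seconds)
    return int(round(seconds * sample_rate))
```

With the default Chunkduration of 2.0 seconds and the assumed 16000 Hz rate, each chunk would carry 32000 samples; at the default Maxchunkduration of 8.0 seconds, the same rate gives 128000 samples.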
Moonshine
Callbacks
op('stt').par.Callbackdat DAT
- Default: ./emptyCallbacks
op('stt').par.Printcallbacks Toggle
- Default: True
Changelog
v1.0.0 (2026-05-02)
- added docs/compose.json
- fixed auto-init race condition (Active toggle no longer resets during pending init)
- deferred Initialize pulse to run() to avoid re-entrancy on Active toggle
- updated worker subprocess entry points across 7 providers
- bumped version to 1.0.0
- updated category to Pipelines
- added Transcriptmode parameter (finals/live)
- added provider menu sorting (transcription first, wake-word last)
- added Active=off on shutdown for constant-mode pars
- added committed text tracking for live transcript mode
- set release_level to prod
- add EfficientWord-Net wake-word provider and worker
- add micro_wakeword provider and worker
- unified stt operator on util-speech-template with provider registry
- STTEXT with transcribe/stream/wake_detect modes and rolling transcript append
- 6 providers: faster_whisper, moonshine, parakeet, openwakeword, porcupine, assemblyai
- 5 subprocess workers targeting speech_sidecar venv
- assemblyai websocket provider with v3 universal streaming, sub-chunk pacing, speaker labels, keyterms prompt
- porcupine wake-word provider with 16 built-in keywords and custom .ppn support
- wake-word output surface: wake_events table, WakeDetected deps, per-keyword debounce
- per-provider signals surface on signals Script CHOP via SIGNAL_CHANNELS declaration
- Callbacks page and STTCallbacks class with onPartial, onWakeDetected, onSpeechStart/End, onProviderChange hooks
- real is_model_available cache detection for moonshine, parakeet, openwakeword
- websocket branch scaffold ready for additional cloud providers
- manifest extends util-speech-template and util-chained-callbacks, category speech
- Initial stt structure
- Initial commit