STT Soniox

v1.0.1 (New)

The STT Soniox LOP provides real-time audio transcription and translation using Soniox’s cloud streaming API (the stt-rt-preview model). It connects via WebSocket for low-latency streaming, supports multiple languages with real-time translation between language pairs, speaker diarization, and per-session cost tracking.

  • Real-time streaming - Low-latency transcription via persistent WebSocket connection to Soniox
  • Translation modes - No translation, one-way, or two-way translation between 7 supported languages (Japanese, English, Spanish, French, German, Chinese, Korean)
  • Speaker diarization - Identify and label different speakers in multi-speaker audio
  • Endpoint detection - Automatic sentence boundary detection for cleaner transcript output
  • Cost tracking - Estimated usage costs tracked per session at Soniox’s rate of $0.12/hour
  • Idle timeout - Automatically disconnects after a configurable idle period to save API costs
  • Auto-connect - Toggling ‘Streaming Active’ will automatically connect if not already connected
  • Reconnect-safe - Reconnecting preserves existing transcript and translation data from the current session

Requirements

  • Soniox API Key - Obtain from soniox.com. Enter on the STTSoniox page or pulse ‘Get API Key’ to open the signup page. Keys are stored in the KeyManager for reuse across sessions.
  • Dependencies - websockets and certifi libraries (install via ‘Install Dependencies’ on the STTSoniox page)

Input

  • CHOP Input - Audio source. Connect a CHOP that outputs audio samples (e.g., an Audio Device In CHOP). The internal Script CHOP reads audio from this input and sends it to the Soniox streaming service.
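
To put the $0.12/hour rate from the cost-tracking feature in perspective, here is a minimal sketch of the arithmetic. It assumes cost scales linearly with streamed audio duration; how the operator itself rounds or accumulates the estimate is not documented here, and `estimate_cost` is a hypothetical helper, not part of the operator.

```python
SONIOX_RATE_PER_HOUR = 0.12  # USD per hour of audio, the rate quoted above


def estimate_cost(audio_seconds, rate_per_hour=SONIOX_RATE_PER_HOUR):
    """Return an estimated cost in USD, assuming simple linear proration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600.0 * rate_per_hour


# A 10-minute session and an 8-hour day of continuous streaming.
print(f"10 minutes: ${estimate_cost(10 * 60):.4f}")
print(f"8 hours:    ${estimate_cost(8 * 3600):.2f}")
```

Because cost grows with connected streaming time, the idle timeout matters most for long-running installations.
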

The operator maintains several internal DATs with transcription data:

  • transcription_out - Running original-language transcript, combining finalized segments with in-progress tokens
  • translation_out - Running translated transcript (when a translation mode is enabled)
  • segments_out - Table of completed sentences with the columns Start, End, Text, Confidence, IsFinal, Speaker, Language, and TranslatedText (schema as of v1.0.1)
  • all_tokens - Token-level detail for the current sentence, including timing, confidence, finality, and speaker
  • full_transcript - Persistent cumulative token table for the entire session (original language)
  • full_translation - Persistent cumulative token table for translations
  • session_info - Current session metadata (session ID, status, duration, provider, translation mode)
  • cost_history - Accumulated cost tracking across all sessions
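
The table DATs above are header-first tables, so a small helper can turn them into dictionaries keyed by column name. The sketch below is a plain-Python illustration: `table_to_records` is a hypothetical helper, the path to the internal DAT shown in the comment is an assumption, and the sample row values are invented for demonstration. The column names follow the segments_out schema listed in the v1.0.1 changelog.

```python
def table_to_records(rows):
    """Convert a header-first table (list of rows of strings) into a list of dicts."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Inside TouchDesigner the rows could be pulled from a table DAT like this
# (the internal DAT path is an assumption):
#   dat = op('stt_soniox').op('segments_out')
#   rows = [[cell.val for cell in row] for row in dat.rows()]
sample_rows = [
    ["Start", "End", "Text", "Confidence", "IsFinal", "Speaker", "Language", "TranslatedText"],
    ["0.00", "1.42", "こんにちは", "0.97", "1", "1", "ja", "Hello"],  # invented values
]

records = table_to_records(sample_rows)
print(records[0]["Text"], "->", records[0]["TranslatedText"])
```
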

The operator’s output connector can be configured via the ‘Out1’ menu to expose one of three views: transcription, translation, or segments.

Basic Transcription

  1. Enter your Soniox API key on the STTSoniox page (or store it in the ChatTD KeyManager under the key name soniox).
  2. Pulse ‘Install Dependencies’ if this is your first time using the operator.
  3. Set ‘Sample Rate’ to match your audio source (16 kHz is the most common and recommended).
  4. Set ‘Number of Channels’ to match your audio source (1 for mono, 2 for stereo).
  5. Connect an audio CHOP to the operator’s input.
  6. Toggle ‘Connected’ to On to establish the WebSocket connection.
  7. Once ‘Connection Status’ shows connected, toggle ‘Streaming Active’ to On.
  8. Audio is then transcribed in real time, with results appearing in the transcription output.
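
The steps above can also be scripted. The following is a minimal sketch that uses the parameter names from the parameter reference (Samplerate, Numchannels, Connected, Active); the helper functions are hypothetical, and a stand-in object replaces the operator so the example runs outside TouchDesigner.

```python
from types import SimpleNamespace


def prepare_and_connect(stt, sample_rate=16000, channels=1):
    """Apply the audio settings (steps 3-4), then open the connection (step 6)."""
    if sample_rate not in (16000, 44100, 48000):
        raise ValueError("Sample Rate must be 16000, 44100, or 48000")
    if channels not in (1, 2):
        raise ValueError("Number of Channels must be 1 or 2")
    stt.par.Samplerate = str(sample_rate)  # menu parameter, set by option name
    stt.par.Numchannels = channels
    stt.par.Connected = True


def start_streaming(stt):
    """Step 7: begin sending audio. In a live project, call this once
    'Connection Status' reports that the connection is up."""
    stt.par.Active = True


# Stand-in object so the sketch runs anywhere;
# in a project you would pass op('stt_soniox') instead.
fake_stt = SimpleNamespace(par=SimpleNamespace())
prepare_and_connect(fake_stt)
start_streaming(fake_stt)
print(vars(fake_stt.par))
```

Splitting the connection and streaming steps mirrors the manual workflow, although the auto-connect feature means toggling ‘Streaming Active’ alone should also establish the connection.
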

Translation

  1. Set ‘Translation Mode’ to ‘One-Way Translation’ or ‘Two-Way Translation’.
    • One-Way translates all speech into the target language (‘Language B’).
    • Two-Way translates between two languages — speech in Language A is translated to Language B and vice versa.
  2. Set ‘Language A (Source)’ to the spoken language (e.g., Japanese).
  3. Set ‘Language B (Target)’ to the desired output language (e.g., English).
  4. Connect and start streaming as described above.
  5. The original transcript appears in the transcription output and the translation appears in the translation output.
  6. Set the ‘Out1’ menu to ‘translation’ to route the translated text to the operator’s output connector.
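
As a scripted counterpart to the translation steps, the sketch below validates the mode and language codes against the menu options listed in the parameter reference before writing them. `configure_translation` is a hypothetical helper, and the stand-in object only mimics attribute assignment on `op('stt_soniox').par`. Since these settings are applied at connection time, it should be called before connecting.

```python
from types import SimpleNamespace

TRANSLATION_MODES = ("none", "one_way", "two_way")
LANGUAGES = ("ja", "en", "es", "fr", "de", "zh", "ko")


def configure_translation(stt, mode, language_a="ja", language_b="en"):
    """Set translation parameters and route translated text to the output."""
    if mode not in TRANSLATION_MODES:
        raise ValueError(f"Unknown translation mode: {mode!r}")
    for code in (language_a, language_b):
        if code not in LANGUAGES:
            raise ValueError(f"Unsupported language code: {code!r}")
    stt.par.Translationmode = mode
    stt.par.Languagea = language_a
    stt.par.Languageb = language_b
    if mode != "none":
        stt.par.Out1 = "translation"  # step 6: expose the translation view


# Stand-in for op('stt_soniox') so the sketch runs outside TouchDesigner.
fake_stt = SimpleNamespace(par=SimpleNamespace())
configure_translation(fake_stt, "two_way", language_a="ja", language_b="en")
print(vars(fake_stt.par))
```
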

Speaker Diarization

  1. Enable ‘Enable Speaker Diarization’ on the STTSoniox page before connecting.
  2. Connect and start streaming.
  3. The segments output table will include a speaker column identifying which speaker produced each segment. This is useful for multi-speaker environments like meetings or interviews.
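
Once segments carry a speaker label, a common next step is to collect each speaker's text. The following sketch works on plain dictionaries using the Speaker and Text column names from the segments_out schema; the speaker label format and the sample sentences are invented for illustration, and `group_text_by_speaker` is a hypothetical helper.

```python
from collections import defaultdict


def group_text_by_speaker(segments):
    """Join the text of all segments per speaker, preserving segment order."""
    grouped = defaultdict(list)
    for segment in segments:
        grouped[segment["Speaker"]].append(segment["Text"])
    return {speaker: " ".join(texts) for speaker, texts in grouped.items()}


# Invented sample data shaped like rows of the segments table.
segments = [
    {"Speaker": "1", "Text": "Shall we begin?"},
    {"Speaker": "2", "Text": "Yes, go ahead."},
    {"Speaker": "1", "Text": "First item on the agenda."},
]

for speaker, text in group_text_by_speaker(segments).items():
    print(f"Speaker {speaker}: {text}")
```
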

Best Practices

  • Enable ‘Include Non-Final Results’ for the lowest-latency display: words appear as they are recognized, with corrections applied as the model finalizes each sentence.
  • Enable ‘Enable Endpoint Detection’ for automatic sentence boundary detection, which produces cleaner segment breaks in the output.
  • Set an appropriate ‘Idle Timeout’ to automatically disconnect when no audio is detected, saving API costs during pauses.
  • Monitor the ‘Estimated Total Cost’ parameter to track cumulative API usage across all sessions.
  • Use ‘Clear Transcript’ to reset all transcript data without disconnecting. Use ‘Copy Original to Clipboard’ or ‘Copy Translation to Clipboard’ for quick export.
  • Translation and diarization settings are sent as part of the WebSocket configuration message on connect. Change these settings before toggling ‘Connected’, or disconnect and reconnect for changes to take effect.
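
To illustrate why these settings only take effect on connect, here is a rough sketch of what a connect-time configuration message could look like. The JSON field names and the `pcm_s16le` audio format are assumptions modeled on Soniox's public real-time API documentation, not taken from the operator's source, so verify them against the Soniox docs before relying on them. `build_config_message` is a hypothetical helper.

```python
import json


def build_config_message(api_key, sample_rate=16000, channels=1,
                         translation_mode="none", language_a="ja", language_b="en",
                         diarization=False, endpoint_detection=False):
    """Build the first WebSocket message; all field names are assumptions."""
    config = {
        "api_key": api_key,
        "model": "stt-rt-preview",
        "audio_format": "pcm_s16le",  # assumed raw 16-bit PCM
        "sample_rate": sample_rate,
        "num_channels": channels,
        "enable_speaker_diarization": diarization,
        "enable_endpoint_detection": endpoint_detection,
    }
    if translation_mode == "one_way":
        config["translation"] = {"type": "one_way", "target_language": language_b}
    elif translation_mode == "two_way":
        config["translation"] = {
            "type": "two_way",
            "language_a": language_a,
            "language_b": language_b,
        }
    elif translation_mode != "none":
        raise ValueError(f"Unknown translation mode: {translation_mode!r}")
    return json.dumps(config)


print(build_config_message("YOUR_API_KEY", translation_mode="two_way", diarization=True))
```

Because everything is bundled into this single opening message, changing a setting mid-session has nothing to attach to until the next connection is made.
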

Troubleshooting

  • Connection fails on macOS - The operator automatically applies an SSL certificate fix using certifi. If connections still fail, ensure certifi is installed via ‘Install Dependencies’.
  • No transcription output - Verify ‘Streaming Active’ is On and that your audio source is actively sending samples at the configured sample rate. Check that the CHOP input is connected and producing data.
  • High latency - Use 16 kHz mono audio for optimal streaming performance. Higher sample rates and stereo increase bandwidth usage without improving transcription quality.
  • API key not loading - The operator checks the ChatTD KeyManager first, then a local config file. If neither contains a valid key, enter it directly in the ‘Soniox API Key’ field and it will be saved to both locations automatically.
  • Settings not taking effect - Translation mode, language, diarization, and endpoint detection are configured at connection time. Disconnect and reconnect after changing these parameters.
  • Audio not being sent - The internal Script CHOP only sends audio when ‘Streaming Active’ is On. Verify the CHOP input has samples (numSamples > 0) and the audio data is non-empty. Audio is sent in 100ms chunks at the configured sample rate.
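
When debugging audio delivery, it helps to know how much data one 100ms chunk should contain. The sketch below computes that from the sample rate and channel count; the 2 bytes per sample (16-bit PCM) is an assumption, since the wire format is not stated here, and both helpers are hypothetical.

```python
def samples_per_chunk(sample_rate, channels=1, chunk_ms=100):
    """Number of sample values (across all channels) in one chunk."""
    return sample_rate * chunk_ms // 1000 * channels


def bytes_per_chunk(sample_rate, channels=1, bytes_per_sample=2):
    """Chunk size in bytes, assuming 16-bit PCM unless told otherwise."""
    return samples_per_chunk(sample_rate, channels) * bytes_per_sample


for rate, channels in [(16000, 1), (44100, 2), (48000, 2)]:
    print(rate, channels, samples_per_chunk(rate, channels), bytes_per_chunk(rate, channels))
```

At 16 kHz mono each chunk holds 1,600 samples, while 48 kHz stereo holds 9,600, which is the bandwidth difference behind the high-latency advice above.
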

Parameters

Status (Status) op('stt_soniox').par.Status Str
  Default: "" (Empty String)

STT Provider (Provider) op('stt_soniox').par.Provider Str
  Default: "" (Empty String)

Soniox API Key (Apikey) op('stt_soniox').par.Apikey Str
  Default: "" (Empty String)

Get API Key (Getapikey) op('stt_soniox').par.Getapikey Pulse
  Default: False

Install Dependencies (Installdependencies) op('stt_soniox').par.Installdependencies Pulse
  Default: False

Connected (Connected) op('stt_soniox').par.Connected Toggle
  Default: False

Connection Status (Sttstatus) op('stt_soniox').par.Sttstatus Str
  Default: "" (Empty String)

Streaming Active (Active) op('stt_soniox').par.Active Toggle
  Default: False

Sample Rate (Samplerate) op('stt_soniox').par.Samplerate Menu
  Default: 16000
  Options: 16000, 44100, 48000

Number of Channels (Numchannels) op('stt_soniox').par.Numchannels Int
  Default: 0
  Range: 1 to 2
  Slider Range: 1 to 2

Translation Mode (Translationmode) op('stt_soniox').par.Translationmode Menu
  Default: none
  Options: none, one_way, two_way

Language A (Source) (Languagea) op('stt_soniox').par.Languagea Menu
  Default: ja
  Options: ja, en, es, fr, de, zh, ko

Language B (Target) (Languageb) op('stt_soniox').par.Languageb Menu
  Default: en
  Options: en, ja, es, fr, de, zh, ko

Enable Speaker Diarization (Enablespeakerdiarization) op('stt_soniox').par.Enablespeakerdiarization Toggle
  Default: False

Enable Endpoint Detection (Enableendpointdetection) op('stt_soniox').par.Enableendpointdetection Toggle
  Default: False

Include Non-Final Results (Includenonfinal) op('stt_soniox').par.Includenonfinal Toggle
  Default: False

Idle Timeout (minutes) (Idletimeout) op('stt_soniox').par.Idletimeout Int
  Default: 0
  Range: 1 to 60
  Slider Range: 1 to 60

Estimated Total Cost ($) (Estopcost) op('stt_soniox').par.Estopcost Float
  Default: 0.0
  Range: 0 to 1
  Slider Range: 0 to 1

Clear Transcript (Cleartranscript) op('stt_soniox').par.Cleartranscript Pulse
  Default: False

Copy Original to Clipboard (Copytranscript) op('stt_soniox').par.Copytranscript Pulse
  Default: False

Copy Translation to Clipboard (Copytranslation) op('stt_soniox').par.Copytranslation Pulse
  Default: False

Out1 (Out1) op('stt_soniox').par.Out1 Menu
  Default: transcription
  Options: transcription, translation, segments

Changelog

v1.0.1 (2026-03-26)

  • Rename OriginalText to Text in segments_out schema
  • Add Language column to segments_out (before TranslatedText)
  • Reorder segments_out to standard schema: Start, End, Text, Confidence, IsFinal, Speaker, Language, TranslatedText
  • Add header enforcement to segments_out on init
  • Add LastTranscriptionResult assignment when segment is finalized
  • Add Speaker column to all_tokens and full_transcript tables
  • Initial commit

v1.0.0 (2025-12-03)

# stt_soniox v1.0.0

## Initial Release

Real-time speech-to-text transcription using the Soniox streaming API.

Features

  • WebSocket-based streaming transcription
  • Translation support (one-way and two-way)
  • Multiple language support (Japanese, English, Spanish, French, German, Chinese, Korean)
  • Speaker diarization option
  • Endpoint detection
  • Cost tracking per session
  • Idle timeout auto-disconnect

Output Tables

  • segments_out - Finalized sentences with timestamps and confidence
  • transcription_out - Full running transcript text
  • translation_out - Translation output (when enabled)
  • full_transcript - All tokens for detailed analysis
  • all_tokens - Current sentence tokens being built

Dependencies

  • websockets
  • certifi (for macOS SSL)