STT Soniox
v1.0.1 (New)
The STT Soniox LOP provides real-time audio transcription and translation using Soniox’s cloud streaming API (the stt-rt-preview model). It connects via WebSocket for low-latency streaming, supports multiple languages with real-time translation between language pairs, speaker diarization, and per-session cost tracking.
Key Features
- Real-time streaming - Low-latency transcription via persistent WebSocket connection to Soniox
- Translation modes - No translation, one-way, or two-way translation between 7 supported languages (Japanese, English, Spanish, French, German, Chinese, Korean)
- Speaker diarization - Identify and label different speakers in multi-speaker audio
- Endpoint detection - Automatic sentence boundary detection for cleaner transcript output
- Cost tracking - Estimated usage costs tracked per session at Soniox’s rate of $0.12/hour
- Idle timeout - Automatically disconnects after a configurable idle period to save API costs
- Auto-connect - Toggling ‘Streaming Active’ will automatically connect if not already connected
- Reconnect-safe - Reconnecting preserves existing transcript and translation data from the current session
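As a rough illustration of the cost tracking feature, the sketch below assumes the estimate scales linearly with streamed audio duration at the $0.12/hour rate quoted above. The function name is hypothetical and is not part of the operator; how the operator itself computes the figure is not documented here.

```python
SONIOX_RATE_PER_HOUR = 0.12  # USD per hour of audio, as quoted above


def estimate_session_cost(duration_seconds: float,
                          rate_per_hour: float = SONIOX_RATE_PER_HOUR) -> float:
    """Return the estimated cost in USD for a session of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 3600.0 * rate_per_hour


# A 30-minute session: 0.5 h * $0.12/h
print(f"${estimate_session_cost(30 * 60):.2f}")  # prints "$0.06"
```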
Requirements
- Soniox API Key - Obtain from soniox.com. Enter on the STTSoniox page or pulse ‘Get API Key’ to open the signup page. Keys are stored in the KeyManager for reuse across sessions.
- Dependencies - websockets and certifi libraries (install via ‘Install Dependencies’ on the STTSoniox page)
Input/Output
Inputs
- CHOP Input - Audio source. Connect a CHOP that outputs audio samples (e.g., an Audio Device In CHOP). The internal Script CHOP reads audio from this input and sends it to the Soniox streaming service.
Outputs
The operator maintains several internal DATs with transcription data:
- transcription_out - Running original-language transcript, combining finalized segments with in-progress tokens
- translation_out - Running translated transcript (when a translation mode is enabled)
- segments_out - Table of completed sentences with the columns Start, End, Text, Confidence, IsFinal, Speaker, Language, and TranslatedText (schema as of v1.0.1)
- all_tokens - Current sentence token-level detail with timing, confidence, finality, and speaker
- full_transcript - Persistent cumulative token table for the entire session (original language)
- full_translation - Persistent cumulative token table for translations
- session_info - Current session metadata (session ID, status, duration, provider, translation mode)
- cost_history - Accumulated cost tracking across all sessions
The operator’s output connector can be configured via the ‘Out1’ menu to expose one of three views: transcription, translation, or segments.
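For downstream scripting, a header-first table such as segments_out is often easier to handle as a list of dictionaries. The sketch below is a minimal, hypothetical helper that works on plain lists of strings. The column names follow the v1.0.1 changelog, while the sample values and their formats are made up for illustration.

```python
from typing import Dict, List


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn a header-first table (list of string rows) into a list of dicts."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Stand-in data using the v1.0.1 segments_out column order (values are invented).
sample_rows = [
    ["Start", "End", "Text", "Confidence", "IsFinal", "Speaker", "Language", "TranslatedText"],
    ["0.00", "1.80", "こんにちは", "0.97", "1", "1", "ja", "Hello"],
    ["2.10", "4.05", "Nice to meet you", "0.95", "1", "2", "en", ""],
]

for seg in rows_to_records(sample_rows):
    print(seg["Start"], seg["End"], seg["Speaker"], seg["Text"])
```

Inside TouchDesigner, the rows could be built from a Table DAT with something like `[[c.val for c in r] for r in dat.rows()]`. The exact path to the internal segments_out DAT is an assumption you would need to check in your own network.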
Usage Examples
Basic Transcription
- Enter your Soniox API key on the STTSoniox page (or store it in the ChatTD KeyManager under the key name soniox).
- Pulse ‘Install Dependencies’ if this is your first time using the operator.
- Set ‘Sample Rate’ to match your audio source (16kHz is the most common and recommended).
- Set ‘Number of Channels’ to match your audio source (1 for mono, 2 for stereo).
- Connect an audio CHOP to the operator’s input.
- Toggle ‘Connected’ to On to establish the WebSocket connection.
- Once ‘Connection Status’ shows connected, toggle ‘Streaming Active’ to On.
- Audio will be transcribed in real-time, with results appearing in the transcription output.
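The connect-and-stream steps above can also be driven from a script. The sketch below is a minimal example that uses only the parameter names listed in the Parameters section (Numchannels, Connected, Active). The helper name is hypothetical, and whether setting both toggles in the same frame behaves exactly like toggling them by hand has not been verified.

```python
from types import SimpleNamespace


def start_transcription(stt, num_channels: int = 1) -> None:
    """Set the channel count, connect, then start streaming."""
    if num_channels not in (1, 2):
        raise ValueError("num_channels must be 1 (mono) or 2 (stereo)")
    stt.par.Numchannels = num_channels
    stt.par.Connected = True   # open the WebSocket connection
    stt.par.Active = True      # begin sending audio


# Outside TouchDesigner, a stand-in object lets the sketch run as-is.
fake_stt = SimpleNamespace(par=SimpleNamespace(Numchannels=0, Connected=False, Active=False))
start_transcription(fake_stt, num_channels=1)
print(fake_stt.par.Connected, fake_stt.par.Active)  # prints "True True"
```

In a TouchDesigner project the call would look like `start_transcription(op('stt_soniox'))`, assuming the operator is named stt_soniox as in the parameter listing.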
Real-Time Translation
- Set ‘Translation Mode’ to ‘One-Way Translation’ or ‘Two-Way Translation’.
  - One-Way translates all speech into the target language (‘Language B’).
  - Two-Way translates between two languages: speech in Language A is translated to Language B, and vice versa.
- Set ‘Language A (Source)’ to the spoken language (e.g., Japanese).
- Set ‘Language B (Target)’ to the desired output language (e.g., English).
- Connect and start streaming as described above.
- The original transcript appears in the transcription output and the translation appears in the translation output.
- Set the ‘Out1’ menu to ‘translation’ to route the translated text to the operator’s output connector.
Using Speaker Diarization
- Enable ‘Enable Speaker Diarization’ on the STTSoniox page before connecting.
- Connect and start streaming.
- The segments output table will include a speaker column identifying which speaker produced each segment. This is useful for multi-speaker environments like meetings or interviews.
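Once segments carry a speaker label, a common next step is to collect each speaker's lines. The sketch below is a minimal, hypothetical helper that operates on segment dictionaries with Speaker and Text keys (names taken from the v1.0.1 schema). The sample segments are invented, and real speaker labels may be formatted differently.

```python
from collections import defaultdict
from typing import Dict, List


def text_by_speaker(segments: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group segment text by the value of each segment's Speaker field."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for seg in segments:
        grouped[seg.get("Speaker", "")].append(seg.get("Text", ""))
    return dict(grouped)


# Made-up segments for illustration only.
segments = [
    {"Speaker": "1", "Text": "Shall we start?"},
    {"Speaker": "2", "Text": "Yes, go ahead."},
    {"Speaker": "1", "Text": "First item on the agenda."},
]

for speaker, lines in text_by_speaker(segments).items():
    print(f"Speaker {speaker}: {' '.join(lines)}")
```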
Best Practices
- Enable ‘Include Non-Final Results’ for the lowest-latency display: words appear as they are recognized, with corrections applied as the model finalizes each sentence.
- Enable ‘Enable Endpoint Detection’ for automatic sentence boundary detection, which produces cleaner segment breaks in the output.
- Set an appropriate ‘Idle Timeout’ to automatically disconnect when no audio is detected, saving API costs during pauses.
- Monitor the ‘Estimated Total Cost’ parameter to track cumulative API usage across all sessions.
- Use ‘Clear Transcript’ to reset all transcript data without disconnecting. Use ‘Copy Original to Clipboard’ or ‘Copy Translation to Clipboard’ for quick export.
- Translation and diarization settings are sent as part of the WebSocket configuration message on connect. Change these settings before toggling ‘Connected’, or disconnect and reconnect for changes to take effect.
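Because connection-time settings only take effect on the next connect, a small helper can bundle the disconnect, change, and reconnect sequence. The sketch below is an assumption-laden illustration: the helper name is hypothetical, the parameter names come from the Parameters section, and it has not been verified that toggling Connected off and on within a single script run triggers a clean reconnect in the operator.

```python
from types import SimpleNamespace


def apply_and_reconnect(stt, **settings) -> None:
    """Disconnect, set connection-time parameters, then reconnect."""
    stt.par.Connected = False          # drop the current connection
    for name, value in settings.items():
        setattr(stt.par, name, value)  # e.g. Enablespeakerdiarization=True
    stt.par.Connected = True           # new config is sent on this connect


# Stand-in for the operator so the sketch runs outside TouchDesigner.
fake_stt = SimpleNamespace(par=SimpleNamespace(
    Connected=True, Enablespeakerdiarization=False, Enableendpointdetection=False))
apply_and_reconnect(fake_stt, Enablespeakerdiarization=True, Enableendpointdetection=True)
print(fake_stt.par.Enablespeakerdiarization, fake_stt.par.Connected)  # prints "True True"
```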
Troubleshooting
- Connection fails on macOS - The operator automatically applies an SSL certificate fix using certifi. If connections still fail, ensure certifi is installed via ‘Install Dependencies’.
- No transcription output - Verify ‘Streaming Active’ is On and that your audio source is actively sending samples at the configured sample rate. Check that the CHOP input is connected and producing data.
- High latency - Use 16kHz mono audio for optimal streaming performance. Higher sample rates and stereo increase bandwidth usage without improving transcription quality.
- API key not loading - The operator checks the ChatTD KeyManager first, then a local config file. If neither contains a valid key, enter it directly in the ‘Soniox API Key’ field and it will be saved to both locations automatically.
- Settings not taking effect - Translation mode, language, diarization, and endpoint detection are configured at connection time. Disconnect and reconnect after changing these parameters.
- Audio not being sent - The internal Script CHOP only sends audio when ‘Streaming Active’ is On. Verify the CHOP input has samples (numSamples > 0) and the audio data is non-empty. Audio is sent in 100ms chunks at the configured sample rate.
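To make the 100ms chunking concrete, the sketch below works out how many samples and bytes one chunk holds at a given sample rate and channel count. The byte figures assume 16-bit PCM (2 bytes per sample), which is an assumption rather than something this page states, and the function names are hypothetical.

```python
def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = 100) -> int:
    """Number of samples per channel in one chunk."""
    return sample_rate_hz * chunk_ms // 1000


def chunk_bytes(sample_rate_hz: int, num_channels: int = 1,
                chunk_ms: int = 100, bytes_per_sample: int = 2) -> int:
    """Payload size of one chunk; bytes_per_sample=2 assumes 16-bit PCM."""
    return samples_per_chunk(sample_rate_hz, chunk_ms) * num_channels * bytes_per_sample


print(samples_per_chunk(16000))   # prints 1600
print(chunk_bytes(16000, 1))      # prints 3200
print(chunk_bytes(48000, 2))      # prints 19200
```

Under these assumptions, 48 kHz stereo sends six times the data of 16 kHz mono per chunk, which is consistent with the bandwidth note under High latency.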
Parameters
STTSoniox
Status (Status)
op('stt_soniox').par.Status Str - Default:
"" (Empty String)
STT Provider (Provider)
op('stt_soniox').par.Provider Str - Default:
"" (Empty String)
Soniox API Key (Apikey)
op('stt_soniox').par.Apikey Str - Default:
"" (Empty String)
Get API Key (Getapikey)
op('stt_soniox').par.Getapikey Pulse - Default:
False
Install Dependencies (Installdependencies)
op('stt_soniox').par.Installdependencies Pulse - Default:
False
Connected (Connected)
op('stt_soniox').par.Connected Toggle - Default:
False
Connection Status (Sttstatus)
op('stt_soniox').par.Sttstatus Str - Default:
"" (Empty String)
Streaming Active (Active)
op('stt_soniox').par.Active Toggle - Default:
False
Number of Channels (Numchannels)
op('stt_soniox').par.Numchannels Int - Default:
0
- Range: 1 to 2
- Slider Range: 1 to 2
Enable Speaker Diarization (Enablespeakerdiarization)
op('stt_soniox').par.Enablespeakerdiarization Toggle - Default:
False
Enable Endpoint Detection (Enableendpointdetection)
op('stt_soniox').par.Enableendpointdetection Toggle - Default:
False
Include Non-Final Results (Includenonfinal)
op('stt_soniox').par.Includenonfinal Toggle - Default:
False
Idle Timeout (minutes) (Idletimeout)
op('stt_soniox').par.Idletimeout Int - Default:
0
- Range: 1 to 60
- Slider Range: 1 to 60
Estimated Total Cost ($) (Estopcost)
op('stt_soniox').par.Estopcost Float - Default:
0.0
- Range: 0 to 1
- Slider Range: 0 to 1
Clear Transcript (Cleartranscript)
op('stt_soniox').par.Cleartranscript Pulse - Default:
False
Copy Original to Clipboard (Copytranscript)
op('stt_soniox').par.Copytranscript Pulse - Default:
False
Copy Translation to Clipboard (Copytranslation)
op('stt_soniox').par.Copytranslation Pulse - Default:
False
Changelog
v1.0.1 (2026-03-26)
- Rename OriginalText to Text in segments_out schema
- Add Language column to segments_out (before TranslatedText)
- Reorder segments_out to standard schema: Start, End, Text, Confidence, IsFinal, Speaker, Language, TranslatedText
- Add header enforcement to segments_out on init
- Add LastTranscriptionResult assignment when segment is finalized
- Add Speaker column to all_tokens and full_transcript tables
- Initial commit
v1.0.0 (2025-12-03)
stt_soniox v1.0.0
Initial Release
Real-time speech-to-text transcription using Soniox streaming API.
Features
- WebSocket-based streaming transcription
- Translation support (one-way and two-way)
- Multiple language support (Japanese, English, Spanish, French, German, Chinese, Korean)
- Speaker diarization option
- Endpoint detection
- Cost tracking per session
- Idle timeout auto-disconnect
Output Tables
- segments_out - Finalized sentences with timestamps and confidence
- transcription_out - Full running transcript text
- translation_out - Translation output (when enabled)
- full_transcript - All tokens for detailed analysis
- all_tokens - Current sentence tokens being built
Dependencies
- websockets
- certifi (for macOS SSL)