STT Soniox

v1.0.1

The STT Soniox LOP provides real-time audio transcription and translation using Soniox’s cloud streaming API (the stt-rt-preview model). It connects via WebSocket for low-latency streaming, supports multiple languages with real-time translation between language pairs, speaker diarization, and per-session cost tracking.

Key Features

Real-time streaming - Low-latency transcription via persistent WebSocket connection to Soniox
Translation modes - No translation, one-way, or two-way translation between 7 supported languages (Japanese, English, Spanish, French, German, Chinese, Korean)
Speaker diarization - Identify and label different speakers in multi-speaker audio
Endpoint detection - Automatic sentence boundary detection for cleaner transcript output
Cost tracking - Estimated usage costs tracked per session at Soniox’s rate of $0.12/hour
Idle timeout - Automatically disconnects after a configurable idle period to save API costs
Auto-connect - Toggling ‘Streaming Active’ will automatically connect if not already connected
Reconnect-safe - Reconnecting preserves existing transcript and translation data from the current session
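The cost estimate is simple rate-times-duration arithmetic. The sketch below assumes cost scales linearly with streamed audio time at the $0.12/hour rate quoted above. The function name `estimate_cost` is hypothetical and is not part of the operator.

```python
def estimate_cost(streamed_seconds: float, rate_per_hour: float = 0.12) -> float:
    """Estimate Soniox streaming cost in USD.

    Hypothetical helper: assumes linear billing on streamed audio time
    at the $0.12/hour rate quoted in this document.
    """
    if streamed_seconds < 0:
        raise ValueError("streamed_seconds must be non-negative")
    hours = streamed_seconds / 3600.0  # seconds -> hours
    return hours * rate_per_hour


# A 45-minute session at $0.12/hour:
print(round(estimate_cost(45 * 60), 4))  # 0.09
```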

Requirements

Soniox API Key - Obtain a key from soniox.com. Enter it on the STTSoniox page, or pulse ‘Get API Key’ to open the signup page. Keys are stored in the ChatTD KeyManager for reuse across sessions.
Dependencies - websockets and certifi libraries (install via ‘Install Dependencies’ on the STTSoniox page)
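The Troubleshooting section describes the key lookup order: the ChatTD KeyManager first, then a local config file, and finally manual entry in the ‘Soniox API Key’ field. The sketch below models that order as a list of key sources checked in sequence. The source functions are hypothetical stand-ins, not the operator's real KeyManager or config-file readers.

```python
from typing import Callable, Optional, Sequence

KeySource = Callable[[], Optional[str]]


def resolve_api_key(sources: Sequence[KeySource]) -> Optional[str]:
    """Return the first non-empty key from an ordered list of sources.

    Hypothetical sketch of the documented lookup order; returns None when
    no source has a key, in which case the user enters one manually.
    """
    for source in sources:
        key = source()
        if key and key.strip():
            return key.strip()
    return None  # fall back to the 'Soniox API Key' parameter field


def from_keymanager() -> Optional[str]:
    # Hypothetical stand-in: nothing stored under the key name "soniox".
    return None


def from_config_file() -> Optional[str]:
    # Hypothetical stand-in returning a placeholder value.
    return "sk-example"


print(resolve_api_key([from_keymanager, from_config_file]))  # sk-example
```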

Input/Output

Inputs

CHOP Input - Audio source. Connect a CHOP that outputs audio samples (e.g., an Audio Device In CHOP). The internal Script CHOP reads audio from this input and sends it to the Soniox streaming service.

Outputs

The operator maintains several internal DATs with transcription data:

transcription_out - Running original-language transcript, combining finalized segments with in-progress tokens
translation_out - Running translated transcript (when a translation mode is enabled)
segments_out - Table of completed sentences with the columns Start, End, Text (original language), Confidence, IsFinal, Speaker, Language, and TranslatedText
all_tokens - Current sentence token-level detail with timing, confidence, finality, and speaker
full_transcript - Persistent cumulative token table for the entire session (original language)
full_translation - Persistent cumulative token table for translations
session_info - Current session metadata (session ID, status, duration, provider, translation mode)
cost_history - Accumulated cost tracking across all sessions

The operator’s output connector can be configured via the ‘Out1’ menu to expose one of three views: transcription, translation, or segments.
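Downstream logic often needs segments_out as records instead of raw table rows. The sketch below converts header-plus-data rows into one dict per segment, matching columns by name so that a reordered table still parses. The expected columns come from the v1.0.1 changelog. The function name and sample values are hypothetical.

```python
from typing import Dict, List

# Column names for segments_out as listed in the v1.0.1 changelog.
SEGMENT_COLUMNS = ["Start", "End", "Text", "Confidence", "IsFinal",
                   "Speaker", "Language", "TranslatedText"]


def rows_to_segments(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn [header, row, row, ...] into a list of per-segment dicts."""
    if not rows:
        return []
    header = rows[0]
    missing = [c for c in SEGMENT_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"segments_out is missing columns: {missing}")
    # Match by header name rather than position.
    return [dict(zip(header, row)) for row in rows[1:]]


# Made-up values for illustration only.
sample = [
    SEGMENT_COLUMNS,
    ["0.52", "2.10", "Hello there.", "0.97", "1", "0", "en", "Hola."],
]
for seg in rows_to_segments(sample):
    print(seg["Start"], seg["Text"], "->", seg["TranslatedText"])
```

Inside TouchDesigner, the rows could be gathered with something like `[[c.val for c in r] for r in dat.rows()]`, where `dat` is the segments_out DAT. Its exact path inside the operator is an assumption.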

Usage Examples

Basic Transcription

Enter your Soniox API key on the STTSoniox page (or store it in the ChatTD KeyManager under the key name soniox).
Pulse ‘Install Dependencies’ if this is your first time using the operator.
Set ‘Sample Rate’ to match your audio source (16kHz is the most common and recommended).
Set ‘Number of Channels’ to match your audio source (1 for mono, 2 for stereo).
Connect an audio CHOP to the operator’s input.
Toggle ‘Connected’ to On to establish the WebSocket connection.
Once ‘Connection Status’ shows connected, toggle ‘Streaming Active’ to On.
Audio is transcribed in real time, with results appearing in transcription_out.
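The steps above can also be scripted. This is a minimal sketch that uses only parameter names from the Parameters reference (Apikey, Numchannels, Connected, Active). ‘Sample Rate’ is omitted because its script name is not listed in that reference. The function name is hypothetical. Setting Active immediately after Connected relies on the auto-connect behavior described under Key Features. Outside TouchDesigner, a `SimpleNamespace` stands in for `op('stt_soniox')`.

```python
from types import SimpleNamespace


def start_basic_transcription(stt, api_key: str, num_channels: int = 1) -> None:
    """Configure and start the STT Soniox operator from a script.

    `stt` is the operator, e.g. op('stt_soniox') inside TouchDesigner.
    Hypothetical helper; parameter names come from the Parameters reference.
    """
    if num_channels not in (1, 2):
        raise ValueError("Numchannels must be 1 (mono) or 2 (stereo)")
    stt.par.Apikey = api_key
    stt.par.Numchannels = num_channels
    stt.par.Connected = True  # open the WebSocket connection
    stt.par.Active = True     # start streaming; auto-connects if needed


# Stand-in object so the sketch runs outside TouchDesigner.
demo = SimpleNamespace(par=SimpleNamespace())
start_basic_transcription(demo, "YOUR_SONIOX_KEY")
print(demo.par.Connected, demo.par.Active)  # True True
```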

Real-Time Translation

Set ‘Translation Mode’ to ‘One-Way Translation’ or ‘Two-Way Translation’.
- One-Way translates all speech into the target language (‘Language B’).
- Two-Way translates between two languages — speech in Language A is translated to Language B and vice versa.
Set ‘Language A (Source)’ to the spoken language (e.g., Japanese).
Set ‘Language B (Target)’ to the desired output language (e.g., English).
Connect and start streaming as described above.
The original transcript appears in the transcription output and the translation appears in the translation output.
Set the ‘Out1’ menu to ‘translation’ to route the translated text to the operator’s output connector.
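The one-way and two-way rules above can be summarized as a small routing function. This illustrates the described behavior and is not the operator's internal code. The mode labels `"none"`, `"one_way"`, and `"two_way"` are hypothetical stand-ins for the ‘Translation Mode’ menu entries. Two cases are assumptions: returning no target for one-way speech that is already in Language B, and returning no target for two-way speech outside the configured pair.

```python
from typing import Optional


def translation_target(spoken: str, mode: str,
                       lang_a: str, lang_b: str) -> Optional[str]:
    """Return the language a segment is translated into, or None."""
    if mode == "none":
        return None
    if mode == "one_way":
        # All speech goes to Language B (assumed: skip if already B).
        return None if spoken == lang_b else lang_b
    if mode == "two_way":
        if spoken == lang_a:
            return lang_b
        if spoken == lang_b:
            return lang_a
        return None  # assumed: speech outside the pair is left as-is
    raise ValueError(f"unknown translation mode: {mode!r}")


print(translation_target("en", "two_way", "ja", "en"))  # ja
```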

Using Speaker Diarization

Enable ‘Enable Speaker Diarization’ on the STTSoniox page before connecting.
Connect and start streaming.
The segments_out table will include a Speaker column identifying which speaker produced each segment. This is useful in multi-speaker settings such as meetings or interviews.
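With diarization enabled, a common next step is to build one transcript per speaker. The sketch below assumes segments are represented as dicts keyed by the segments_out column names (Speaker, Text). It joins each speaker's text in first-appearance order. The function name and sample data are hypothetical.

```python
from typing import Dict, List


def transcript_by_speaker(segments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join segment text per speaker, keeping first-appearance order."""
    grouped: Dict[str, List[str]] = {}
    for seg in segments:
        speaker = seg.get("Speaker") or "unknown"  # blank -> "unknown"
        grouped.setdefault(speaker, []).append(seg["Text"])
    return {spk: " ".join(parts) for spk, parts in grouped.items()}


# Made-up segments for illustration only.
segments = [
    {"Speaker": "1", "Text": "Shall we start?"},
    {"Speaker": "2", "Text": "Yes, go ahead."},
    {"Speaker": "1", "Text": "First item is the budget."},
]
for speaker, text in transcript_by_speaker(segments).items():
    print(f"Speaker {speaker}: {text}")
```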

Best Practices

Enable ‘Include Non-Final Results’ for the lowest-latency display — words appear as they are recognized, with corrections applied as the model finalizes each sentence.
Enable ‘Enable Endpoint Detection’ for automatic sentence boundary detection, which produces cleaner segment breaks in the output.
Set an appropriate ‘Idle Timeout’ to automatically disconnect when no audio is detected, saving API costs during pauses.
Monitor the ‘Estimated Total Cost’ parameter to track cumulative API usage across all sessions.
Use ‘Clear Transcript’ to reset all transcript data without disconnecting. Use ‘Copy Original to Clipboard’ or ‘Copy Translation to Clipboard’ for quick export.
Translation and diarization settings are sent as part of the WebSocket configuration message on connect. Change these settings before toggling ‘Connected’, or disconnect and reconnect for changes to take effect.
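Settings only take effect on reconnect because they travel in a single configuration message sent before any audio. The sketch below builds such a message. The field names (`model`, `audio_format`, `enable_speaker_diarization`, `enable_endpoint_detection`, `translation`, and so on) are assumptions based on Soniox's public real-time API and should be checked against current Soniox documentation. They may not match exactly what the operator sends.

```python
import json
from typing import Optional


def build_config_message(api_key: str, sample_rate: int = 16000,
                         num_channels: int = 1, diarization: bool = False,
                         endpoint_detection: bool = False,
                         translation: Optional[dict] = None) -> str:
    """Build the JSON config text sent once, on connect, before audio.

    Field names are assumptions, not the operator's confirmed payload.
    """
    config = {
        "api_key": api_key,
        "model": "stt-rt-preview",
        "audio_format": "pcm_s16le",  # assumed: 16-bit little-endian PCM
        "sample_rate": sample_rate,
        "num_channels": num_channels,
        "enable_speaker_diarization": diarization,
        "enable_endpoint_detection": endpoint_detection,
    }
    if translation is not None:
        config["translation"] = translation
    return json.dumps(config)


msg = build_config_message(
    "YOUR_SONIOX_KEY",
    diarization=True,
    translation={"type": "two_way", "language_a": "ja", "language_b": "en"},
)
print(msg)
```

Because the server reads this message once per connection, changing a toggle afterwards has no effect until a new connection sends a new message.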

Troubleshooting

Connection fails on macOS - The operator automatically applies an SSL certificate fix using certifi. If connections still fail, ensure certifi is installed via ‘Install Dependencies’.
No transcription output - Verify ‘Streaming Active’ is On and that your audio source is actively sending samples at the configured sample rate. Check that the CHOP input is connected and producing data.
High latency - Use 16kHz mono audio for optimal streaming performance. Higher sample rates and stereo increase bandwidth usage without improving transcription quality.
API key not loading - The operator checks the ChatTD KeyManager first, then a local config file. If neither contains a valid key, enter it directly in the ‘Soniox API Key’ field and it will be saved to both locations automatically.
Settings not taking effect - Translation mode, language, diarization, and endpoint detection are configured at connection time. Disconnect and reconnect after changing these parameters.
Audio not being sent - The internal Script CHOP only sends audio when ‘Streaming Active’ is On. Verify the CHOP input has samples (numSamples > 0) and the audio data is non-empty. Audio is sent in 100ms chunks at the configured sample rate.
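The 100 ms chunk size follows directly from the sample rate: 16 kHz yields 1,600 frames per chunk. The sketch below computes that and packs float channels into interleaved bytes. It assumes CHOP samples are floats in [-1, 1] and that the service expects interleaved 16-bit little-endian PCM. The operator's actual encoding is not documented here, and both function names are hypothetical.

```python
import struct
from typing import List, Sequence


def samples_per_chunk(sample_rate: int, chunk_ms: int = 100) -> int:
    """Frames per channel in one chunk of chunk_ms milliseconds."""
    return sample_rate * chunk_ms // 1000


def to_pcm16le(channels: Sequence[Sequence[float]]) -> bytes:
    """Interleave float channels in [-1, 1] into 16-bit little-endian PCM."""
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    out: List[int] = []
    for i in range(n):
        for ch in channels:  # interleave: L, R, L, R, ...
            x = max(-1.0, min(1.0, ch[i]))  # clip out-of-range samples
            out.append(int(round(x * 32767)))
    return struct.pack(f"<{len(out)}h", *out)


frames = samples_per_chunk(16000)  # 1600 frames per 100 ms at 16 kHz
mono = [0.0] * frames
print(frames, len(to_pcm16le([mono])))  # 1600 3200
```

Each stereo chunk is twice the size of a mono chunk at the same rate. This is consistent with the High latency tip to use 16kHz mono audio.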

Parameters

STTSoniox

Status (Status) op('stt_soniox').par.Status Str

Default:: "" (Empty String)

STT Provider (Provider) op('stt_soniox').par.Provider Str

Default:: "" (Empty String)

Soniox API Key (Apikey) op('stt_soniox').par.Apikey Str

Default:: "" (Empty String)

Get API Key (Getapikey) op('stt_soniox').par.Getapikey Pulse

Default:: False

Install Dependencies (Installdependencies) op('stt_soniox').par.Installdependencies Pulse

Default:: False

Connected (Connected) op('stt_soniox').par.Connected Toggle

Default:: False

Connection Status (Sttstatus) op('stt_soniox').par.Sttstatus Str

Default:: "" (Empty String)

Streaming Active (Active) op('stt_soniox').par.Active Toggle

Default:: False

Number of Channels (Numchannels) op('stt_soniox').par.Numchannels Int

Default:: 0
Range:: 1 to 2
Slider Range:: 1 to 2

Enable Speaker Diarization (Enablespeakerdiarization) op('stt_soniox').par.Enablespeakerdiarization Toggle

Default:: False

Enable Endpoint Detection (Enableendpointdetection) op('stt_soniox').par.Enableendpointdetection Toggle

Default:: False

Include Non-Final Results (Includenonfinal) op('stt_soniox').par.Includenonfinal Toggle

Default:: False

Idle Timeout (minutes) (Idletimeout) op('stt_soniox').par.Idletimeout Int

Default:: 0
Range:: 1 to 60
Slider Range:: 1 to 60

Estimated Total Cost ($) (Estopcost) op('stt_soniox').par.Estopcost Float

Default:: 0.0
Range:: 0 to 1
Slider Range:: 0 to 1

Clear Transcript (Cleartranscript) op('stt_soniox').par.Cleartranscript Pulse

Default:: False

Copy Original to Clipboard (Copytranscript) op('stt_soniox').par.Copytranscript Pulse

Default:: False

Copy Translation to Clipboard (Copytranslation) op('stt_soniox').par.Copytranslation Pulse

Default:: False

Changelog

v1.0.1 (2026-03-26)

Rename OriginalText to Text in segments_out schema
Add Language column to segments_out (before TranslatedText)
Reorder segments_out to standard schema: Start, End, Text, Confidence, IsFinal, Speaker, Language, TranslatedText
Add header enforcement to segments_out on init
Add LastTranscriptionResult assignment when segment is finalized
Add Speaker column to all_tokens and full_transcript tables
Initial commit

v1.0.0 (2025-12-03)

stt_soniox v1.0.0

Initial Release

Real-time speech-to-text transcription using Soniox streaming API.

Features

WebSocket-based streaming transcription
Translation support (one-way and two-way)
Multiple language support (Japanese, English, Spanish, French, German, Chinese, Korean)
Speaker diarization option
Endpoint detection
Cost tracking per session
Idle timeout auto-disconnect

Output Tables

segments_out - Finalized sentences with timestamps and confidence
transcription_out - Full running transcript text
translation_out - Translation output (when enabled)
full_transcript - All tokens for detailed analysis
all_tokens - Current sentence tokens being built

Dependencies

websockets
certifi (for macOS SSL)