Voice Agent

v1.0.0

The Voice Realtime LOP is a single operator that talks to any realtime voice-to-voice provider. Provider modules (provider_gemini_live.py, provider_openai_realtime.py, provider_xai_grok.py, provider_hume_evi.py) drop into the operator like TTS/STT providers — swap backends from the Provider menu, no per-provider operator required. It replaces the older gemini_live monolith (now deprecated) and mirrors the unified-operator pattern used by tts and stt_*.

Key Features

One operator, four cloud providers — switch backends from a single menu
Session modes: continuous, one_turn, push_to_talk
Session resumption: native handle (Gemini, Hume) or transcript replay (OpenAI, xAI)
Disk-persisted session history with optional auto-resume
External tool orchestration via the same Tool sequence used by the Agent LOP
Built-in tools: end_conversation and output_text_content
Live per-minute cost ballpark + running session cost + optional Costbudget cap
Live-streaming user transcript (row rewrites in place as the user speaks)
Profiles + Skills injection into the system prompt
Affect / emotion signals (Hume EVI — 48 prosody dimensions on a dedicated signals CHOP)

Providers at a glance

Provider	Model families	Audio out	Tools	Notable
Gemini Live	`gemini-3.1-flash-live-preview`, `gemini-2.5-flash-native-audio-preview-12-2025`	24 kHz	Sync only on 3.x, async on 2.5	Native session resumption, video-in, Google Search grounding
OpenAI Realtime	`gpt-realtime`, `gpt-realtime-mini`	24 kHz	Streamed (one item per call)	Token-metered idle, long session cap
xAI Grok Voice	`grok-voice-*`	24 kHz	Streamed	Flat per-minute pricing (wallclock-metered)
Hume EVI	EVI 3	48 kHz	Streamed	Prosody/affect side-channel, voice cloning supported

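All four providers sit behind the same operator interface, so parameters, callbacks, and tool wiring carry over when you switch backends. Provider-specific features (Hume affect signals, Gemini video-in and Google Search grounding) are only available on the provider that offers them.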

Requirements

API keys flow through ChatTD’s Key Manager. Store a key per provider under its server name (gemini, openai, xai, hume) or paste it into the Apikey parameter on the provider sub-page.
Python dependencies — declared per provider in its DEPENDENCIES constant. The first time you switch to a provider whose deps are missing, the backend page surfaces an install pulse. Gemini needs google-genai, OpenAI needs openai, xAI and Hume use raw websockets (already pinned).

Account & billing

Realtime voice can get expensive fast: at the ~$0.30/min ballpark, 10 minutes of continuous voice-to-voice on Gemini 3.1 runs ~$3.00 at current paid-tier rates. Check your tier and set Costbudget before a long session.

Provider	Tier URL	Pricing reference
Gemini Live	<https://aistudio.google.com/usage>	<https://ai.google.dev/gemini-api/docs/pricing>
OpenAI Realtime	<https://platform.openai.com/usage>	<https://openai.com/api/pricing/>
xAI Grok	<https://console.x.ai/usage>	<https://x.ai/api#pricing>
Hume EVI	<https://beta.hume.ai/settings/billing>	<https://www.hume.ai/pricing>

The Pricing parameter on the Voice page shows the current provider + model ballpark as soon as you select them (e.g. ~in 0.005 USD/min out 0.018 USD/min). Sessioncost accumulates live as the session runs. Pulse Resetcostmeter to zero it.

Models & Pricing

Gemini Live

Model ID	Status	Audio in/out	Video in	Notes
`gemini-3.1-flash-live-preview`	Preview (newest, default)	~$0.005 / $0.018 per min	~$0.002/min	Acoustic nuance, thinkingLevel, sync function calling
`gemini-2.5-flash-native-audio-preview-12-2025`	Preview	~$0.005 / $0.018 per min	~$0.005/min	Native audio, async tools

Voice-chat ballpark on 3.1: ~$0.30/min continuous voice-to-voice.
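
As a quick sanity check on the ballpark above, the arithmetic can be sketched in a few lines. The function name is illustrative and the 0.30 USD/min default is simply the ballpark quoted in this section; nothing here is read from the operator.

```python
def estimate_voice_cost(minutes: float, usd_per_minute: float = 0.30) -> float:
    """Return a rough session cost in USD for continuous voice-to-voice."""
    if minutes < 0 or usd_per_minute < 0:
        raise ValueError("minutes and usd_per_minute must be non-negative")
    # Round to cents; this is a ballpark, not a billing figure.
    return round(minutes * usd_per_minute, 2)
```

For example, `estimate_voice_cost(10)` gives the cost of a ten-minute session at the default rate, and passing a different `usd_per_minute` lets you compare providers.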

OpenAI Realtime

Model ID	Status	Audio in/out	Notes
`gpt-realtime`	GA	Token-metered	Long session cap, native tool interruption
`gpt-realtime-mini`	GA	Token-metered, cheaper	Lower quality, same interface

xAI Grok Voice

Flat ~$0.05/min wallclock (idle time bills). Use one_turn mode to avoid paying for dead air.

Hume EVI

~$0.04–0.07/min wallclock depending on voice. 30-minute session cap. Ships an onAffect callback with per-turn prosody scores (Joy, Surprise, Admiration, etc.).

The Model parameter is an editable menu — type a custom ID if a provider ships a new model before this operator is updated.

Input/Output

Inputs

Input 1 (Audio CHOP): Microphone audio. Typically a mono CHOP from an Audio Device In, fed to the operator via Micin on the Voice page. The EXT resamples to each provider’s required rate automatically (SAMPLE_RATE_IN constant per provider).

Outputs

Output 1: Conversation table DAT (role, message, id, timestamp, type, metadata, session_id)
Output 2: Current audio playback CHOP (store_output)
Output 3: Full session audio CHOP (full_audio)
Output 4: Text output DAT (content from the output_text_content tool, when enabled)
signals CHOP: Common channels (connected, model_ready, worker_active, cost_in_seconds, cost_out_seconds) plus any provider-specific channels declared via that provider’s SIGNAL_CHANNELS dict (Hume: 48 affect dimensions prefixed hume_evi_affect_*).

Session modes (Voice page)

continuous (default): Connect opens the session and keeps it alive full-duplex until Disconnect or the end_conversation tool fires.
one_turn: Connect opens the session for one exchange. After the assistant’s first turn-final text the EXT either holds the socket and disarms the mic (token-metered providers — Gemini, OpenAI) or disconnects and writes the trace (wallclock-metered — xAI, Hume). The next Connect re-arms. Use for discrete voice prompts when you don’t want to pay for idle time.
push_to_talk: Session stays open, but the Talk toggle gates whether mic audio flows. Bind Talk to a Keyboard In or MIDI In CHOP for walkie-talkie-style interactions.

The Sessionstate readout shows where the session is: disconnected / connecting / active / armed / refreshing / ending.
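
The gating rules for the three modes can be summarized as a small predicate. This is a sketch of the behavior described above, assuming the session is already connected; the function and argument names are hypothetical and do not mirror the EXT's internals.

```python
def mic_gate_open(mode: str, talk: bool = False, turn_done: bool = False) -> bool:
    """Return True when mic audio should flow to the provider."""
    if mode == "continuous":
        # Full-duplex: the mic stays hot for the whole session.
        return True
    if mode == "one_turn":
        # The mic is hot until the assistant finishes its first reply.
        return not turn_done
    if mode == "push_to_talk":
        # The Talk toggle decides whether audio is sent.
        return talk
    raise ValueError(f"unknown session mode: {mode!r}")
```

When the predicate is False on an open session, Sessionstate would read armed rather than active.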

Session resumption

On disconnect the EXT writes a sibling JSON trace (voice_<timestamp>_<hash>.json) to Sessiontracedir (defaults to project.folder/voice_sessions/). The trace holds the resume handle (provider-specific), the transcript, and the end reason.

On the next connect the EXT picks a resume source in this order:

Loadsessionfile (file path) — explicit one-shot
Resumelast (toggle) — newest trace in the directory
None — starts fresh
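
The precedence above can be sketched as follows. This is a simplified illustration: the function name is hypothetical, "newest" is assumed to mean most recently modified, and the sketch does not filter traces by provider or model.

```python
from pathlib import Path
from typing import Optional


def pick_resume_source(load_session_file: str, resume_last: bool,
                       trace_dir: str) -> Optional[Path]:
    """Return the trace to resume from, or None for a fresh session."""
    # 1. An explicit file path always wins.
    if load_session_file:
        return Path(load_session_file)
    # 2. Otherwise fall back to the newest trace in the directory.
    if resume_last:
        traces = sorted(Path(trace_dir).glob("*.json"),
                        key=lambda p: p.stat().st_mtime)
        if traces:
            return traces[-1]
    # 3. Nothing selected, or no traces found: start fresh.
    return None
```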

Resumption strategy is per-provider:

Gemini Live / Hume EVI → native handle. Zero replay cost.
OpenAI Realtime / xAI Grok → transcript replay (last Maxreplayrows messages, default 20, user+assistant only). Replay cost grows with history — the Sessionresume readout says so when replay fires.

Tool-call / tool-result rows are dropped from replay to avoid lying to the model about output it didn’t produce.
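
The replay shaping described above amounts to a filter followed by a tail slice. In this sketch each conversation row is assumed to be a dict with `role` and `type` keys (matching the conversation DAT columns); the function name and the way tool rows are tagged are assumptions.

```python
def shape_replay(rows: list, max_replay_rows: int = 20) -> list:
    """Keep user/assistant rows only, then the last max_replay_rows of them."""
    tool_types = {"tool_call", "tool_result"}
    spoken = [
        r for r in rows
        if r.get("role") in ("user", "assistant")
        and r.get("type") not in tool_types
    ]
    # A non-positive cap replays nothing.
    return spoken[-max_replay_rows:] if max_replay_rows > 0 else []
```

Because the filter runs before the slice, tool rows never count against the Maxreplayrows cap.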

Tool Integration

Voice Realtime consumes tools from other LOPs using the same pattern as the Agent LOP. It does not expose a GetTool() method.

Connecting external tools

On the Tools page, enable Use Tools (Usetools).
In the External Op Tools sequence, add a block and drag the tool operator into the OP field.
Set Mode per tool:
- enabled — blocks until the tool completes before the model continues.
- enabled_nonblocking — fires and forgets. Safe on Gemini 2.5 and OpenAI; on Gemini 3.x the model runs sync regardless.
- disabled — skipped.
Connect the session. The model calls tools as needed and folds the results into its response.

Built-in tools

end_conversation: on when Allow end_conversation (Allowendconversation) is enabled. The model can close gracefully on goodbyes.
output_text_content: on when Allow output_text_content (Outputtext) is enabled. The model can display text (code, data, URLs) in the fourth output DAT without reading it aloud.
Google Search grounding (Gemini only): turn on Google Search Grounding (Enablegrounding) on the Gemini Live page. It is not compatible with custom function tools in the same session.

Tool-call rendering in chat_viewer is automatic: paired tool_call / tool_result rows collapse into a single expandable entry by metadata.call_id.
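
If you need the same pairing in your own scripts, grouping by call id is straightforward. The sketch below assumes each row is a dict whose `metadata` has already been parsed into a dict containing `call_id`; the function name is hypothetical and this is not the chat_viewer implementation.

```python
def pair_tool_rows(rows):
    """Group tool_call / tool_result rows by metadata['call_id']."""
    pairs = {}
    for row in rows:
        if row.get("type") not in ("tool_call", "tool_result"):
            continue  # ordinary transcript rows are ignored
        call_id = (row.get("metadata") or {}).get("call_id")
        if call_id is None:
            continue  # cannot pair a row without an id
        pairs.setdefault(call_id, {})[row["type"]] = row
    return pairs
```

A call that has not returned yet shows up with only a `tool_call` entry, which is a simple way to spot pending tools.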

Streaming modes (Voice page)

Streamingmode controls how assistant/user transcripts land in the conversation DAT:

live (default): one row per turn, rewritten in place as deltas arrive. Best UX for live captions. chat_viewer re-renders in place via stable row ids.
coalesce: one row per turn, written only on turn-final. Cleanest log; no streaming jitter.
append: one row per delta. Debug-heavy. Avoid for long sessions.

xAI Grok emits no user-delta stream — on xAI, live degrades to coalesce automatically for user text.
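
The difference between the three modes is easiest to see in a toy model. The class below is a sketch only: it stores rows as `[turn_id, text]` lists rather than writing to a DAT, and all names are hypothetical.

```python
class TranscriptLog:
    """Toy model of how deltas land in the log under each Streamingmode."""

    def __init__(self, mode="live"):
        if mode not in ("live", "coalesce", "append"):
            raise ValueError(f"unknown streaming mode: {mode!r}")
        self.mode = mode
        self.rows = []        # each row is [turn_id, text]
        self._pending = {}    # turn_id -> text accumulated so far

    def on_delta(self, turn_id, delta):
        text = self._pending.get(turn_id, "") + delta
        self._pending[turn_id] = text
        if self.mode == "append":
            self.rows.append([turn_id, delta])   # one row per delta
        elif self.mode == "live":
            self._upsert(turn_id, text)          # rewrite the row in place
        # coalesce: nothing is written until the turn is final

    def on_final(self, turn_id):
        final = self._pending.pop(turn_id, "")
        if self.mode in ("live", "coalesce"):
            self._upsert(turn_id, final)
        # append: the deltas are already in the log

    def _upsert(self, turn_id, text):
        for row in self.rows:
            if row[0] == turn_id:
                row[1] = text
                return
        self.rows.append([turn_id, text])
```

Feeding the same two deltas into each mode yields one growing row for live, no row until the final for coalesce, and two rows for append.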

Cost control

Pricing: per-minute ballpark for the active provider + model, refreshed on change.
Sessioncost: running session spend, accumulated as audio seconds (derived from sample counts and the sample rate) × provider pricing.
Costbudget: hard cap in USD. When spend exceeds Costbudget the EXT disconnects and fires onError with source='budget'. With auto-refresh, the cap is enforced against the conversation-wide total. Set to 0 to disable.
Resetcostmeter: pulse to zero the session cost meter (does not reset the budget).
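
The accounting behind these readouts can be approximated as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, rates are per-minute figures like those shown in Pricing, and audio duration is derived from sample counts.

```python
class CostMeter:
    """Accumulate audio seconds and compare the spend against a budget."""

    def __init__(self, usd_per_min_in, usd_per_min_out, budget_usd=0.0):
        self.usd_per_min_in = usd_per_min_in
        self.usd_per_min_out = usd_per_min_out
        self.budget_usd = budget_usd      # 0 disables the cap
        self.seconds_in = 0.0
        self.seconds_out = 0.0

    def add_audio(self, num_samples, sample_rate, direction):
        seconds = num_samples / sample_rate
        if direction == "in":
            self.seconds_in += seconds
        elif direction == "out":
            self.seconds_out += seconds
        else:
            raise ValueError("direction must be 'in' or 'out'")

    @property
    def cost(self):
        # Per-minute rates applied to accumulated seconds.
        return (self.seconds_in * self.usd_per_min_in
                + self.seconds_out * self.usd_per_min_out) / 60.0

    def over_budget(self):
        return self.budget_usd > 0 and self.cost > self.budget_usd

    def reset(self):
        # Zero the meter; the budget itself is left untouched.
        self.seconds_in = 0.0
        self.seconds_out = 0.0
```

A caller would check `over_budget()` after each audio chunk and disconnect when it returns True.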

Profiles & Skills

Profiles page — scan a folder of JSON profile files, pick one from the menu, the system prompt + model + voice + tool toggles apply on connect.
Skills page — scan a folder of JSON skills, each skill’s system-prompt chunk is appended to the session instructions.

Both pages mirror the agent LOP’s layout and share the same profile/skill file format.

Callbacks

Wire custom logic on the Callbacks page. The Callbackdat Text DAT receives a stub with every callback signature: onSessionStart, onSessionEnd, onAssistantText, onUserText, onToolCall, onToolResult, onAudioIn, onAudioOut, onProviderChange, onStatus, onError, onAffect (Hume only), and onUserSpeechStarted / onUserSpeechEnded where the provider supplies them.

Toggle Printcallbacks to log every callback fire to the textport while developing.
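
As an illustration, a handler for the budget error might look like the sketch below. The callback name and the `source='budget'` value come from this page, but the argument shape is an assumption: here `info` is treated as a dict with a `source` key. Match whatever signature the generated stub in Callbackdat actually uses.

```python
def onError(info):
    """Hypothetical handler; `info` is assumed to be a dict with a 'source' key."""
    source = info.get("source", "unknown")
    if source == "budget":
        # The EXT has already disconnected at this point.
        return "Budget cap reached; session was disconnected."
    return f"Voice error from {source}"
```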

Usage Examples

Basic voice conversation

Select Provider on the Voice page. Pulse Scanproviders if the menu is empty.
On the provider sub-page, pick a Model and Voice. Check the Pricing readout on the Voice page.
Paste an API key into Apikey (or store it under the provider’s server name in ChatTD Key Manager).
Pulse Connect. Watch Sessionstate flip to active.
Speak into the mic. Assistant audio plays through the Playback-page device.
Pulse Disconnect when done — the session trace is written.
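
The same steps can be driven from a script. The helper below is a sketch that takes the operator as an argument so it can be exercised outside TouchDesigner; the function name is hypothetical, while the parameter names (Provider, Sessionmode, Connect) are the ones listed under Parameters.

```python
def connect_voice(voice, provider="gemini_live", mode="continuous"):
    """Select a provider and session mode, then pulse Connect."""
    voice.par.Provider = provider
    voice.par.Sessionmode = mode
    voice.par.Connect.pulse()
```

Inside TouchDesigner you would call it as `connect_voice(op('voice_agent'))`, assuming the operator is named voice_agent as in the parameter paths on this page.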

Resuming the last session

Enable Resumelast on the Playback page before connecting.
Pulse Connect. The Sessionresume readout shows which path fired (Resumed via native handle (gemini_live) or Replayed 20 messages (replay, cost grows with history)).
The conversation DAT pre-populates with the previous transcript; the provider is handed either the resume token or the replayed messages.

Budget-capped demo

Set Costbudget = 0.50 on the Tools page.
Connect and converse.
The EXT disconnects the moment spend crosses $0.50 and writes an error row to the conversation DAT.

Session refresh

The EXT auto-refreshes sessions as they approach the provider’s MAX_SESSION_S cap (Gemini: 900s, OpenAI: 3600s, xAI: 3600s, Hume: 1800s), or immediately on provider-emitted goaway. Controls on the Voice page:

Auto-Refresh Session (default on) — arm the deadline coordinator.
Refresh Warning (s) (default 30) — seconds before cap at which onStatus('expiring') fires. Set 0 to disable the warning and refresh only at cap.

Per-provider behavior routes off RESUMPTION:

Native (Gemini Live, Hume EVI) — resume handle captured via get_persistable_state is re-injected into the new start_session. The server carries the full history; no client-side replay.
Replay (OpenAI Realtime, xAI Grok) — new session primed via prime_history with the shaped transcript, capped by Maxreplayrows. Token cost grows with transcript length.
None — session ends cleanly; onSessionEnd fires with end_reason='cap' and onStatus('expired') follows. No reconnect.

During a refresh the Session State badge reads refreshing, the mic pump is paused, and a system row ▶ Refreshed session (<mode>) — reason: <cap|goaway> is appended to the conversation DAT. Conversation Cost accumulates across refreshes; Session Cost resets per leg. Costbudget is enforced against the conversation-wide total.

In-flight user audio at the moment of refresh is dropped (v1) — the audio buffer isn’t carried across the reconnect. Speaker-out finishes its current buffer since playback is decoupled from the session.
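
The timing and routing described in this section can be sketched as a lookup plus a subtraction. The caps and resumption modes below are the values quoted on this page; the function name, dictionary names, and action labels are hypothetical, and the sketch does not model GoAway handling.

```python
MAX_SESSION_S = {
    "gemini_live": 900,
    "openai_realtime": 3600,
    "xai_grok": 3600,
    "hume_evi": 1800,
}
RESUMPTION = {
    "gemini_live": "native",
    "hume_evi": "native",
    "openai_realtime": "replay",
    "xai_grok": "replay",
}
ACTIONS = {"native": "resume_handle", "replay": "prime_history", "none": "expire"}


def refresh_plan(provider, refresh_warning_s=30):
    """Return when to warn, where the cap sits, and how the refresh is routed."""
    cap = MAX_SESSION_S[provider]
    # A warning of 0 disables the expiring notice.
    warn_at = cap - refresh_warning_s if refresh_warning_s > 0 else None
    action = ACTIONS[RESUMPTION.get(provider, "none")]
    return {"warn_at_s": warn_at, "cap_s": cap, "action": action}
```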

Known limitations

Voice cloning UI is not implemented in v1 even though Hume declares SUPPORTS_VOICE_CLONING=True.
xAI Grok user-text deltas are not emitted by the provider — only final user transcripts land in the DAT.

Troubleshooting

Sessioncost stuck at $0.00000: the active provider’s pricing(model_id) returned nothing for the selected model. Verify the Model ID is in the provider’s pricing map.
Mic audio not flowing: check Sessionstate. If it’s armed, the gate is closed: you’re in push_to_talk without Talk on, in one_turn after the first reply, or Active is off. Turn Talk on for push_to_talk, or pulse Connect to re-arm one_turn.
“No key for server ‘gemini’”: open ChatTD Key Manager and add a key under the server name, or paste into Apikey on the provider sub-page.
1007 / 1008 close codes on Gemini: usually a dtype or rate mismatch on mic input. The provider asserts int16 little-endian and rate = SAMPLE_RATE_IN — upstream resampling should handle it, but check the mic CHOP is mono.
Replay costs a lot: lower Maxreplayrows or switch to a native-resumption provider (Gemini, Hume). Replay cost scales with transcript length.
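
For the 1007 / 1008 case, it can help to verify the byte format by hand when feeding audio from a custom source. The sketch below converts float samples to int16 little-endian bytes and checks alignment; the function names are hypothetical and this is not the provider's own code.

```python
import struct


def floats_to_int16_le(samples):
    """Convert float samples in [-1.0, 1.0] to int16 little-endian bytes."""
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def check_frame(frame):
    """Raise if a frame is not bytes or is not aligned to 2-byte samples."""
    if not isinstance(frame, (bytes, bytearray)):
        raise TypeError("audio frame must be bytes")
    if len(frame) % 2 != 0:
        raise ValueError("audio frame length must be a multiple of 2 bytes")
    return True
```

Out-of-range floats are clipped rather than wrapped, which avoids loud artifacts if the mic CHOP occasionally exceeds the nominal range.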

Parameters

Voice

Session Mode (Sessionmode) op('voice_agent').par.Sessionmode Menu

continuous: mic is hot full-duplex until Disconnect or end_conversation. one_turn: Connect arms for exactly one exchange — after the assistant finishes responding, mic auto-disarms (token-metered providers hold the socket open; wallclock-metered providers disconnect to avoid idle cost). Hit Connect again to arm the next turn. push_to_talk: socket stays open; mic only sends audio while the Talk toggle is on — bind it to keyboard/MIDI.

Default:: continuous
Options:: continuous, one_turn, push_to_talk

Connect (Connect) op('voice_agent').par.Connect Pulse

Default:: False

Disconnect (Disconnect) op('voice_agent').par.Disconnect Pulse

Default:: False

Talk (PTT) (Talk) op('voice_agent').par.Talk Toggle

Push-to-talk gate. Only used when Session Mode = push_to_talk. True → mic audio streams to provider; False → mic is muted (session stays open).

Default:: False

Session State (Sessionstate) op('voice_agent').par.Sessionstate Str

disconnected | connecting | active | armed | refreshing | ending. "armed" = session open, mic gated off (one_turn waiting for next arm, PTT with Talk=off, or Active=off). "refreshing" = session refresh coordinator is tearing down and restarting around MAX_SESSION_S / GoAway.

Default:: "" (Empty String)

Auto-Refresh Session (Autosessionrefresh) op('voice_agent').par.Autosessionrefresh Toggle

Auto-refresh the session when the provider's MAX_SESSION_S cap is approached or on provider GoAway. Native-resumption providers (Gemini, Hume) carry an opaque handle across the refresh. Replay providers (OpenAI, xAI) rehydrate the transcript via prime_history. Turn off for clean expiry — the session ends, onSessionEnd fires with end_reason=cap.

Default:: True

Refresh Warning (s) (Refreshwarning) op('voice_agent').par.Refreshwarning Int

Seconds before the session cap at which to emit onStatus(expiring). Set 0 to disable the warning and refresh only at cap.

Default:: 30
Range:: 0 to 300
Slider Range:: 0 to 300

Provider (Provider) op('voice_agent').par.Provider StrMenu

Default:

gemini_live

Menu Options:

Hume EVI (hume_evi)
xAI Grok Voice (xai_grok)
Gemini Live (gemini_live)
OpenAI Realtime (openai_realtime)

Scan Providers (Scanproviders) op('voice_agent').par.Scanproviders Pulse

Default:: False

Custom Providers Folder (Providersfolder) op('voice_agent').par.Providersfolder Folder

Default:: "" (Empty String)

Mic In (CHOP) (Micin) op('voice_agent').par.Micin CHOP

Default:: "" (Empty String)

Text Input (Inputtext) op('voice_agent').par.Inputtext Str

Default:: "" (Empty String)

Send Text (Sendtext) op('voice_agent').par.Sendtext Pulse

Default:: False

Pricing (Pricing) op('voice_agent').par.Pricing Str

Default:: "" (Empty String)

Conversation Cost (Conversationcost) op('voice_agent').par.Conversationcost Str

Running USD total across all session refreshes in the current conversation. Resets on Connect (fresh session) or Reset Cost Meter, but preserved across auto-refresh events.

Default:: "" (Empty String)

Session Cost (Sessioncost) op('voice_agent').par.Sessioncost Str

Default:: "" (Empty String)

Reset Cost Meter (Resetcostmeter) op('voice_agent').par.Resetcostmeter Pulse

Default:: False

Conversation Header

Log Conversation (Enableconvdat) op('voice_agent').par.Enableconvdat Toggle

Default:: True

Gemini Live

API Key (Apikey) op('voice_agent').par.Apikey Str

Google AI Studio API key (routed via key_manager). Get one at https://aistudio.google.com/api-keys

Default:: "" (Empty String)

Model (Model) op('voice_agent').par.Model StrMenu

3.1 Flash Live: ~$0.30/min paid tier. 2.5 Native Audio: similar. 2.0 Flash Live: legacy, kept for parity.

Default:

"" (Empty String)

Menu Options:

Gemini 3.1 Flash Live (preview, newest) (gemini-3.1-flash-live-preview)
Gemini 2.5 Flash Native Audio (preview) (gemini-2.5-flash-native-audio-preview-12-2025)
Gemini 2.0 Flash Live (legacy) (gemini-2.0-flash-live-001)

Voice (Voice) op('voice_agent').par.Voice StrMenu

2026 Gemini Live voice roster. Voice names are fixed; no cloning.

Default:

Zephyr

Menu Options:

Zephyr (Zephyr)
Puck (Puck)
Charon (Charon)
Kore (Kore)
Fenrir (Fenrir)
Leda (Leda)
Orus (Orus)
Aoede (Aoede)
Achernar (Achernar)
Algenib (Algenib)
Algieba (Algieba)
Despina (Despina)
Erinome (Erinome)
Laomedeia (Laomedeia)
Rasalgethi (Rasalgethi)
Sadachbia (Sadachbia)
Sadaltager (Sadaltager)
Schedar (Schedar)
Sulafat (Sulafat)
Umbriel (Umbriel)
Vindemiatrix (Vindemiatrix)
Zubenelgenubi (Zubenelgenubi)

Language Code (Languagecode) op('voice_agent').par.Languagecode Str

BCP-47 language code for speech (optional). Leave blank to auto-detect.

Default:: "" (Empty String)

System Prompt (Systemprompt) op('voice_agent').par.Systemprompt Str

System instruction sent at session open via send_client_content.

Default:: "" (Empty String)

User Transcription (Enableusertranscription) op('voice_agent').par.Enableusertranscription Toggle

Emit partial + final user speech transcripts.

Default:: True

Assistant Transcription (Enableoutputtranscription) op('voice_agent').par.Enableoutputtranscription Toggle

Emit partial + final assistant speech transcripts.

Default:: True

Google Search Grounding (Enablegrounding) op('voice_agent').par.Enablegrounding Toggle

Enable Google Search tool for grounded answers. Not compatible with custom function tools in the same session.

Default:: False

Session Resumption (Enablesessionresumption) op('voice_agent').par.Enablesessionresumption Toggle

Server issues periodic resumption handles; provider reconnects within ~10 min on disconnect.

Default:: True

Thinking Budget (2.5) (Thinkingbudget) op('voice_agent').par.Thinkingbudget Int

Gemini 2.5 reasoning token budget. Ignored on 3.x.

Default:: 0
Range:: 0 to 8192
Slider Range:: 0 to 8192

Playback

Audio Device Settings Header

Reset Playback (Resetplayback) op('voice_agent').par.Resetplayback Pulse

Default:: False

Active (Playbackactive) op('voice_agent').par.Playbackactive Toggle

Default:: True

Threaded Device (Threadeddevice) op('voice_agent').par.Threadeddevice StrMenu

Default:

default

Menu Options:

default (default)

Volume (Volume) op('voice_agent').par.Volume Float

Default:: 1.0
Range:: 0 to 1
Slider Range:: 0 to 1

Transport Header

Play (Play) op('voice_agent').par.Play Pulse

Default:: False

Pause (Pause) op('voice_agent').par.Pause Pulse

Default:: False

Stop (Stop) op('voice_agent').par.Stop Pulse

Default:: False

Replay (Replay) op('voice_agent').par.Replay Pulse

Default:: False

Session Saving Header

Save Session Trace (Sessiontracing) op('voice_agent').par.Sessiontracing Toggle

Write a JSON trace of every session to disk on disconnect. Format: <tracedir>/<YYYYMMDD_HHMMSS>_<provider>_<model>.json. Holds transcript + resume handle + cost + timestamps. Required for Resume Last / Load Session.

Default:: True

Trace Folder (Sessiontracedir) op('voice_agent').par.Sessiontracedir Folder

Folder to write traces into. Blank = <project>/voice_sessions/.

Default:: "" (Empty String)

Resume Last Session (Resumelast) op('voice_agent').par.Resumelast Toggle

On Connect, auto-load the newest trace matching the active Provider+Model and resume. Native-resumption providers (Gemini, Hume) hand the server an opaque handle — server has the full state. Replay providers (OpenAI, xAI) re-feed the transcript, capped by Maxreplayrows; the model re-reads its prior turns — audio continuity is lost and token cost grows with transcript length.

Default:: False

Load Specific Trace (Loadsessionfile) op('voice_agent').par.Loadsessionfile File

Override Resume Last with a specific trace file. Loaded on the next Connect. Clear to return to newest-matching behavior.

Default:: "" (Empty String)

Max Replay Rows (Maxreplayrows) op('voice_agent').par.Maxreplayrows Int

Replay providers only. Caps how many prior messages are rehydrated into the new session. Higher = more context + higher cost per reconnect.

Default:: 20
Range:: 1 to 200
Slider Range:: 1 to 200

Resume Status (Sessionresume) op('voice_agent').par.Sessionresume Str

What happened on the most recent Connect: a native handle round-trip, a replay of N messages, or a fresh session.

Default:: "" (Empty String)

Session ID (Sessionid) op('voice_agent').par.Sessionid Str

Identifier of the active (or most recent) session. Matches the trace filename.

Default:: "" (Empty String)

Tools

Tool Configuration Header

Use Tools (Usetools) op('voice_agent').par.Usetools Toggle

Enable external tool operators via Tool sequence blocks

Default:: True

Built-in Tools Header

Allow end_conversation (Allowendconversation) op('voice_agent').par.Allowendconversation Toggle

Assistant can call end_conversation to hang up. It speaks its closing line first; the EXT disconnects on the tool call.

Default:: True

Allow output_text_content (Outputtext) op('voice_agent').par.Outputtext Toggle

Assistant can display text without speaking it aloud — useful for code, data, or long blocks.

Default:: False

Tool Approval Header

Approval Timeout (s) (Approvaltimeout) op('voice_agent').par.Approvaltimeout Int

Auto-deny after N seconds (0 = wait forever)

Default:: 0
Range:: 0 to 600
Slider Range:: 0 to 600

Pending (Pendingtools) op('voice_agent').par.Pendingtools Str

Default:: "" (Empty String)

Approve (Approvetools) op('voice_agent').par.Approvetools Pulse

Default:: False

Deny (Denytools) op('voice_agent').par.Denytools Pulse

Default:: False

Cost Budget Header

Cost Budget ($) (Costbudget) op('voice_agent').par.Costbudget Float

Session cost limit in USD (0 = unlimited). When exceeded, the session is disconnected and onError fires with source=budget.

Default:: 0.0
Range:: 0 to 10
Slider Range:: 0 to 10

Tool (Tool) op('voice_agent').par.Tool Sequence

Default:: 0

Tool OP (Tool0toolop) op('voice_agent').par.Tool0toolop OP

Default:: "" (Empty String)

Skills

Skills Header

Skills Folder (Skillsfolder) op('voice_agent').par.Skillsfolder Folder

Default:: "" (Empty String)

Skills COMP (Skillscomp) op('voice_agent').par.Skillscomp OP

Default:: "" (Empty String)

Scan Skills (Scanskills) op('voice_agent').par.Scanskills Pulse

Default:: False

Skillscount (Skillscount) op('voice_agent').par.Skillscount Str

Default:: "" (Empty String)

Profiles

Profiles Folder (Profilesfolder) op('voice_agent').par.Profilesfolder Folder

Default:: "" (Empty String)

Scan Profiles (Scanprofiles) op('voice_agent').par.Scanprofiles Pulse

Default:: False

Apply Profile Stack (Applyprofiles) op('voice_agent').par.Applyprofiles Pulse

Default:: False

Profile (Profile) op('voice_agent').par.Profile Sequence

Default:: 0

Profile Name (Profile0profilename) op('voice_agent').par.Profile0profilename StrMenu

Default:

"" (Empty String)

Menu Options:

(none) ((none))
annotate_assistant (annotate_assistant)
dev_tools (dev_tools)
groq_fast (groq_fast)
groq_llama70b (groq_llama70b)
local_chat (local_chat)
local_gemma (local_gemma)
model_config_qwen3 (model_config_qwen3)
td_navigator (td_navigator)
tool_agent (tool_agent)

Profile Name (Profile1profilename) op('voice_agent').par.Profile1profilename StrMenu

Default:

"" (Empty String)

Menu Options:

(none) ((none))
annotate_assistant (annotate_assistant)
dev_tools (dev_tools)
groq_fast (groq_fast)
groq_llama70b (groq_llama70b)
local_chat (local_chat)
local_gemma (local_gemma)
model_config_qwen3 (model_config_qwen3)
td_navigator (td_navigator)
tool_agent (tool_agent)

Profile Name (Profile2profilename) op('voice_agent').par.Profile2profilename StrMenu

Default:

"" (Empty String)

Menu Options:

(none) ((none))
annotate_assistant (annotate_assistant)
dev_tools (dev_tools)
groq_fast (groq_fast)
groq_llama70b (groq_llama70b)
local_chat (local_chat)
local_gemma (local_gemma)
model_config_qwen3 (model_config_qwen3)
td_navigator (td_navigator)
tool_agent (tool_agent)

Display Name (Displayname) op('voice_agent').par.Displayname Str

Friendly name for UI, dashboards, event sinks, and agent swarm traces. Profiles may set this value.

Default:: "" (Empty String)

Display Color (Displaycolorr) op('voice_agent').par.Displaycolorr RGB

Identity color for the operator tile, compact panels, dashboards, and profile-driven UI.

Default:: 0.98
Range:: 0 to 1
Slider Range:: 0 to 1

Display Color (Displaycolorg) op('voice_agent').par.Displaycolorg RGB

Identity color for the operator tile, compact panels, dashboards, and profile-driven UI.

Default:: 0.52
Range:: 0 to 1
Slider Range:: 0 to 1

Display Color (Displaycolorb) op('voice_agent').par.Displaycolorb RGB

Identity color for the operator tile, compact panels, dashboards, and profile-driven UI.

Default:: 0.02
Range:: 0 to 1
Slider Range:: 0 to 1

Callbacks

Callbacks DAT (Callbackdat) op('voice_agent').par.Callbackdat DAT

Default:: "" (Empty String)

Print Callbacks (Printcallbacks) op('voice_agent').par.Printcallbacks Toggle

Default:: False

Lifecycle

Dependencies OK (Installdependencies) op('voice_agent').par.Installdependencies Pulse

Default:: False

Initialize Engine (Initialize) op('voice_agent').par.Initialize Pulse

Default:: False

Shutdown Engine (Shutdown) op('voice_agent').par.Shutdown Pulse

Default:: False

Engine Status (Enginestatus) op('voice_agent').par.Enginestatus Str

Default:: "" (Empty String)

Active (Active) op('voice_agent').par.Active Toggle

Default:: False

Changelog

v1.0.0 2026-05-02

Release update

v0.3.0

Session refresh coordinator: tracks MAX_SESSION_S per provider, fires auto-refresh at cap and on provider GoAway
Per-RESUMPTION routing — native handle replay (Gemini Live, Hume EVI), transcript replay via prime_history (OpenAI Realtime, xAI Grok), or clean expiry (none)
Autosessionrefresh bool + Refreshwarning int on Voice page; onStatus(expiring|refreshing|expired) callbacks around refresh lifecycle
Conversationcost readout accumulates USD across refreshes; Sessioncost resets per leg; Costbudget enforced against conversation total
Sessionstate badge gains refreshing; mic pump gated off during refresh; in-flight user audio dropped (v1)
provider_hume_evi RESUMPTION flipped from replay to native to match wire behavior (chat_group_id carries conversation memory server-side)
docs/guide.md: Session refresh section replaces "not yet active" known-limitation

v0.2.0

First shipped provider: Gemini Live. Built fresh against the locked Group B interface — does not port the gemini_live monolith. Addresses every audit finding from notes/realtime_voice_primitive/001_session_log.md.

operator/provider_gemini_live.py:

PROVIDER_TYPE='realtime', TRANSPORT='ws', KEY_SERVER='gemini', RESUMPTION='native', SUPPORTS_VIDEO_IN=True, FULL_DUPLEX=False, MAX_SESSION_S=900, SAMPLE_RATE_IN=16000, SAMPLE_RATE_OUT=24000, FRAME_MS=None, DEPENDENCIES=['google-genai'].
Model menu: gemini-3.1-flash-live-preview (default), gemini-2.5-flash-native-audio-preview-12-2025, gemini-2.0-flash-live-001.
2026 voice roster (22 voices): Achernar, Algenib, Algieba, Aoede, Charon, Despina, Erinome, Fenrir, Kore, Laomedeia, Leda, Orus, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi.
pricing(model_id) returns normalized per-minute costs with tier_unverified=True (free-tier reachability is the open question in 001 → Group G).

Audit fixes baked in:

Mid-session input uses send_realtime_input(audio=…) / (text=…) / (video=…). No session.send(...) wrapper anywhere.
Tool responses never send scheduling=NON_BLOCKING; sync-only across 3.x and 2.5 by design (3.x footgun).
thinking_config plumbed: thinking_level (3.x) or thinking_budget (2.5), auto-selected per model family.
turn_coverage set explicitly on the RealtimeInputConfig (default TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO).
Push-to-talk turn mode sets activity_handling=START_OF_ACTIVITY_INTERRUPTS for consistent barge-in.
session_resumption gated by Enablesessionresumption toggle (not always-on).
GoAway surfaced via on_status('goaway', …) instead of being swallowed.
send_audio_frame asserts bytes + int16-aligned length; no silent float32 → 1007 close.
Canonical async for message in session.receive() loop; no manual while self.conversation_active receive walk.
Session state (socket, receive task, resumption handle) lives on the GeminiConn object — module stays stateless.

Tool schema conversion: OpenAI-style tool declarations from ToolManager.parse_tools are wrapped in types.Tool(function_declarations=[types.FunctionDeclaration(...)]). Google Search grounding is a separate types.Tool(google_search=...) added when Enablegrounding is on.
EXT: _collect_session_pars now also passes tools (list of OpenAI-style tool definitions) via the reserved pars['tools'] key. Provider template comment updated to document the reserved key.
Cost ballpark: Gemini 3.1 paid tier ~$0.005/min in + $0.018/min out audio (~$18/hr voice chat). Free tier reachability via standard API key unverified.

v0.1.0

Initial scaffold of the unified realtime voice-to-voice operator.

Submodule created with extends ["util-base-lop", "util-agent-core", "util-speech-template", "util-chained-callbacks"].
VoiceRealtimeEXT subclasses SpeechTemplate (SPEECH_TYPE='voice-realtime') and mixes in ChainedCallbacksExt.

Manually invokes the template's TTS playback infra (_setup_tts_outputs, _setup_playback_parameters) so speaker-out works without editing util-speech-template.
Wires ProviderRegistry(owner, 'voice-realtime'), VoiceRealtimeCallbacks (extends ProviderCallbacks with on_tool_call / on_user_speech_started / on_user_speech_ended), ToolManager from util-agent-core.

Base parameters: Provider, Scanproviders, Providersfolder, Initialize, Shutdown, Active, Enginestatus, Micin (CHOP), Inputtext, Sendtext, Pricing. Endpointurl auto-added when the active provider declares KEY_SERVER=None.
Async session lifecycle: Initialize → provider.start_session(pars, callbacks) via TDAsyncIO; mic pump drains audio_buffer (filled by inherited ReceiveAudioChunk) into provider.send_audio_frame at the provider's FRAME_MS cadence; provider audio routes to the reused TTS playback chain.
Tool calls dispatch through ToolManager (Tool sequence parameters created manually in TD per the agent operator convention) with results routed back via provider.send_tool_response.
Callbacks page: Callbackdat, Printcallbacks. Callbacks fired: onSessionStart, onSessionEnd, onAssistantText, onUserText, onToolCall, onAudioIn, onAudioOut, onProviderChange, onError.
provider_template.py documents the locked Group B interface: required constants (PROVIDER_TYPE='realtime', SPEECH_TYPE='voice-realtime', TRANSPORT, SAMPLE_RATE_IN/OUT, FRAME_MS, MAX_SESSION_S, RESUMPTION, SUPPORTS_VIDEO_IN, SUPPORTS_VOICE_CLONING, FULL_DUPLEX, KEY_SERVER, DEPENDENCIES); helpers get_parameters, pricing(model_id), voices, is_available; async API start_session, send_audio_frame, send_text, send_tool_response, end_session, optional send_video_frame.
No providers ship in this version. First provider (Gemini Live) lands in 0.2.0 (Group D).
