Voice Agent

v1.0.0 (new)

The Voice Realtime LOP is a single operator that talks to any realtime voice-to-voice provider. Provider modules (provider_gemini_live.py, provider_openai_realtime.py, provider_xai_grok.py, provider_hume_evi.py) drop into the operator like TTS/STT providers — swap backends from the Provider menu, no per-provider operator required. It replaces the older gemini_live monolith (now deprecated) and mirrors the unified-operator pattern used by tts and stt_*.

  • One operator, four cloud providers — switch backends from a single menu
  • Session modes: continuous, one_turn, push_to_talk
  • Session resumption: native handle (Gemini, Hume) or transcript replay (OpenAI, xAI)
  • Disk-persisted session history with optional auto-resume
  • External tool orchestration via the same Tool sequence used by the Agent LOP
  • Built-in tools: end_conversation and output_text_content
  • Live per-minute cost ballpark + running session cost + optional Costbudget cap
  • Live-streaming user transcript (row rewrites in place as the user speaks)
  • Profiles + Skills injection into the system prompt
  • Affect / emotion signals (Hume EVI — 48 prosody dimensions on a dedicated signals CHOP)
Provider | Model families | Audio out | Tools | Notable
Gemini Live | gemini-3.1-flash-live-preview, gemini-2.5-flash-native-audio-preview-12-2025 | 24 kHz | Sync only on 3.x, async on 2.5 | Native session resumption, video-in, Google Search grounding
OpenAI Realtime | gpt-realtime, gpt-realtime-mini | 24 kHz | Streamed (one item per call) | Token-metered idle, long session cap
xAI Grok Voice | grok-voice-* | 24 kHz | Streamed | Flat per-minute pricing (wallclock-metered)
Hume EVI | EVI 3 | 48 kHz | Streamed | Prosody/affect side-channel, voice cloning supported

All providers use the same interface — code written for one works for all four.

  • API keys flow through ChatTD’s Key Manager. Store a key per provider under its server name (gemini, openai, xai, hume) or paste it into the Apikey parameter on the provider sub-page.
  • Python dependencies — declared per provider in its DEPENDENCIES constant. The first time you switch to a provider whose deps are missing, the backend page surfaces an install pulse. Gemini needs google-genai, OpenAI needs openai, xAI and Hume use raw websockets (already pinned).

Realtime voice can get expensive fast: 10 minutes of continuous voice-to-voice on Gemini 3.1 runs ~$3.00 at the ~$0.30/min paid-tier ballpark. Check your tier and set Costbudget before a long session.

Provider | Tier URL | Pricing reference
Gemini Live | <https://aistudio.google.com/usage> | <https://ai.google.dev/gemini-api/docs/pricing>
OpenAI Realtime | <https://platform.openai.com/usage> | <https://openai.com/api/pricing/>
xAI Grok | <https://console.x.ai/usage> | <https://x.ai/api#pricing>
Hume EVI | <https://beta.hume.ai/settings/billing> | <https://www.hume.ai/pricing>

The Pricing parameter on the Voice page shows the current provider + model ballpark as soon as you select them (e.g. ~in 0.005 USD/min out 0.018 USD/min). Sessioncost accumulates live as the session runs. Pulse Resetcostmeter to zero it.

Model ID | Status | Audio in/out | Video in | Notes
gemini-3.1-flash-live-preview | Preview (newest, default) | ~$0.005 / $0.018 per min | ~$0.002/min | Acoustic nuance, thinkingLevel, sync function calling
gemini-2.5-flash-native-audio-preview-12-2025 | Preview | ~$0.005 / $0.018 per min | ~$0.005/min | Native audio, async tools

Voice-chat ballpark on 3.1: ~$0.30/min continuous voice-to-voice.
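
The per-minute ballpark turns into a session estimate with simple multiplication. The sketch below is illustrative arithmetic only: the function names are hypothetical, and the ~$0.30/min rate is the ballpark quoted above, not a value read from the provider.

```python
def estimate_cost_usd(minutes: float, usd_per_minute: float) -> float:
    """Ballpark spend for a continuous session at a flat per-minute rate."""
    if minutes < 0 or usd_per_minute < 0:
        raise ValueError("minutes and rate must be non-negative")
    return round(minutes * usd_per_minute, 2)


def minutes_until_budget(budget_usd: float, usd_per_minute: float) -> float:
    """How long a session can run before a given budget is spent."""
    if usd_per_minute <= 0:
        raise ValueError("rate must be positive")
    return budget_usd / usd_per_minute


# Hypothetical helper names; 0.30 USD/min is the ballpark from the text above.
ten_minutes = estimate_cost_usd(10, 0.30)
runway = minutes_until_budget(0.50, 0.30)
print(f"10 min costs about ${ten_minutes:.2f}; $0.50 lasts about {runway:.1f} min")
```

A calculation like this is a quick way to choose a Costbudget value before a long session.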

Model ID | Status | Audio in/out | Notes
gpt-realtime | GA | Token-metered | Long session cap, native tool interruption
gpt-realtime-mini | GA | Token-metered, cheaper | Lower quality, same interface

xAI Grok Voice: flat ~$0.05/min wallclock (idle time bills). Use one_turn mode to avoid paying for dead air.

Hume EVI: ~$0.04–0.07/min wallclock depending on voice. 30-minute session cap. Ships an onAffect callback with per-turn prosody scores (Joy, Surprise, Admiration, etc.).

The Model parameter is an editable menu — type a custom ID if a provider ships a new model before this operator is updated.

  • Input 1 (Audio CHOP): Microphone audio. Typically a mono CHOP from an Audio Device In, fed to the operator via Micin on the Voice page. The EXT resamples to each provider’s required rate automatically (SAMPLE_RATE_IN constant per provider).
  • Output 1: Conversation table DAT (role, message, id, timestamp, type, metadata, session_id)
  • Output 2: Current audio playback CHOP (store_output)
  • Output 3: Full session audio CHOP (full_audio)
  • Output 4: Text output DAT (content from the output_text_content tool, when enabled)
  • signals CHOP: Common channels (connected, model_ready, worker_active, cost_in_seconds, cost_out_seconds) plus any provider-specific channels declared via that provider’s SIGNAL_CHANNELS dict (Hume: 48 affect dimensions prefixed hume_evi_affect_*).
  • continuous (default): Connect opens the session and keeps it alive full-duplex until Disconnect or the end_conversation tool fires.
  • one_turn: Connect opens the session for one exchange. After the assistant’s first turn-final text the EXT either holds the socket and disarms the mic (token-metered providers — Gemini, OpenAI) or disconnects and writes the trace (wallclock-metered — xAI, Hume). The next Connect re-arms. Use for discrete voice prompts when you don’t want to pay for idle time.
  • push_to_talk: Session stays open, but the Talk toggle gates whether mic audio flows. Bind Talk to a Keyboard In or MIDI In CHOP for walkie-talkie-style interactions.
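
The three modes boil down to a rule for when mic audio is allowed to flow, plus a rule for what one_turn does after the first reply. The following is a minimal sketch of that logic as described above, not the EXT's actual code; the function names and return strings are hypothetical.

```python
# Provider groupings follow the metering behavior described in the text.
TOKEN_METERED = {"gemini_live", "openai_realtime"}
WALLCLOCK_METERED = {"xai_grok", "hume_evi"}


def mic_gate_open(mode: str, talk: bool = False, turn_done: bool = False) -> bool:
    """Return True when mic audio should stream to the provider."""
    if mode == "continuous":
        return True              # full-duplex until disconnect
    if mode == "push_to_talk":
        return bool(talk)        # the Talk toggle gates the mic
    if mode == "one_turn":
        return not turn_done     # disarmed after the first reply
    raise ValueError(f"unknown session mode: {mode}")


def after_first_reply(provider: str) -> str:
    """What one_turn does once the assistant's turn-final text arrives."""
    if provider in TOKEN_METERED:
        return "hold_socket_disarm_mic"
    if provider in WALLCLOCK_METERED:
        return "disconnect_and_write_trace"
    raise ValueError(f"unknown provider: {provider}")
```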

The Sessionstate readout shows where the session is: disconnected / connecting / active / armed / refreshing / ending.

On disconnect the EXT writes a sibling JSON trace (voice_<timestamp>_<hash>.json) to Sessiontracedir (defaults to project.folder/voice_sessions/). The trace holds the resume handle (provider-specific), the transcript, and the end reason.

On the next connect the EXT picks a resume source in this order:

  1. Loadsessionfile (file path) — explicit one-shot
  2. Resumelast (toggle) — newest trace in the directory
  3. None — starts fresh
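
The priority order above can be sketched as a small function. This is an illustration under assumptions: the function name is hypothetical, "newest" is judged by file modification time, and the sketch does not filter traces by provider or model.

```python
from pathlib import Path
from typing import Optional


def pick_resume_source(load_session_file: str,
                       resume_last: bool,
                       trace_dir: str) -> Optional[Path]:
    """Pick a trace to resume from, or None for a fresh session."""
    # 1. An explicit file path wins.
    if load_session_file:
        return Path(load_session_file)
    # 2. Otherwise take the newest trace in the directory (by mtime, assumed).
    if resume_last:
        traces = sorted(Path(trace_dir).glob("*.json"),
                        key=lambda p: p.stat().st_mtime)
        return traces[-1] if traces else None
    # 3. Nothing requested: start fresh.
    return None
```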

Resumption strategy is per-provider:

  • Gemini Live / Hume EVI → native handle. Zero replay cost.
  • OpenAI Realtime / xAI Grok → transcript replay (last Maxreplayrows messages, default 20, user+assistant only). Replay cost grows with history — the Sessionresume readout says so when replay fires.

Tool-call / tool-result rows are dropped from replay to avoid lying to the model about output it didn’t produce.
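
Shaping a transcript for replay amounts to a filter plus a tail slice. The sketch below assumes rows are dicts keyed like the conversation DAT columns (role, message, type) and that tool rows are marked in the type column; both are assumptions, and the function name is hypothetical.

```python
def shape_replay(rows, max_rows=20):
    """Keep only user/assistant text rows, then the last `max_rows` of them."""
    tool_types = ("tool_call", "tool_result")
    kept = [
        row for row in rows
        if row.get("role") in ("user", "assistant")
        and row.get("type") not in tool_types
    ]
    # A non-positive cap replays nothing rather than everything.
    return kept[-max_rows:] if max_rows > 0 else []
```

Lowering max_rows shortens what the provider has to re-read, which is why it also lowers replay cost.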

Voice Realtime consumes tools from other LOPs using the same pattern as the Agent LOP. It does not expose a GetTool() method.

  1. On the Tools page, enable Use LOP Tools.
  2. In the External Op Tools sequence, add a block and drag the tool operator into the OP field.
  3. Set Mode per tool:
    • enabled — blocks until the tool completes before the model continues.
    • enabled_nonblocking — fires and forgets. Safe on Gemini 2.5 and OpenAI; on Gemini 3.x the model runs sync regardless.
    • disabled — skipped.
  4. Connect the session. The model calls tools as needed and folds the results into its response.
  • end_conversation: on when Allow end_conversation (Allowendconversation) is enabled. The model can close gracefully on goodbyes.
  • output_text_content: on when Allow output_text_content (Outputtext) is enabled. The model can display text (code, data, URLs) in the fourth output DAT without reading it aloud.
  • Google Search grounding (Gemini only): turn on Enablegrounding on the Gemini Live page to add google_search as a tool. It is not compatible with custom function tools in the same session.

Tool-call rendering in chat_viewer is automatic: paired tool_call / tool_result rows collapse into a single expandable entry by metadata.call_id.
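
Pairing by call_id can be sketched as a single pass over the rows. This assumes rows are dicts, that metadata has already been parsed into a dict with a call_id key, and that tool rows are marked in the type column; the function name is hypothetical and this is not chat_viewer's own code.

```python
def pair_tool_rows(rows):
    """Group tool_call / tool_result rows by metadata['call_id']."""
    pairs = {}
    for row in rows:
        if row.get("type") not in ("tool_call", "tool_result"):
            continue  # ordinary transcript rows are left alone
        call_id = (row.get("metadata") or {}).get("call_id")
        if call_id is None:
            continue  # unpaired rows cannot be collapsed
        entry = pairs.setdefault(call_id, {"call": None, "result": None})
        key = "call" if row["type"] == "tool_call" else "result"
        entry[key] = row
    return pairs
```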

Streamingmode controls how assistant/user transcripts land in the conversation DAT:

  • live (default): one row per turn, rewritten in place as deltas arrive. Best UX for live captions. chat_viewer re-renders in place via stable row ids.
  • coalesce: one row per turn, written only on turn-final. Cleanest log; no streaming jitter.
  • append: one row per delta. Debug-heavy. Avoid for long sessions.

xAI Grok emits no user-delta stream — on xAI, live degrades to coalesce automatically for user text.
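
The difference between the three modes is easiest to see with a toy writer. The class below is a hypothetical stand-in for the conversation DAT (a plain list of dicts), written only to illustrate the row behavior described above.

```python
class TranscriptWriter:
    """Toy model of how each Streamingmode writes transcript rows."""

    def __init__(self, mode):
        if mode not in ("live", "coalesce", "append"):
            raise ValueError(f"unknown streaming mode: {mode}")
        self.mode = mode
        self.rows = []       # stands in for the conversation DAT
        self._pending = {}   # turn id -> text accumulated so far

    def on_delta(self, turn_id, text):
        self._pending[turn_id] = self._pending.get(turn_id, "") + text
        if self.mode == "append":
            self.rows.append({"id": turn_id, "message": text})
        elif self.mode == "live":
            self._upsert(turn_id, self._pending[turn_id])
        # coalesce writes nothing until the turn is final

    def on_turn_final(self, turn_id):
        text = self._pending.pop(turn_id, "")
        if self.mode == "coalesce":
            self.rows.append({"id": turn_id, "message": text})

    def _upsert(self, turn_id, text):
        # Rewrite the existing row for this turn, or create it.
        for row in self.rows:
            if row["id"] == turn_id:
                row["message"] = text
                return
        self.rows.append({"id": turn_id, "message": text})
```

Feeding the deltas "Hel" and "lo" for one turn yields one row reading "Hello" in live and coalesce (the latter only after the turn is final), and two rows in append.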

  • Pricing — per-minute ballpark for the active provider + model, refreshed on change.
  • Sessioncost: running spend for the current session leg, computed from audio seconds (derived from the provider's sample rate) × provider pricing.
  • Costbudget: hard cap in USD, enforced against the conversation-wide total (Conversationcost). When the total exceeds Costbudget, the EXT disconnects and fires onError with source='budget'. Set to 0 to disable.
  • Resetcostmeter — pulse to zero the session cost meter (does not reset the budget).
  • Profiles page — scan a folder of JSON profile files, pick one from the menu, the system prompt + model + voice + tool toggles apply on connect.
  • Skills page — scan a folder of JSON skills, each skill’s system-prompt chunk is appended to the session instructions.

Both pages mirror the agent LOP’s layout and share the same profile/skill file format.

Wire custom logic on the Callbacks page. The Callbackdat Text DAT receives a stub with every callback signature: onSessionStart, onSessionEnd, onAssistantText, onUserText, onToolCall, onToolResult, onAudioIn, onAudioOut, onProviderChange, onError, onStatus, onAffect (Hume only), and onUserSpeechStarted / onUserSpeechEnded where the provider supplies them.

Toggle Printcallbacks to log every callback fire to the textport while developing.
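
As a rough idea of what a handler in the callbacks DAT might look like, here is a sketch of onError. The argument shape is an assumption (a single dict carrying a source key); the generated stub in Callbackdat is the authority on the real signature.

```python
def onError(info):
    """Hypothetical handler; `info` as a dict with 'source' is assumed."""
    source = info.get("source", "unknown")
    if source == "budget":
        message = "Voice session stopped: Costbudget reached."
    else:
        message = f"Voice session error from {source}."
    print(message)   # shows up in the textport while developing
    return message
```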

  1. Select Provider on the Voice page. Pulse Scanproviders if the menu is empty.
  2. On the provider sub-page, pick a Model and Voice. Check the Pricing readout on the Voice page.
  3. Paste an API key into Apikey (or store it under the provider’s server name in ChatTD Key Manager).
  4. Pulse Connect. Watch Sessionstate flip to active.
  5. Speak into the mic. Assistant audio plays through the Playback-page device.
  6. Pulse Disconnect when done — the session trace is written.
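
The same steps can be scripted from Python. This is a minimal sketch that uses the parameter names from the reference below; the helper name is hypothetical, and it assumes the API key is already stored in ChatTD Key Manager.

```python
def start_voice_session(agent, provider="gemini_live", model=None, budget_usd=0.0):
    """Configure the operator and pulse Connect.

    `agent` is the Voice Agent operator, e.g. op('voice_agent') in TouchDesigner.
    """
    agent.par.Provider = provider
    if model:
        agent.par.Model = model          # leave the current model if not given
    agent.par.Costbudget = budget_usd    # 0 means no cap
    agent.par.Connect.pulse()


def stop_voice_session(agent):
    """Pulse Disconnect so the session trace is written."""
    agent.par.Disconnect.pulse()
```

Inside TouchDesigner this would be called as start_voice_session(op('voice_agent'), 'openai_realtime', 'gpt-realtime', 0.5).
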
  1. Enable Resumelast on the Playback page before connecting.
  2. Pulse Connect. The Sessionresume readout shows which path fired: "Resumed via native handle (gemini_live)" or "Replayed 20 messages (replay, cost grows with history)".
  3. The conversation DAT pre-populates with the previous transcript; the provider is handed either the resume token or the replayed messages.
  1. Set Costbudget = 0.50 on the Tools page.
  2. Connect and converse.
  3. The EXT disconnects the moment spend crosses $0.50 and writes an error row to the conversation DAT.
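
The budget check itself is a simple comparison. The sketch below is not the EXT's code: the function name and the payload keys other than source are hypothetical, and a budget of 0 is treated as "no cap" as described for Costbudget.

```python
def enforce_budget(spent_usd, cost_budget_usd, on_error):
    """Return True (and report an error) when spend has crossed the budget."""
    if cost_budget_usd <= 0:
        return False                       # 0 disables the cap
    if spent_usd > cost_budget_usd:
        on_error({
            "source": "budget",            # matches the documented onError source
            "spent_usd": spent_usd,        # hypothetical extra fields
            "budget_usd": cost_budget_usd,
        })
        return True                        # caller should disconnect
    return False
```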

The EXT auto-refreshes sessions as they approach the provider’s MAX_SESSION_S cap (Gemini: 900s, OpenAI: 3600s, xAI: 3600s, Hume: 1800s), or immediately on provider-emitted goaway. Controls on the Voice page:

  • Auto-Refresh Session (default on) — arm the deadline coordinator.
  • Refresh Warning (s) (default 30) — seconds before cap at which onStatus('expiring') fires. Set 0 to disable the warning and refresh only at cap.
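
The timing can be pictured with the caps listed above. This sketch only computes when the warning would fire relative to the cap; the function name and return shape are hypothetical, and the real coordinator may refresh at a different moment than this implies.

```python
# Session caps in seconds, as listed in the text above.
MAX_SESSION_S = {
    "gemini_live": 900,
    "openai_realtime": 3600,
    "xai_grok": 3600,
    "hume_evi": 1800,
}


def refresh_schedule(provider, refresh_warning_s=30):
    """Seconds into the session at which the warning and the cap fall."""
    cap = MAX_SESSION_S[provider]
    # A warning of 0 disables the 'expiring' status entirely.
    warn_at = cap - refresh_warning_s if refresh_warning_s > 0 else None
    return {"warn_at_s": warn_at, "cap_s": cap}
```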

Per-provider behavior routes off RESUMPTION:

  • Native (Gemini Live, Hume EVI) — resume handle captured via get_persistable_state is re-injected into the new start_session. The server carries the full history; no client-side replay.
  • Replay (OpenAI Realtime, xAI Grok) — new session primed via prime_history with the shaped transcript, capped by Maxreplayrows. Token cost grows with transcript length.
  • None — session ends cleanly; onSessionEnd fires with end_reason='cap' and onStatus('expired') follows. No reconnect.
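
The three routes can be summarized as a dispatch on the provider's RESUMPTION value. The dictionary shapes and the function name below are hypothetical; only the routing itself follows the list above.

```python
def refresh_plan(resumption, handle=None, transcript=(), max_replay_rows=20):
    """Describe what a refresh does for a given RESUMPTION value."""
    if resumption == "native":
        # The server holds the history; only the handle travels.
        return {"action": "restart", "resume_handle": handle}
    if resumption == "replay":
        # The client re-feeds the tail of the transcript.
        tail = list(transcript)[-max_replay_rows:] if max_replay_rows > 0 else []
        return {"action": "restart", "prime_history": tail}
    if resumption == "none":
        return {"action": "end", "end_reason": "cap"}
    raise ValueError(f"unknown RESUMPTION value: {resumption}")
```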

During a refresh the Session State badge reads refreshing, the mic pump is paused, and a system row ▶ Refreshed session (<mode>) — reason: <cap|goaway> is appended to the conversation DAT. Conversation Cost accumulates across refreshes; Session Cost resets per leg. Costbudget is enforced against the conversation-wide total.

In-flight user audio at the moment of refresh is dropped (v1) — the audio buffer isn’t carried across the reconnect. Speaker-out finishes its current buffer since playback is decoupled from the session.

  • Voice cloning UI is not implemented in v1 even though Hume declares SUPPORTS_VOICE_CLONING=True.
  • xAI Grok user-text deltas are not emitted by the provider — only final user transcripts land in the DAT.
  • Sessioncost stuck at $0.00000: the active provider’s pricing(model_id) returned nothing for the selected model. Verify the Model ID is in the provider’s pricing map.
  • Mic audio not flowing: check Sessionstate. If it’s armed, the gate is closed — you’re in push_to_talk without Talk on, or in one_turn after the first reply. Pulse Connect to re-arm.
  • “No key for server ‘gemini’”: open ChatTD Key Manager and add a key under the server name, or paste into Apikey on the provider sub-page.
  • 1007 / 1008 close codes on Gemini: usually a dtype or rate mismatch on mic input. The provider asserts int16 little-endian and rate = SAMPLE_RATE_IN — upstream resampling should handle it, but check the mic CHOP is mono.
  • Replay costs a lot: lower Maxreplayrows or switch to a native-resumption provider (Gemini, Hume). Replay cost scales with transcript length.
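
For the dtype mismatch mentioned above, it can help to see what int16 little-endian audio looks like. This standalone sketch converts mono float samples to that format with the standard library; the function name is hypothetical, and the EXT already performs its own conversion and resampling.

```python
import struct


def float_to_int16_le(samples):
    """Convert mono float samples in [-1.0, 1.0] to int16 little-endian bytes."""
    # Clip first so out-of-range input cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


frame = float_to_int16_le([0.0, 0.25, -0.25])
print(len(frame))  # two bytes per sample
```
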
Session Mode (Sessionmode) op('voice_agent').par.Sessionmode Menu

continuous: mic is hot full-duplex until Disconnect or end_conversation. one_turn: Connect arms for exactly one exchange — after the assistant finishes responding, mic auto-disarms (token-metered providers hold the socket open; wallclock-metered providers disconnect to avoid idle cost). Hit Connect again to arm the next turn. push_to_talk: socket stays open; mic only sends audio while the Talk toggle is on — bind it to keyboard/MIDI.

Default:
continuous
Options:
continuous, one_turn, push_to_talk
Connect (Connect) op('voice_agent').par.Connect Pulse
Default:
False
Disconnect (Disconnect) op('voice_agent').par.Disconnect Pulse
Default:
False
Talk (PTT) (Talk) op('voice_agent').par.Talk Toggle

Push-to-talk gate. Only used when Session Mode = push_to_talk. True → mic audio streams to provider; False → mic is muted (session stays open).

Default:
False
Session State (Sessionstate) op('voice_agent').par.Sessionstate Str

disconnected | connecting | active | armed | refreshing | ending. "armed" = session open, mic gated off (one_turn waiting for next arm, PTT with Talk=off, or Active=off). "refreshing" = session refresh coordinator is tearing down and restarting around MAX_SESSION_S / GoAway.

Default:
"" (Empty String)
Auto-Refresh Session (Autosessionrefresh) op('voice_agent').par.Autosessionrefresh Toggle

Auto-refresh the session when the provider's MAX_SESSION_S cap is approached or on provider GoAway. Native-resumption providers (Gemini, Hume) carry an opaque handle across the refresh. Replay providers (OpenAI, xAI) rehydrate the transcript via prime_history. Turn off for clean expiry — the session ends, onSessionEnd fires with end_reason=cap.

Default:
True
Refresh Warning (s) (Refreshwarning) op('voice_agent').par.Refreshwarning Int

Seconds before the session cap at which to emit onStatus(expiring). Set 0 to disable the warning and refresh only at cap.

Default:
30
Range:
0 to 300
Slider Range:
0 to 300
Provider (Provider) op('voice_agent').par.Provider StrMenu
Default:
gemini_live
Menu Options:
  • Hume EVI (hume_evi)
  • xAI Grok Voice (xai_grok)
  • Gemini Live (gemini_live)
  • OpenAI Realtime (openai_realtime)
Scan Providers (Scanproviders) op('voice_agent').par.Scanproviders Pulse
Default:
False
Custom Providers Folder (Providersfolder) op('voice_agent').par.Providersfolder Folder
Default:
"" (Empty String)
Mic In (CHOP) (Micin) op('voice_agent').par.Micin CHOP
Default:
"" (Empty String)
Text Input (Inputtext) op('voice_agent').par.Inputtext Str
Default:
"" (Empty String)
Send Text (Sendtext) op('voice_agent').par.Sendtext Pulse
Default:
False
Pricing (Pricing) op('voice_agent').par.Pricing Str
Default:
"" (Empty String)
Conversation Cost (Conversationcost) op('voice_agent').par.Conversationcost Str

Running USD total across all session refreshes in the current conversation. Resets on Connect (fresh session) or Reset Cost Meter, but preserved across auto-refresh events.

Default:
"" (Empty String)
Session Cost (Sessioncost) op('voice_agent').par.Sessioncost Str
Default:
"" (Empty String)
Reset Cost Meter (Resetcostmeter) op('voice_agent').par.Resetcostmeter Pulse
Default:
False
Conversation Header
Log Conversation (Enableconvdat) op('voice_agent').par.Enableconvdat Toggle

Append transcript rows to the internal conversation DAT (role | message | id | timestamp | type | metadata | session_id). chat_viewer reads from this.

Default:
True
Streaming Mode (Streamingmode) op('voice_agent').par.Streamingmode Menu

live: one row per turn, message cell rewritten in place as deltas stream — smooth realtime UI, falls back to coalesce for providers that emit no deltas (e.g. xai_grok user text). coalesce: write one text row per completed turn. append: write every streamed delta as its own row.

Default:
live
Options:
live, coalesce, append
API Key (Apikey) op('voice_agent').par.Apikey Str

Google AI Studio API key (routed via key_manager). Get one at https://aistudio.google.com/api-keys

Default:
"" (Empty String)
Model (Model) op('voice_agent').par.Model StrMenu

3.1 Flash Live: ~$0.30/min paid tier. 2.5 Native Audio: similar. 2.0 Flash Live: legacy, kept for parity.

Default:
"" (Empty String)
Menu Options:
  • Gemini 3.1 Flash Live (preview, newest) (gemini-3.1-flash-live-preview)
  • Gemini 2.5 Flash Native Audio (preview) (gemini-2.5-flash-native-audio-preview-12-2025)
  • Gemini 2.0 Flash Live (legacy) (gemini-2.0-flash-live-001)
Voice (Voice) op('voice_agent').par.Voice StrMenu

2026 Gemini Live voice roster. Voice names are fixed; no cloning.

Default:
Zephyr
Menu Options:
  • Zephyr (Zephyr)
  • Puck (Puck)
  • Charon (Charon)
  • Kore (Kore)
  • Fenrir (Fenrir)
  • Leda (Leda)
  • Orus (Orus)
  • Aoede (Aoede)
  • Achernar (Achernar)
  • Algenib (Algenib)
  • Algieba (Algieba)
  • Despina (Despina)
  • Erinome (Erinome)
  • Laomedeia (Laomedeia)
  • Rasalgethi (Rasalgethi)
  • Sadachbia (Sadachbia)
  • Sadaltager (Sadaltager)
  • Schedar (Schedar)
  • Sulafat (Sulafat)
  • Umbriel (Umbriel)
  • Vindemiatrix (Vindemiatrix)
  • Zubenelgenubi (Zubenelgenubi)
Language Code (Languagecode) op('voice_agent').par.Languagecode Str

BCP-47 language code for speech (optional). Leave blank to auto-detect.

Default:
"" (Empty String)
System Prompt (Systemprompt) op('voice_agent').par.Systemprompt Str

System instruction sent at session open via send_client_content.

Default:
"" (Empty String)
Turn Mode (Turnmode) op('voice_agent').par.Turnmode Menu

Auto VAD uses server VAD. Push to Talk disables server VAD and enables START_OF_ACTIVITY_INTERRUPTS for consistent barge-in.

Default:
auto_vad
Options:
auto_vad, push_to_talk
User Transcription (Enableusertranscription) op('voice_agent').par.Enableusertranscription Toggle

Emit partial + final user speech transcripts.

Default:
True
Assistant Transcription (Enableoutputtranscription) op('voice_agent').par.Enableoutputtranscription Toggle

Emit partial + final assistant speech transcripts.

Default:
True
Google Search Grounding (Enablegrounding) op('voice_agent').par.Enablegrounding Toggle

Enable Google Search tool for grounded answers. Not compatible with custom function tools in the same session.

Default:
False
Session Resumption (Enablesessionresumption) op('voice_agent').par.Enablesessionresumption Toggle

Server issues periodic resumption handles; provider reconnects within ~10 min on disconnect.

Default:
True
Thinking Level (3.x) (Thinkinglevel) op('voice_agent').par.Thinkinglevel Menu

Gemini 3.x reasoning budget. Ignored on 2.5.

Default:
minimal
Options:
minimal, low, medium, high
Thinking Budget (2.5) (Thinkingbudget) op('voice_agent').par.Thinkingbudget Int

Gemini 2.5 reasoning token budget. Ignored on 3.x.

Default:
0
Range:
0 to 8192
Slider Range:
0 to 8192
Turn Coverage (Turncoverage) op('voice_agent').par.Turncoverage Menu

3.1 default covers all input. 2.5 default drops input silently outside activity — set explicitly to avoid surprise.

Default:
TURN_INCLUDES_ALL_INPUT
Options:
TURN_INCLUDES_ALL_INPUT, TURN_INCLUDES_ONLY_ACTIVITY
Playback Mode (Playbackmode) op('voice_agent').par.Playbackmode Menu
Default:
threaded
Options:
chop, threaded
Audio Device Settings Header
Reset Playback (Resetplayback) op('voice_agent').par.Resetplayback Pulse
Default:
False
Active (Playbackactive) op('voice_agent').par.Playbackactive Toggle
Default:
True
Driver (Driver) op('voice_agent').par.Driver Menu
Default:
default
Options:
default, asio
Device (Audiodevice) op('voice_agent').par.Audiodevice Menu
Default:
default
Options:
default
Threaded Device (Threadeddevice) op('voice_agent').par.Threadeddevice StrMenu
Default:
default
Menu Options:
  • default (default)
Volume (Volume) op('voice_agent').par.Volume Float
Default:
1.0
Range:
0 to 1
Slider Range:
0 to 1
Transport Header
Play (Play) op('voice_agent').par.Play Pulse
Default:
False
Pause (Pause) op('voice_agent').par.Pause Pulse
Default:
False
Stop (Stop) op('voice_agent').par.Stop Pulse
Default:
False
Replay (Replay) op('voice_agent').par.Replay Pulse
Default:
False
Session Saving Header
Save Session Trace (Sessiontracing) op('voice_agent').par.Sessiontracing Toggle

Write a JSON trace of every session to disk on disconnect. Format: <tracedir>/<YYYYMMDD_HHMMSS>_<provider>_<model>.json. Holds transcript + resume handle + cost + timestamps. Required for Resume Last / Load Session.

Default:
True
Trace Folder (Sessiontracedir) op('voice_agent').par.Sessiontracedir Folder

Folder to write traces into. Blank = <project>/voice_sessions/.

Default:
"" (Empty String)
Resume Last Session (Resumelast) op('voice_agent').par.Resumelast Toggle

On Connect, auto-load the newest trace matching the active Provider+Model and resume. Native-resumption providers (Gemini, Hume) hand the server an opaque handle — server has the full state. Replay providers (OpenAI, xAI) re-feed the transcript, capped by Maxreplayrows; the model re-reads its prior turns — audio continuity is lost and token cost grows with transcript length.

Default:
False
Load Specific Trace (Loadsessionfile) op('voice_agent').par.Loadsessionfile File

Override Resume Last with a specific trace file. Loaded on the next Connect. Clear to return to newest-matching behavior.

Default:
"" (Empty String)
Max Replay Rows (Maxreplayrows) op('voice_agent').par.Maxreplayrows Int

Replay providers only. Caps how many prior messages are rehydrated into the new session. Higher = more context + higher cost per reconnect.

Default:
20
Range:
1 to 200
Slider Range:
1 to 200
Resume Status (Sessionresume) op('voice_agent').par.Sessionresume Str

What happened on the most recent Connect: a native handle round-trip, a replay of N messages, or a fresh session.

Default:
"" (Empty String)
Session ID (Sessionid) op('voice_agent').par.Sessionid Str

Identifier of the active (or most recent) session. Matches the trace filename.

Default:
"" (Empty String)
Tool Configuration Header
Use Tools (Usetools) op('voice_agent').par.Usetools Toggle

Enable external tool operators via Tool sequence blocks

Default:
True
Built-in Tools Header
Allow end_conversation (Allowendconversation) op('voice_agent').par.Allowendconversation Toggle

Assistant can call end_conversation to hang up. It speaks its closing line first; the EXT disconnects on the tool call.

Default:
True
Allow output_text_content (Outputtext) op('voice_agent').par.Outputtext Toggle

Assistant can display text without speaking it aloud — useful for code, data, or long blocks.

Default:
False
Tool Approval Header
Approval Mode (Toolapproval) op('voice_agent').par.Toolapproval Menu

Gate tool execution behind user approval

Default:
off
Options:
off, all, destructive, unknown
Approval Timeout (s) (Approvaltimeout) op('voice_agent').par.Approvaltimeout Int

Auto-deny after N seconds (0 = wait forever)

Default:
0
Range:
0 to 600
Slider Range:
0 to 600
Pending (Pendingtools) op('voice_agent').par.Pendingtools Str
Default:
"" (Empty String)
Approve (Approvetools) op('voice_agent').par.Approvetools Pulse
Default:
False
Deny (Denytools) op('voice_agent').par.Denytools Pulse
Default:
False
Cost Budget Header
Cost Budget ($) (Costbudget) op('voice_agent').par.Costbudget Float

Session cost limit in USD (0 = unlimited). When exceeded, the session is disconnected and onError fires with source=budget.

Default:
0.0
Range:
0 to 10
Slider Range:
0 to 10
Tool (Tool) op('voice_agent').par.Tool Sequence
Default:
0
Tool OP (Tool0toolop) op('voice_agent').par.Tool0toolop OP
Default:
"" (Empty String)
Active (Tool0toolactive) op('voice_agent').par.Tool0toolactive Menu
Default:
enabled
Options:
off, enabled, forced
Skills Header
Skills Folder (Skillsfolder) op('voice_agent').par.Skillsfolder Folder
Default:
"" (Empty String)
Skills COMP (Skillscomp) op('voice_agent').par.Skillscomp OP
Default:
"" (Empty String)
Scan Skills (Scanskills) op('voice_agent').par.Scanskills Pulse
Default:
False
Skillscount (Skillscount) op('voice_agent').par.Skillscount Str
Default:
"" (Empty String)
Profiles Folder (Profilesfolder) op('voice_agent').par.Profilesfolder Folder
Default:
"" (Empty String)
Scan Profiles (Scanprofiles) op('voice_agent').par.Scanprofiles Pulse
Default:
False
Apply Profile Stack (Applyprofiles) op('voice_agent').par.Applyprofiles Pulse
Default:
False
Profile (Profile) op('voice_agent').par.Profile Sequence
Default:
0
Profile Name (Profile0profilename) op('voice_agent').par.Profile0profilename StrMenu
Default:
"" (Empty String)
Menu Options:
  • (none) ((none))
  • annotate_assistant (annotate_assistant)
  • dev_tools (dev_tools)
  • groq_fast (groq_fast)
  • groq_llama70b (groq_llama70b)
  • local_chat (local_chat)
  • local_gemma (local_gemma)
  • model_config_qwen3 (model_config_qwen3)
  • td_navigator (td_navigator)
  • tool_agent (tool_agent)
Profile Name (Profile1profilename) op('voice_agent').par.Profile1profilename StrMenu
Default:
"" (Empty String)
Menu Options:
  • (none) ((none))
  • annotate_assistant (annotate_assistant)
  • dev_tools (dev_tools)
  • groq_fast (groq_fast)
  • groq_llama70b (groq_llama70b)
  • local_chat (local_chat)
  • local_gemma (local_gemma)
  • model_config_qwen3 (model_config_qwen3)
  • td_navigator (td_navigator)
  • tool_agent (tool_agent)
Profile Name (Profile2profilename) op('voice_agent').par.Profile2profilename StrMenu
Default:
"" (Empty String)
Menu Options:
  • (none) ((none))
  • annotate_assistant (annotate_assistant)
  • dev_tools (dev_tools)
  • groq_fast (groq_fast)
  • groq_llama70b (groq_llama70b)
  • local_chat (local_chat)
  • local_gemma (local_gemma)
  • model_config_qwen3 (model_config_qwen3)
  • td_navigator (td_navigator)
  • tool_agent (tool_agent)
Display Name (Displayname) op('voice_agent').par.Displayname Str

Friendly name for UI, dashboards, event sinks, and agent swarm traces. Profiles may set this value.

Default:
"" (Empty String)
Display Color (Displaycolorr) op('voice_agent').par.Displaycolorr RGB

Identity color for the operator tile, compact panels, dashboards, and profile-driven UI.

Default:
0.98
Range:
0 to 1
Slider Range:
0 to 1
Display Color (Displaycolorg) op('voice_agent').par.Displaycolorg RGB

Identity color for the operator tile, compact panels, dashboards, and profile-driven UI.

Default:
0.52
Range:
0 to 1
Slider Range:
0 to 1
Display Color (Displaycolorb) op('voice_agent').par.Displaycolorb RGB

Identity color for the operator tile, compact panels, dashboards, and profile-driven UI.

Default:
0.02
Range:
0 to 1
Slider Range:
0 to 1
UI Behavior (Uibehavior) op('voice_agent').par.Uibehavior Menu

Controls compact UI animation intensity. Profiles may set this value.

Default:
simple
Options:
off, simple, expressive
Callbacks DAT (Callbackdat) op('voice_agent').par.Callbackdat DAT
Default:
"" (Empty String)
Print Callbacks (Printcallbacks) op('voice_agent').par.Printcallbacks Toggle
Default:
False
Dependencies OK (Installdependencies) op('voice_agent').par.Installdependencies Pulse
Default:
False
Initialize Engine (Initialize) op('voice_agent').par.Initialize Pulse
Default:
False
Shutdown Engine (Shutdown) op('voice_agent').par.Shutdown Pulse
Default:
False
Engine Status (Enginestatus) op('voice_agent').par.Enginestatus Str
Default:
"" (Empty String)
Active (Active) op('voice_agent').par.Active Toggle
Default:
False
v1.0.0 2026-05-02
  • Release update
v0.3.0

  • Session refresh coordinator: tracks MAX_SESSION_S per provider, fires auto-refresh at cap and on provider GoAway
  • Per-RESUMPTION routing — native handle replay (Gemini Live, Hume EVI), transcript replay via prime_history (OpenAI Realtime, xAI Grok), or clean expiry (none)
  • Autosessionrefresh bool + Refreshwarning int on Voice page; onStatus(expiring|refreshing|expired) callbacks around refresh lifecycle
  • Conversationcost readout accumulates USD across refreshes; Sessioncost resets per leg; Costbudget enforced against conversation total
  • Sessionstate badge gains refreshing; mic pump gated off during refresh; in-flight user audio dropped (v1)
  • provider_hume_evi RESUMPTION flipped from replay to native to match wire behavior (chat_group_id carries conversation memory server-side)
  • docs/guide.md: Session refresh section replaces "not yet active" known-limitation
v0.2.0

First shipped provider: Gemini Live. Built fresh against the locked Group B interface — does not port the gemini_live monolith. Addresses every audit finding from notes/realtime_voice_primitive/001_session_log.md.

  • operator/provider_gemini_live.py:
    • PROVIDER_TYPE='realtime', TRANSPORT='ws', KEY_SERVER='gemini', RESUMPTION='native', SUPPORTS_VIDEO_IN=True, FULL_DUPLEX=False, MAX_SESSION_S=900, SAMPLE_RATE_IN=16000, SAMPLE_RATE_OUT=24000, FRAME_MS=None, DEPENDENCIES=['google-genai'].
    • Model menu: gemini-3.1-flash-live-preview (default), gemini-2.5-flash-native-audio-preview-12-2025, gemini-2.0-flash-live-001.
    • 2026 voice roster (22 voices): Achernar, Algenib, Algieba, Aoede, Charon, Despina, Erinome, Fenrir, Kore, Laomedeia, Leda, Orus, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi.
    • pricing(model_id) returns normalized per-minute costs with tier_unverified=True (free-tier reachability is the open question in 001 → Group G).
  • Audit fixes baked in:
    • Mid-session input uses send_realtime_input(audio=…) / (text=…) / (video=…). No session.send(...) wrapper anywhere.
    • Tool responses never send scheduling=NON_BLOCKING; sync-only across 3.x and 2.5 by design (3.x footgun).
    • thinking_config plumbed: thinking_level (3.x) or thinking_budget (2.5), auto-selected per model family.
    • turn_coverage set explicitly on the RealtimeInputConfig (default TURN_INCLUDES_AUDIO_ACTIVITY_AND_ALL_VIDEO).
    • Push-to-talk turn mode sets activity_handling=START_OF_ACTIVITY_INTERRUPTS for consistent barge-in.
    • session_resumption gated by Enablesessionresumption toggle (not always-on).
    • GoAway surfaced via on_status('goaway', …) instead of being swallowed.
    • send_audio_frame asserts bytes + int16-aligned length; no silent float32 → 1007 close.
    • Canonical async for message in session.receive() loop; no manual while self.conversation_active receive walk.
    • Session state (socket, receive task, resumption handle) lives on the GeminiConn object — module stays stateless.
  • Tool schema conversion: OpenAI-style tool declarations from ToolManager.parse_tools are wrapped in types.Tool(function_declarations=[types.FunctionDeclaration(...)]). Google Search grounding is a separate types.Tool(google_search=...) added when Enablegrounding is on.
  • EXT: _collect_session_pars now also passes tools (list of OpenAI-style tool definitions) via the reserved pars['tools'] key. Provider template comment updated to document the reserved key.
  • Cost ballpark: Gemini 3.1 paid tier ~$0.005/min in + $0.018/min out audio (~$18/hr voice chat). Free tier reachability via standard API key unverified.
v0.1.0

Initial scaffold of the unified realtime voice-to-voice operator.

  • Submodule created with extends ["util-base-lop", "util-agent-core", "util-speech-template", "util-chained-callbacks"].
  • VoiceRealtimeEXT subclasses SpeechTemplate (SPEECH_TYPE='voice-realtime') and mixes in ChainedCallbacksExt.
    • Manually invokes the template's TTS playback infra (_setup_tts_outputs, _setup_playback_parameters) so speaker-out works without editing util-speech-template.
    • Wires ProviderRegistry(owner, 'voice-realtime'), VoiceRealtimeCallbacks (extends ProviderCallbacks with on_tool_call / on_user_speech_started / on_user_speech_ended), ToolManager from util-agent-core.
  • Base parameters: Provider, Scanproviders, Providersfolder, Initialize, Shutdown, Active, Enginestatus, Micin (CHOP), Inputtext, Sendtext, Pricing. Endpointurl auto-added when the active provider declares KEY_SERVER=None.
  • Async session lifecycle: Initialize calls provider.start_session(pars, callbacks) via TDAsyncIO; mic pump drains audio_buffer (filled by inherited ReceiveAudioChunk) into provider.send_audio_frame at the provider's FRAME_MS cadence; provider audio routes to the reused TTS playback chain.
  • Tool calls dispatch through ToolManager (Tool sequence parameters created manually in TD per the agent operator convention) with results routed back via provider.send_tool_response.
  • Callbacks page: Callbackdat, Printcallbacks. Callbacks fired: onSessionStart, onSessionEnd, onAssistantText, onUserText, onToolCall, onAudioIn, onAudioOut, onProviderChange, onError.
  • provider_template.py documents the locked Group B interface: required constants (PROVIDER_TYPE='realtime', SPEECH_TYPE='voice-realtime', TRANSPORT, SAMPLE_RATE_IN/OUT, FRAME_MS, MAX_SESSION_S, RESUMPTION, SUPPORTS_VIDEO_IN, SUPPORTS_VOICE_CLONING, FULL_DUPLEX, KEY_SERVER, DEPENDENCIES); helpers get_parameters, pricing(model_id), voices, is_available; async API start_session, send_audio_frame, send_text, send_tool_response, end_session, optional send_video_frame.
  • No providers ship in this version. First provider (Gemini Live) lands in 0.2.0 (Group D).