Gemini Live

  • Session history table with readable labels and duration tracking
  • Pause conversation functionality with resume status
  • Smart session resumption for recent conversations
  • Improved nested array handling for function calling schema
  • Enhanced session age parsing and connection error handling
  • Parameter name shortening and better logging transparency

The Gemini Live LOP provides real-time, bidirectional voice and video conversation capabilities using Google’s Gemini Live API. This operator enables natural voice interactions with AI models while supporting comprehensive tool integration, allowing the AI to execute TouchDesigner operations, access external services, and interact with the LOPs ecosystem.

  • Real-time Voice Conversation: Bidirectional audio streaming with automatic speech recognition and text-to-speech
  • Video Input Support: Send video frames to the AI for visual understanding
  • Comprehensive Tool Integration: Full support for LOPs tool system including MCP clients, external operators, and custom tools
  • Multiple Output Streams: Four distinct outputs for different data types and use cases
  • Advanced Turn Management: Support for Auto VAD, Push-to-Talk, and Hybrid modes
  • Session Management: Automatic session resumption and conversation history
  • Auto-Reconnection: Robust error handling with automatic reconnection capabilities

The operator requires the following:

  • Google API Key with Gemini Live API access
  • google-genai Python package (can be installed via the operator)
  • Audio input device for voice interaction
  • Optional: Video input via frame_null TOP for visual interactions

To set up the operator:

  1. Install Dependencies: Use the “Install/Update google-genai” pulse parameter to install required packages
  2. API Key Setup: Enter your Google API key in the API Key parameter
  3. Audio Configuration: Configure your audio input device on the Playback page
  4. Tool Configuration: Set up external tools via the Tool sequence parameters (optional)

The operator accepts the following inputs:

  • Audio Input: Connect an audio stream to the operator for voice input
  • Video Input: Optional frame_null TOP inside the component for video frames
  • Text Input: Optional text_input DAT for text-based interactions
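
The API key step can also be scripted so that the key never has to be typed into the project by hand. The following is a minimal sketch, not the operator's own mechanism: the helper name `apply_api_key` and the environment variable name `GOOGLE_API_KEY` are assumptions chosen for illustration, while the `Apikey` parameter name comes from the parameter reference.

```python
import os


def apply_api_key(gemini, env_var="GOOGLE_API_KEY"):
    """Copy an API key from an environment variable into the Apikey parameter.

    gemini  -- the Gemini Live operator, e.g. op('gemini_live') in TouchDesigner
    env_var -- name of the environment variable (hypothetical default)

    Returns True if a key was found and applied, False otherwise.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        return False
    gemini.par.Apikey = key
    return True
```

Inside TouchDesigner this would be called as `apply_api_key(op('gemini_live'))`. The dependency install from step 1 can be triggered the same way with `op('gemini_live').par.Installgooglegenai.pulse()`.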

The Gemini Live operator provides four distinct output streams, each serving different use cases:

Primary conversation data table with real-time updates

  • Columns: role, message, id, timestamp
  • Content: Complete conversation history with both user and assistant messages
  • Update Mode: Configurable via “Conversation Table Update” parameter
    • Live Transcript: Updates in real-time as speech is transcribed
    • On Turn Complete: Updates only when each conversational turn is finished
  • Use Cases:
    • Conversation logging and analysis
    • Real-time subtitle display
    • Conversation state management
    • Integration with other LOPs that process conversation data
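
For a use case such as subtitle display, the newest message for a given role can be pulled out of the table. This is a sketch under two assumptions: that the first row of the table is a header containing the column names listed above, and that the role column holds plain labels such as "user" and "assistant". The function name `latest_message` is hypothetical.

```python
def latest_message(rows, role="assistant"):
    """Return the newest message for a role, or None if there is none.

    rows -- list of rows; the first row is assumed to be the header
            (role, message, id, timestamp). Cells are converted with str(),
            so plain strings and TouchDesigner Cell objects both work.
    Raises ValueError if the header has no 'role' or 'message' column.
    """
    if not rows:
        return None
    header = [str(cell) for cell in rows[0]]
    role_index = header.index("role")
    message_index = header.index("message")
    # Walk backwards so the most recent matching row wins.
    for row in reversed(rows[1:]):
        if str(row[role_index]) == role:
            return str(row[message_index])
    return None
```

In TouchDesigner, the rows of a table DAT are available through its `rows()` method, so the result of `latest_message(conversation_table.rows())` could drive a Text TOP. Here `conversation_table` stands for whichever DAT carries this output in your network.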

Real-time audio stream from the AI assistant

  • Format: 24kHz PCM audio stream
  • Content: AI-generated speech responses
  • CHOPs:
    • store_output: Progressive audio buffer (clears on interruption)
    • full_audio: Complete session audio accumulator (preserves across interruptions)
  • Use Cases:
    • Audio playback through TouchDesigner’s audio system
    • Audio processing and effects
    • Recording and archiving of AI responses
    • Integration with audio analysis tools
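
When recording or archiving responses, it helps to convert between sample counts, durations, and buffer sizes. The sketch below uses the 24 kHz rate stated above; the 2 bytes per sample (16-bit PCM) and the mono layout are assumptions, and both function names are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000  # output rate stated in the documentation


def audio_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Duration in seconds of a mono buffer holding num_samples samples."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return num_samples / sample_rate


def pcm_byte_count(seconds, bytes_per_sample=2, sample_rate=SAMPLE_RATE_HZ):
    """Bytes needed for mono PCM audio of the given length.

    bytes_per_sample=2 assumes 16-bit samples; adjust if the stream differs.
    """
    return int(round(seconds * sample_rate)) * bytes_per_sample
```

In TouchDesigner, a CHOP's sample count is available as `numSamples`, so `audio_seconds(some_chop.numSamples)` gives the buffered duration, provided that CHOP actually runs at 24 kHz.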

Conversation timing and playback control information

  • Content: Session timing data, turn durations, and playback position
  • Format: CHOP channels with timing information
  • Data Types:
    • Session duration
    • Current turn timing
    • Audio playback position
    • Turn completion markers
  • Use Cases:
    • Synchronizing visuals with conversation flow
    • Creating conversation timelines
    • Playback control interfaces
    • Performance monitoring and analytics

Comprehensive logging and session analytics

  • Content:
    • Tool execution logs and results
    • Session analytics and metrics
    • Error logs and debugging information
    • Performance statistics
  • Tables:
    • tool_history: Complete tool execution history
    • session_analytics: Session metrics and statistics
    • Logger output: Detailed operational logs
  • Use Cases:
    • Debugging and troubleshooting
    • Performance analysis
    • Tool usage monitoring
    • System integration diagnostics

Parameters:

Status (Status) op('gemini_live').par.Status Str
Default:
None
Live Connection (Statusconnected) op('gemini_live').par.Statusconnected Toggle
Default:
None
Conversation Status (Statusconversationactive) op('gemini_live').par.Statusconversationactive Toggle
Default:
None
Start (Start) op('gemini_live').par.Start Pulse
Default:
None
Pause (Pause) op('gemini_live').par.Pause Pulse
Default:
None
Stop (Stop) op('gemini_live').par.Stop Pulse
Default:
None
Model (Model) op('gemini_live').par.Model Menu
Default:
gemini-2.0-flash-live-001
Voice (Voice) op('gemini_live').par.Voice Menu
Default:
Puck
System Prompt (Systemprompt) op('gemini_live').par.Systemprompt Str
Default:
None
Turn Management Mode (Turnmode) op('gemini_live').par.Turnmode Menu
Default:
auto_vad
Push to Talk (Pushtotalk) op('gemini_live').par.Pushtotalk Toggle
Default:
None
Image Send Mode (Imagemode) op('gemini_live').par.Imagemode Menu
Default:
pulse
Send Image (Sendimage) op('gemini_live').par.Sendimage Pulse
Default:
None
TOP (Top) op('gemini_live').par.Top TOP
Default:
None
Send Text (Sendtext) op('gemini_live').par.Sendtext Pulse
Default:
None
DAT (Dat) op('gemini_live').par.Dat DAT
Default:
None
Conversation Table Update (Conversationupdate) op('gemini_live').par.Conversationupdate Menu
Default:
live
Gemini Live Tools Header
Allow Model to Stop Conversation (Allowmodelstop) op('gemini_live').par.Allowmodelstop Toggle
Default:
None
Output Text (out4) (Outputtext) op('gemini_live').par.Outputtext Toggle
Default:
None
Enable Google Search (built in) (Enablegrounding) op('gemini_live').par.Enablegrounding Toggle
Default:
None
Use LOP Tools Header
Use LOP Tools (Usetools) op('gemini_live').par.Usetools Toggle
Default:
None
External Op Tools (Tool) op('gemini_live').par.Tool Sequence
Default:
None
Mode (Tool0mode) op('gemini_live').par.Tool0mode Menu
Default:
enabled
OP (Tool0op) op('gemini_live').par.Tool0op OP
Default:
None
Enable Image Input (Enableimage) op('gemini_live').par.Enableimage Toggle
Default:
None
Resolution [ fit outside ] (Imageresolution) op('gemini_live').par.Imageresolution Menu
Default:
use_top
Stream Interval (sec) (Streaminterval) op('gemini_live').par.Streaminterval Float
Default:
1.0
Range:
0.1 to 10
Custom Width (Customwidth) op('gemini_live').par.Customwidth Int
Default:
512
Range:
64 to 2048
Custom Height (Customheight) op('gemini_live').par.Customheight Int
Default:
512
Range:
64 to 2048
Media Resolution (Mediaresolution) op('gemini_live').par.Mediaresolution Menu
Default:
default
Audio Device Settings Header
Active (Audioactive) op('gemini_live').par.Audioactive Toggle
Default:
True
Driver (Driver) op('gemini_live').par.Driver Menu
Default:
default
Device (Device) op('gemini_live').par.Device Menu
Default:
default
Volume (Volume) op('gemini_live').par.Volume Float
Default:
1.0
Range:
0 to 1
Clear Audio Buffers (Clearaudio) op('gemini_live').par.Clearaudio Pulse
Default:
None
Enable Session History (Enablesessionhistory) op('gemini_live').par.Enablesessionhistory Toggle
Default:
None
Save Session (Savesession) op('gemini_live').par.Savesession Pulse
Default:
None
Load Session (Loadsession) op('gemini_live').par.Loadsession Pulse
Default:
None
List Sessions (Listsessions) op('gemini_live').par.Listsessions Pulse
Default:
None
Session to Load (Sessiontoload) op('gemini_live').par.Sessiontoload Menu
Default:
None
List All Sessions (Listallsessions) op('gemini_live').par.Listallsessions Toggle
Default:
None
API Key (Apikey) op('gemini_live').par.Apikey Str
Default:
None
Install/Update google-genai (Installgooglegenai) op('gemini_live').par.Installgooglegenai Pulse
Default:
None
Enable User Transcription (Enableusertranscription) op('gemini_live').par.Enableusertranscription Toggle
Default:
None
Enable Session Resumption (Enablesessionresumption) op('gemini_live').par.Enablesessionresumption Toggle
Default:
None
Enable Context Compression (Enablecontextcompression) op('gemini_live').par.Enablecontextcompression Toggle
Default:
None
Audio Send Interval (sec) (Audiosendinterval) op('gemini_live').par.Audiosendinterval Float
Default:
0.1
Range:
0.05 to 0.5
Configure VAD (Enablevadconfig) op('gemini_live').par.Enablevadconfig Toggle
Default:
None
VAD Start Sensitivity (Startofspeechsensitivity) op('gemini_live').par.Startofspeechsensitivity Menu
Default:
low
VAD End Sensitivity (Endofspeechsensitivity) op('gemini_live').par.Endofspeechsensitivity Menu
Default:
low
VAD Prefix Padding (ms) (Prefixpaddingms) op('gemini_live').par.Prefixpaddingms Int
Default:
50
Range:
0 to 500
VAD Silence Duration (ms) (Silencedurationms) op('gemini_live').par.Silencedurationms Int
Default:
1000
Range:
100 to 5000
Language Code (BCP-47) (Languagecode) op('gemini_live').par.Languagecode Str
Default:
None
Enable Auto-Reconnect (Enableautoreconnect) op('gemini_live').par.Enableautoreconnect Toggle
Default:
None
Reconnect Delay (sec) (Reconnectdelay) op('gemini_live').par.Reconnectdelay Float
Default:
3.0
Range:
1 to 30
Max Reconnect Attempts (Maxreconnectattempts) op('gemini_live').par.Maxreconnectattempts Int
Default:
3
Range:
1 to 10
Current Reconnect Attempts (Reconnectattempts) op('gemini_live').par.Reconnectattempts Int
Default:
None
ChatTD Operator (Chattd) op('gemini_live').par.Chattd OP
Default:
None
Bypass (Bypass) op('gemini_live').par.Bypass Toggle
Default:
None
Show Built-in Parameters (Showbuiltin) op('gemini_live').par.Showbuiltin Toggle
Default:
None
Version (Version) op('gemini_live').par.Version Str
Default:
None
Last Updated (Lastupdated) op('gemini_live').par.Lastupdated Str
Default:
None
Creator (Creator) op('gemini_live').par.Creator Str
Default:
None
Website (Website) op('gemini_live').par.Website Str
Default:
None

To start a basic voice conversation:

  1. Set the System Prompt parameter to define the AI’s personality.
  2. Select a Voice from the dropdown menu.
  3. Set the Turn Management Mode to Auto VAD.
  4. Pulse the Start parameter.
  5. Speak into your microphone. The conversation will be displayed in the conversation_dat table.
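
The same steps can be driven from a script. This is a minimal sketch that assumes the parameter names and default values shown in the parameter reference (`Systemprompt`, `Voice`, `Turnmode`, `Start`, "Puck", "auto_vad"); the function name is hypothetical.

```python
def start_voice_conversation(gemini, system_prompt, voice="Puck"):
    """Configure the operator for Auto VAD and start the conversation.

    gemini -- the Gemini Live operator, e.g. op('gemini_live')
    """
    gemini.par.Systemprompt = system_prompt
    gemini.par.Voice = voice
    gemini.par.Turnmode = "auto_vad"
    # Start is a pulse parameter, so it is triggered rather than assigned.
    gemini.par.Start.pulse()
```

In TouchDesigner this would be called as `start_voice_conversation(op('gemini_live'), 'You are a friendly studio assistant.')`.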

To let the AI call tools:

  1. Enable Use LOP Tools on the Tools page.
  2. Connect a tool operator (e.g., a tool_dat with a Python script) to the External Op Tools parameter.
  3. Start a conversation and ask the AI to perform a task that requires the tool.
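
Attaching a tool can be scripted as well. The sketch below fills only the first block of the External Op Tools sequence, using the `Usetools`, `Tool0op`, and `Tool0mode` names and the "enabled" default from the parameter reference. The function name is hypothetical, and how further sequence blocks are added is not covered here.

```python
def attach_first_tool(gemini, tool_op):
    """Enable LOP tools and put tool_op into the first tool slot.

    gemini  -- the Gemini Live operator, e.g. op('gemini_live')
    tool_op -- the tool operator (or its path) to expose to the model
    """
    gemini.par.Usetools = True
    gemini.par.Tool0op = tool_op
    gemini.par.Tool0mode = "enabled"
```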

To send images or video frames to the AI:

  1. Turn on Enable Image Input on the Image page.
  2. Connect a TOP operator to the TOP parameter on the Gemini Live page.
  3. Set the Image Send Mode to Stream and adjust the Stream Interval.
  4. Start a conversation and ask the AI about what it sees.
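
A scripted version of the streaming setup might look like the sketch below. It clamps the interval to the 0.1 to 10 second range given for Stream Interval in the parameter reference. The menu value "stream" is an assumption (only the default, "pulse", is documented), and the function name is hypothetical.

```python
def enable_image_stream(gemini, source_top, interval_sec=1.0):
    """Turn on image input and stream frames from source_top.

    Returns the interval that was actually applied after clamping.
    """
    # Stream Interval is documented with a range of 0.1 to 10 seconds.
    interval = min(max(float(interval_sec), 0.1), 10.0)
    gemini.par.Enableimage = True
    gemini.par.Top = source_top
    gemini.par.Imagemode = "stream"  # assumed menu name for Stream mode
    gemini.par.Streaminterval = interval
    return interval
```

Shorter intervals send more frames and therefore more data, so starting near the 1.0 second default and lowering it only when needed is a reasonable approach.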

To use push-to-talk:

  1. Set the Turn Management Mode to push_to_talk.
  2. Start the conversation.
  3. Use the Push to Talk toggle to control when your audio is sent to the AI.
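
Push-to-talk is typically wired to a button or key. The sketch below splits the workflow into two hypothetical helpers: one that switches the mode and starts the conversation, and one that a button or keyboard callback could call on press and release. Parameter names and the "push_to_talk" value are taken from this page.

```python
def begin_push_to_talk(gemini):
    """Switch to push-to-talk, make sure the mic is closed, and start."""
    gemini.par.Turnmode = "push_to_talk"
    gemini.par.Pushtotalk = False
    gemini.par.Start.pulse()


def set_talking(gemini, pressed):
    """Open the mic while pressed is truthy, close it otherwise."""
    gemini.par.Pushtotalk = bool(pressed)
```

A panel or keyboard callback would then call `set_talking(op('gemini_live'), True)` on press and `set_talking(op('gemini_live'), False)` on release.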

The operator manages conversation sessions automatically and also lets you save and reload them manually:

  1. Turn on Enable Session History on the History page.
  2. Pulse Save Session to manually save the current conversation.
  3. Use List Sessions and Load Session to resume a previous conversation.
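
The save and load steps map onto pulse parameters, so they can be scripted. This sketch assumes the parameter names from the parameter reference. The helper names are hypothetical, the session name passed to `load_session` must be one of the entries the operator itself lists, and the session menu may need a moment to refresh after List Sessions is pulsed.

```python
def save_session(gemini):
    """Enable session history and save the current conversation."""
    gemini.par.Enablesessionhistory = True
    gemini.par.Savesession.pulse()


def load_session(gemini, session_name):
    """Refresh the session list, select a session, and load it.

    session_name -- a menu entry offered by Session to Load (not invented here)
    """
    gemini.par.Listsessions.pulse()
    gemini.par.Sessiontoload = session_name
    gemini.par.Loadsession.pulse()
```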

Access different audio streams for various use cases:

  • store_output CHOP: Progressive audio buffer (clears on interruption).
  • full_audio CHOP: Complete session audio accumulator (preserves across interruptions).

Monitor tool execution and results in the tool_history table inside the operator.
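
For monitoring from a script, the table can be turned into a list of dictionaries keyed by column name. This is a generic sketch: it assumes the first row of tool_history is a header row, and it makes no claim about which columns the table contains. The function name is hypothetical.

```python
def table_to_dicts(rows):
    """Convert table rows into a list of dicts keyed by the header row.

    rows -- list of rows; the first row is assumed to hold column names.
            Cells are converted with str(), so TouchDesigner Cell objects
            and plain strings are both accepted.
    """
    if not rows:
        return []
    header = [str(cell) for cell in rows[0]]
    return [dict(zip(header, (str(cell) for cell in row))) for row in rows[1:]]
```

In TouchDesigner the input would come from the DAT's `rows()` method, for example `table_to_dicts(history_dat.rows())`, where `history_dat` refers to the tool_history table.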

Common issues and fixes:

  1. No Audio Input: Check the audio device configuration on the Playback page
  2. API Key Errors: Verify that the Google API key has Gemini Live access
  3. Tool Execution Failures: Check the tool operator extensions and their GetTool() methods
  4. Connection Drops: Enable auto-reconnect and check network stability

Performance tips:

  • Choose an audio send interval that suits the use case (0.1 s for responsiveness, 0.2 s for efficiency)
  • Configure VAD sensitivity for your environment
  • Use non-blocking mode for long-running tools
  • Monitor tool execution logs for optimization opportunities
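
The auto-reconnect and send-interval advice can be applied in one place. The sketch below clamps each value to the range given in the parameter reference (Audio Send Interval 0.05 to 0.5 s, Reconnect Delay 1 to 30 s, Max Reconnect Attempts 1 to 10). The function names are hypothetical.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def tune_connection(gemini, send_interval=0.1, reconnect_delay=3.0, max_attempts=3):
    """Enable auto-reconnect and apply clamped timing settings.

    Returns a dict of the parameter values that were actually applied.
    """
    applied = {
        "Audiosendinterval": clamp(float(send_interval), 0.05, 0.5),
        "Reconnectdelay": clamp(float(reconnect_delay), 1.0, 30.0),
        "Maxreconnectattempts": clamp(int(max_attempts), 1, 10),
    }
    gemini.par.Enableautoreconnect = True
    for name, value in applied.items():
        setattr(gemini.par, name, value)
    return applied
```

Returning the applied values makes it easy to log what was set when a requested value fell outside the documented range.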

The Gemini Live operator integrates with the broader LOPs ecosystem:

  • Agent Operator: Share tool configurations and conversation data
  • MCP Clients: Access external services and APIs during conversation
  • File Operations: Read/write files based on conversation context
  • Data Processing: Process conversation data with other TouchDesigner operators

This tool integration lets Gemini Live act as a hub for AI-driven TouchDesigner automation and interaction.