Gemini Live
Gemini Live v2.3.1 [ September 2, 2025 ]
- Session history table with readable labels and duration tracking
- Pause conversation functionality with resume status
- Smart session resumption for recent conversations
- Improved nested array handling for function calling schema
- Enhanced session age parsing and connection error handling
- Parameter name shortening and better logging transparency
The Gemini Live LOP provides real-time, bidirectional voice and video conversation capabilities using Google’s Gemini Live API. This operator enables natural voice interactions with AI models while supporting comprehensive tool integration, allowing the AI to execute TouchDesigner operations, access external services, and interact with the LOPs ecosystem.
Key Features
- Real-time Voice Conversation: Bidirectional audio streaming with automatic speech recognition and text-to-speech
- Video Input Support: Send video frames to the AI for visual understanding
- Comprehensive Tool Integration: Full support for LOPs tool system including MCP clients, external operators, and custom tools
- Multiple Output Streams: Four distinct outputs for different data types and use cases
- Advanced Turn Management: Support for Auto VAD, Push-to-Talk, and Hybrid modes
- Session Management: Automatic session resumption and conversation history
- Auto-Reconnection: Robust error handling with automatic reconnection capabilities
Requirements
- Google API Key with Gemini Live API access
- google-genai Python package (can be installed via the operator)
- Audio input device for voice interaction
- Optional: Video input via frame_null TOP for visual interactions
Installation
- Install Dependencies: Use the “Install/Update google-genai” pulse parameter to install required packages
- API Key Setup: Enter your Google API key in the API Key parameter
- Audio Configuration: Configure your audio input device in the Playback page
- Tool Configuration: Set up external tools via the Tool sequence parameters (optional)
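The first two installation steps can also be scripted. The following is a minimal sketch, assuming the component is reachable as op('gemini_live') (the path used in the parameter listing) and that the key is stored in an environment variable. The variable name GOOGLE_API_KEY and the helper name configure_gemini_live are choices made for this illustration, not something the operator requires.

```python
import os


def configure_gemini_live(gl, env_var="GOOGLE_API_KEY", install=False):
    """Copy an API key from the environment into the operator.

    gl is the Gemini Live component, e.g. op('gemini_live') in TouchDesigner.
    env_var is an assumed variable name; use whatever your setup defines.
    """
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    gl.par.Apikey = key  # Config page: API Key
    if install:
        # Same effect as pressing "Install/Update google-genai".
        gl.par.Installgooglegenai.pulse()
    return key


# Inside TouchDesigner:
# configure_gemini_live(op('gemini_live'), install=True)
```

Reading the key from the environment keeps it out of the saved project file.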
I/O Configuration
Inputs
- Audio Input: Connect audio stream to the operator for voice input
- Video Input: Optional frame_null TOP inside the component for video frames
- Text Input: Optional text_input DAT for text-based interactions
Outputs
The Gemini Live operator provides four distinct output streams, each serving different use cases:
1. Conversation Output (out1)
Primary conversation data table with real-time updates
- Columns: role, message, id, timestamp
- Content: Complete conversation history with both user and assistant messages
- Update Mode: Configurable via “Conversation Table Update” parameter
  - Live Transcript: Updates in real-time as speech is transcribed
  - On Turn Complete: Updates only when each conversational turn is finished
- Use Cases:
  - Conversation logging and analysis
  - Real-time subtitle display
  - Conversation state management
  - Integration with other LOPs that process conversation data
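As an illustration of consuming the conversation table, here is a small sketch that turns a header-plus-rows table into dictionaries and optionally filters by role. It works on plain lists so it can be tried outside TouchDesigner. The helper name conversation_rows is hypothetical, and the exact strings stored in the role column are an assumption.

```python
def conversation_rows(table, role=None):
    """Convert a table (header row first) into a list of dicts.

    table is a list of rows, each a list of strings.
    role, if given, keeps only rows whose 'role' column matches.
    """
    if not table:
        return []
    header, *rows = table
    records = [dict(zip(header, row)) for row in rows]
    if role is not None:
        records = [r for r in records if r.get("role") == role]
    return records


# Inside TouchDesigner, a table DAT wired to out1 can be converted first
# (conv_dat is a placeholder for that DAT):
# table = [[cell.val for cell in row] for row in conv_dat.rows()]
# assistant_lines = conversation_rows(table, role="assistant")
```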
2. Voice Output (out2)
Real-time audio stream from the AI assistant
- Format: 24kHz PCM audio stream
- Content: AI-generated speech responses
- CHOPs:
  - store_output: Progressive audio buffer (clears on interruption)
  - full_audio: Complete session audio accumulator (preserves across interruptions)
- Use Cases:
  - Audio playback through TouchDesigner’s audio system
  - Audio processing and effects
  - Recording and archiving of AI responses
  - Integration with audio analysis tools
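For the recording and archiving use case, one option is to write the samples to a WAV file with Python's standard library. This is a sketch only: the 24 kHz rate comes from the format listed above, while mono output, 16-bit sample width, and the helper name write_wav are assumptions made for the example.

```python
import struct
import wave

SAMPLE_RATE = 24000  # documented output rate


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM.

    target is a file path or a writable binary file object.
    Mono and 16-bit are assumptions; only the 24 kHz rate is documented.
    Returns the clip duration in seconds.
    """
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    frames = struct.pack(f"<{len(ints)}h", *ints)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return len(clipped) / sample_rate


# Inside TouchDesigner, samples could come from a CHOP channel
# (audio_chop is a placeholder for the CHOP you want to archive):
# write_wav('response.wav', audio_chop[0].vals)
```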
3. Timer/Playback Position (out3)
Conversation timing and playback control information
- Content: Session timing data, turn durations, and playback position
- Format: CHOP channels with timing information
- Data Types:
  - Session duration
  - Current turn timing
  - Audio playback position
  - Turn completion markers
- Use Cases:
  - Synchronizing visuals with conversation flow
  - Creating conversation timelines
  - Playback control interfaces
  - Performance monitoring and analytics
4. Logs and Analytics (out4)
Comprehensive logging and session analytics
- Content:
  - Tool execution logs and results
  - Session analytics and metrics
  - Error logs and debugging information
  - Performance statistics
- Tables:
  - tool_history: Complete tool execution history
  - session_analytics: Session metrics and statistics
  - Logger output: Detailed operational logs
- Use Cases:
  - Debugging and troubleshooting
  - Performance analysis
  - Tool usage monitoring
  - System integration diagnostics
Parameters
Page: Gemini Live
- op('gemini_live').par.Status (Str) - Default: None
- op('gemini_live').par.Statusconnected (Toggle) - Default: None
- op('gemini_live').par.Statusconversationactive (Toggle) - Default: None
- op('gemini_live').par.Start (Pulse) - Default: None
- op('gemini_live').par.Pause (Pulse) - Default: None
- op('gemini_live').par.Stop (Pulse) - Default: None
- op('gemini_live').par.Systemprompt (Str) - Default: None
- op('gemini_live').par.Pushtotalk (Toggle) - Default: None
- op('gemini_live').par.Sendimage (Pulse) - Default: None
- op('gemini_live').par.Top (TOP) - Default: None
- op('gemini_live').par.Sendtext (Pulse) - Default: None
- op('gemini_live').par.Dat (DAT) - Default: None
Page: Tools
- op('gemini_live').par.Allowmodelstop (Toggle) - Default: None
- op('gemini_live').par.Outputtext (Toggle) - Default: None
- op('gemini_live').par.Enablegrounding (Toggle) - Default: None
- op('gemini_live').par.Usetools (Toggle) - Default: None
- op('gemini_live').par.Tool (Sequence) - Default: None
- op('gemini_live').par.Tool0op (OP) - Default: None
Page: Image
- op('gemini_live').par.Enableimage (Toggle) - Default: None
- op('gemini_live').par.Streaminterval (Float) - Default: 1.0, Range: 0.1 to 10
- op('gemini_live').par.Customwidth (Int) - Default: 512, Range: 64 to 2048
- op('gemini_live').par.Customheight (Int) - Default: 512, Range: 64 to 2048
Page: Playback
- op('gemini_live').par.Audioactive (Toggle) - Default: True
- op('gemini_live').par.Volume (Float) - Default: 1.0, Range: 0 to 1
- op('gemini_live').par.Clearaudio (Pulse) - Default: None
Page: History
- op('gemini_live').par.Enablesessionhistory (Toggle) - Default: None
- op('gemini_live').par.Savesession (Pulse) - Default: None
- op('gemini_live').par.Loadsession (Pulse) - Default: None
- op('gemini_live').par.Listsessions (Pulse) - Default: None
- op('gemini_live').par.Listallsessions (Toggle) - Default: None
Page: Config
- op('gemini_live').par.Apikey (Str) - Default: None
- op('gemini_live').par.Installgooglegenai (Pulse) - Default: None
- op('gemini_live').par.Enableusertranscription (Toggle) - Default: None
- op('gemini_live').par.Enablesessionresumption (Toggle) - Default: None
- op('gemini_live').par.Enablecontextcompression (Toggle) - Default: None
- op('gemini_live').par.Audiosendinterval (Float) - Default: 0.1, Range: 0.05 to 0.5
- op('gemini_live').par.Enablevadconfig (Toggle) - Default: None
- op('gemini_live').par.Prefixpaddingms (Int) - Default: 50, Range: 0 to 500
- op('gemini_live').par.Silencedurationms (Int) - Default: 1000, Range: 100 to 5000
- op('gemini_live').par.Languagecode (Str) - Default: None
- op('gemini_live').par.Enableautoreconnect (Toggle) - Default: None
- op('gemini_live').par.Reconnectdelay (Float) - Default: 3.0, Range: 1 to 30
- op('gemini_live').par.Maxreconnectattempts (Int) - Default: 3, Range: 1 to 10
- op('gemini_live').par.Reconnectattempts (Int) - Default: None
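When Config values are set from a script, it can help to keep them inside the ranges listed for this page. The sketch below copies those ranges into a dictionary and clamps before assigning. The helper names clamp_config and apply_config are hypothetical, and TouchDesigner may already clamp these parameters itself, so treat this as a defensive convenience rather than a requirement.

```python
# Ranges copied from the Config page parameter listing.
CONFIG_RANGES = {
    "Audiosendinterval": (0.05, 0.5),
    "Prefixpaddingms": (0, 500),
    "Silencedurationms": (100, 5000),
    "Reconnectdelay": (1, 30),
    "Maxreconnectattempts": (1, 10),
}


def clamp_config(values, ranges=CONFIG_RANGES):
    """Return a copy of values with each ranged parameter clamped."""
    clamped = {}
    for name, value in values.items():
        if name in ranges:
            low, high = ranges[name]
            value = max(low, min(high, value))
        clamped[name] = value
    return clamped


def apply_config(gl, values):
    """Clamp values, then assign them to the operator's parameters."""
    clamped = clamp_config(values)
    for name, value in clamped.items():
        setattr(gl.par, name, value)
    return clamped


# Inside TouchDesigner:
# apply_config(op('gemini_live'), {"Silencedurationms": 800, "Reconnectdelay": 5})
```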
Page: About
- op('gemini_live').par.Chattd (OP) - Default: None
- op('gemini_live').par.Bypass (Toggle) - Default: None
- op('gemini_live').par.Showbuiltin (Toggle) - Default: None
- op('gemini_live').par.Version (Str) - Default: None
- op('gemini_live').par.Lastupdated (Str) - Default: None
- op('gemini_live').par.Creator (Str) - Default: None
- op('gemini_live').par.Website (Str) - Default: None
Usage Examples
Basic Voice Conversation
- Set the System Prompt parameter to define the AI’s personality.
- Select a Voice from the dropdown menu.
- Set the Turn Management Mode to Auto VAD.
- Pulse the Start parameter.
- Speak into your microphone. The conversation will be displayed in the conversation_dat table.
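The prompt and start steps can be driven from Python as well. A minimal sketch follows, using only parameter names that appear in the Gemini Live page listing (Systemprompt, Start, Statusconnected, Statusconversationactive). The Voice and Turn Management Mode parameters are not in that listing, so they are left to the UI here. The helper names are hypothetical.

```python
def start_conversation(gl, system_prompt):
    """Set the system prompt, then pulse Start."""
    gl.par.Systemprompt = system_prompt
    gl.par.Start.pulse()


def session_state(gl):
    """Read the two status toggles from the Gemini Live page."""
    return {
        "connected": bool(gl.par.Statusconnected.eval()),
        "conversation_active": bool(gl.par.Statusconversationactive.eval()),
    }


# Inside TouchDesigner:
# start_conversation(op('gemini_live'), "You are a concise studio assistant.")
# print(session_state(op('gemini_live')))
```

Because connecting takes time, the status toggles are worth polling on a later frame rather than immediately after the pulse.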
Using Tools
- Enable Use LOP Tools on the Tools page.
- Connect a tool operator (e.g., a tool_dat with a Python script) to the External Op Tools parameter.
- Start a conversation and ask the AI to perform a task that requires the tool.
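A scripted version of the tool setup might look like the sketch below. It assumes that the Use LOP Tools toggle corresponds to the Usetools parameter and that the first External Op Tools slot corresponds to Tool0op, based on the Tools page listing. The optional check relies on the GetTool() method mentioned under Troubleshooting, and the helper name attach_tool is hypothetical.

```python
def attach_tool(gl, tool_op, check=True):
    """Enable tools and put tool_op in the first Tool sequence slot.

    With check=True, refuse operators that do not expose a callable
    GetTool(), since such tools are expected to fail at execution time.
    """
    if check and not callable(getattr(tool_op, "GetTool", None)):
        raise TypeError("tool operator does not expose a callable GetTool()")
    gl.par.Usetools = True
    gl.par.Tool0op = tool_op


# Inside TouchDesigner ('my_tool' is a placeholder operator name):
# attach_tool(op('gemini_live'), op('my_tool'))
```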
Video-Enhanced Conversation
- Turn on Enable Image Input on the Image page.
- Connect a TOP operator to the TOP parameter on the Gemini Live page.
- Set the Image Send Mode to Stream and adjust the Stream Interval.
- Start a conversation and ask the AI about what it sees.
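To reason about how the Stream Interval affects load, the sketch below sets the image parameters and estimates how many frames are sent per minute. It assumes the interval is expressed in seconds, which the listing does not state, and it uses the 0.1 to 10 range given for Streaminterval. Image Send Mode is not in the parameter listing, so it is not set here. Both helper names are hypothetical.

```python
def configure_video(gl, top, interval=1.0):
    """Enable image input, assign the source TOP, and set the interval.

    The interval is kept within the documented 0.1 to 10 range.
    Returns the interval actually applied.
    """
    interval = max(0.1, min(10.0, float(interval)))
    gl.par.Enableimage = True
    gl.par.Top = top
    gl.par.Streaminterval = interval
    return interval


def frames_per_minute(interval):
    """Frames sent per minute, assuming the interval is in seconds."""
    return 60.0 / interval


# Inside TouchDesigner ('cam_null' is a placeholder TOP name):
# configure_video(op('gemini_live'), op('cam_null'), interval=2.0)
```

Under that assumption, a 2.0 interval sends about 30 frames per minute, while the default of 1.0 sends about 60.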
Push-to-Talk Mode
- Set the Turn Management Mode to push_to_talk.
- Start the conversation.
- Use the Push to Talk toggle to control when your audio is sent to the AI.
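The Push to Talk toggle can be driven by any callback that knows when a button or key is held. A minimal sketch, using the Pushtotalk parameter from the Gemini Live page; the helper name set_push_to_talk is hypothetical, and the Panel Execute DAT wiring shown in the comments is one possible arrangement rather than the operator's prescribed one.

```python
def set_push_to_talk(gl, held):
    """Open (True) or close (False) the microphone gate."""
    state = bool(held)
    gl.par.Pushtotalk = state
    return state


# Possible wiring in a Panel Execute DAT attached to a button COMP:
# def onOffToOn(panelValue):
#     set_push_to_talk(op('gemini_live'), True)
#
# def onOnToOff(panelValue):
#     set_push_to_talk(op('gemini_live'), False)
```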
Advanced Features
Session Management
The operator automatically manages conversation sessions with resumption capabilities:
- Turn on Enable Session History on the History page.
- Pulse Save Session to manually save the current conversation.
- Use List Sessions and Load Session to resume a previous conversation.
Audio Processing
Access different audio streams for various use cases:
- store_output CHOP: Progressive audio buffer (clears on interruption).
- full_audio CHOP: Complete session audio accumulator (preserves across interruptions).
Tool Result Monitoring
Monitor tool execution and results in the tool_history table inside the operator.
Troubleshooting
Common Issues
- No Audio Input: Check audio device configuration in Playback page
- API Key Errors: Verify Google API key has Gemini Live access
- Tool Execution Failures: Check tool operator extensions and GetTool() methods
- Connection Drops: Enable auto-reconnect and check network stability
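To make the auto-reconnect settings concrete, here is a generic retry loop whose defaults mirror Reconnectdelay (3.0) and Maxreconnectattempts (3) from the Config page. It is a sketch of the general pattern only: the operator's internal reconnection logic is not documented here, and the fixed delay between attempts, the use of ConnectionError, and the name connect_with_retry are assumptions.

```python
import time


def connect_with_retry(connect, max_attempts=3, delay=3.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is used up.

    Returns the number of attempts made. If every attempt raises
    ConnectionError, the last error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(delay)  # wait before the next attempt
```

With these defaults, a connection that never recovers is abandoned after three attempts and two waits of 3 seconds each.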
Performance Optimization
- Use appropriate audio send intervals (0.1s for responsive, 0.2s for efficiency)
- Configure VAD sensitivity for your environment
- Use non-blocking mode for long-running tools
- Monitor tool execution logs for optimization opportunities
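The trade-off between the 0.1 s and 0.2 s send intervals can be worked out with simple arithmetic. The sketch below assumes 16 kHz, 16-bit mono input audio; this document does not state the input format, so adjust the defaults to match your setup. The helper name chunk_stats is hypothetical.

```python
def chunk_stats(interval_s, sample_rate=16000, bytes_per_sample=2):
    """Estimate message rate and chunk size for a given send interval.

    sample_rate and bytes_per_sample are assumed input-audio settings.
    """
    samples_per_chunk = round(sample_rate * interval_s)
    return {
        "messages_per_second": 1.0 / interval_s,
        "samples_per_chunk": samples_per_chunk,
        "bytes_per_chunk": samples_per_chunk * bytes_per_sample,
    }


for interval in (0.1, 0.2):
    print(interval, chunk_stats(interval))
```

Under these assumptions, doubling the interval halves the number of messages per second and doubles the size of each chunk, at the cost of up to 0.1 s of extra latency before speech reaches the model.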
Integration with Other LOPs
The Gemini Live operator integrates seamlessly with the broader LOPs ecosystem:
- Agent Operator: Share tool configurations and conversation data
- MCP Clients: Access external services and APIs during conversation
- File Operations: Read/write files based on conversation context
- Data Processing: Process conversation data with other TouchDesigner operators
This comprehensive tool integration makes Gemini Live a powerful hub for AI-driven TouchDesigner automation and interaction.