Gemini Live
The Gemini Live LOP provides real-time, bidirectional voice and video conversation capabilities using Google’s Gemini Live API. This operator enables natural voice interactions with AI models while supporting comprehensive tool integration, allowing the AI to execute TouchDesigner operations, access external services, and interact with the LOPs ecosystem.
Key Features
Section titled “Key Features”- Real-time Voice Conversation: Bidirectional audio streaming with automatic speech recognition and text-to-speech
- Video Input Support: Send video frames to the AI for visual understanding
- Comprehensive Tool Integration: Full support for LOPs tool system including MCP clients, external operators, and custom tools
- Multiple Output Streams: Four distinct outputs for different data types and use cases
- Advanced Turn Management: Support for Auto VAD, Push-to-Talk, and Hybrid modes
- Session Management: Automatic session resumption and conversation history
- Auto-Reconnection: Robust error handling with automatic reconnection capabilities
Requirements
Section titled “Requirements”- Google API Key with Gemini Live API access
google-genai
Python package (can be installed via the operator)- Audio input device for voice interaction
- Optional: Video input via
frame_null
TOP for visual interactions
Installation
Section titled “Installation”- Install Dependencies: Use the “Install/Update google-genai” pulse parameter to install required packages
- API Key Setup: Enter your Google API key in the API Key parameter
- Audio Configuration: Configure your audio input device in the Playback page
- Tool Configuration: Set up external tools via the Tool sequence parameters (optional)
I/O Configuration
Section titled “I/O Configuration”Inputs
Section titled “Inputs”- Audio Input: Connect audio stream to the operator for voice input
- Video Input: Optional
frame_null
TOP inside the component for video frames - Text Input: Optional
text_input
DAT for text-based interactions
Outputs
Section titled “Outputs”The Gemini Live operator provides four distinct output streams, each serving different use cases:
1. Conversation Output (out1)
Section titled “1. Conversation Output (out1)”Primary conversation data table with real-time updates
- Columns:
role
,message
,id
,timestamp
- Content: Complete conversation history with both user and assistant messages
- Update Mode: Configurable via “Conversation Table Update” parameter
Live Transcript
: Updates in real-time as speech is transcribedOn Turn Complete
: Updates only when each conversational turn is finished
- Use Cases:
- Conversation logging and analysis
- Real-time subtitle display
- Conversation state management
- Integration with other LOPs that process conversation data
2. Voice Output (out2)
Section titled “2. Voice Output (out2)”Real-time audio stream from the AI assistant
- Format: 24kHz PCM audio stream
- Content: AI-generated speech responses
- CHOPs:
store_output
: Progressive audio buffer (clears on interruption)full_audio
: Complete session audio accumulator (preserves across interruptions)
- Use Cases:
- Audio playback through TouchDesigner’s audio system
- Audio processing and effects
- Recording and archiving of AI responses
- Integration with audio analysis tools
3. Timer/Playback Position (out3)
Section titled “3. Timer/Playback Position (out3)”Conversation timing and playback control information
- Content: Session timing data, turn durations, and playback position
- Format: CHOP channels with timing information
- Data Types:
- Session duration
- Current turn timing
- Audio playback position
- Turn completion markers
- Use Cases:
- Synchronizing visuals with conversation flow
- Creating conversation timelines
- Playback control interfaces
- Performance monitoring and analytics
4. Logs and Analytics (out4)
Section titled “4. Logs and Analytics (out4)”Comprehensive logging and session analytics
- Content:
- Tool execution logs and results
- Session analytics and metrics
- Error logs and debugging information
- Performance statistics
- Tables:
tool_history
: Complete tool execution historysession_analytics
: Session metrics and statistics- Logger output: Detailed operational logs
- Use Cases:
- Debugging and troubleshooting
- Performance analysis
- Tool usage monitoring
- System integration diagnostics
Tool Integration
Section titled “Tool Integration”This operator exposes 1 tool that allow Agent and Gemini Live LOPs to integrate with the complete LOPs tool ecosystem for real-time AI-driven TouchDesigner operations during live conversations.
Use the Tool Debugger operator to inspect exact tool definitions, schemas, and parameters.
The Gemini Live operator supports the complete LOPs tool ecosystem, allowing AI to execute TouchDesigner operations during real-time conversations:
Tool Configuration
Section titled “Tool Configuration”op('gemini_live').par.Usetools
Toggle - Default:
False
op('gemini_live').par.Tool
Sequence - Default:
0
op('gemini_live').par.Tool0op
OP - Default:
"" (Empty String)
Supported Tool Types
Section titled “Supported Tool Types”- MCP Clients: Access to Model Context Protocol services
- External Operators: Any TouchDesigner operator with GetTool() method
- File Operations: Read/write files, process data
- Network Services: API calls, web scraping, data retrieval
- TouchDesigner Operations: Parameter control, node creation, data processing
- Custom Tools: User-defined operations via Python extensions
Tool Execution Modes
Section titled “Tool Execution Modes”- Blocking: Tool executes synchronously, conversation pauses until completion
- Non-blocking: Tool executes asynchronously, conversation continues during execution
- Hybrid: Automatic mode selection based on tool characteristics
Core Parameters
Section titled “Core Parameters”Gemini Live Page
Section titled “Gemini Live Page”op('gemini_live').par.Systemprompt
String - Default:
"" (Empty String)
op('gemini_live').par.Pushtotalk
Toggle - Default:
False
Tools Page
Section titled “Tools Page”op('gemini_live').par.Allowmodelstop
Toggle - Default:
False
op('gemini_live').par.Outputtext
Toggle - Default:
False
op('gemini_live').par.Enablegrounding
Toggle - Default:
False
Image/Video Page
Section titled “Image/Video Page”op('gemini_live').par.Enableimage
Toggle - Default:
False
op('gemini_live').par.Streaminterval
Float - Default:
0.0
Config Page
Section titled “Config Page”op('gemini_live').par.Enableautoreconnect
Toggle - Default:
False
op('gemini_live').par.Reconnectdelay
Float - Default:
0.0
op('gemini_live').par.Maxreconnectattempts
Int - Default:
0
op('gemini_live').par.Enablevadconfig
Toggle - Default:
False
op('gemini_live').par.Audiosendinterval
Float - Default:
0.0
Usage Examples
Section titled “Usage Examples”Basic Voice Conversation
Section titled “Basic Voice Conversation”# Start a simple voice conversationgemini_live = op('gemini_live')gemini_live.par.Systemprompt = "You are a helpful assistant for TouchDesigner users."gemini_live.par.Voice = "Aoede"gemini_live.par.Turnmode = "auto_vad"gemini_live.par.Startconversation.pulse()
Tool-Enabled Conversation
Section titled “Tool-Enabled Conversation”# Enable tools for AI to interact with TouchDesignergemini_live = op('gemini_live')gemini_live.par.Usetools = Truegemini_live.par.Tool0op = op('mcp_client') # MCP client for external servicesgemini_live.par.Tool0mode = "enabled"gemini_live.par.Systemprompt = "You can help with TouchDesigner and access external tools."gemini_live.par.Startconversation.pulse()
Video-Enhanced Conversation
Section titled “Video-Enhanced Conversation”# Enable video input for visual understandinggemini_live = op('gemini_live')gemini_live.par.Enableimage = Truegemini_live.par.Imagemode = "stream"gemini_live.par.Streaminterval = 2.0 # Send frame every 2 secondsgemini_live.par.Imageresolution = "512x512"gemini_live.par.Systemprompt = "You can see what I'm showing you. Describe what you see."gemini_live.par.Startconversation.pulse()
Push-to-Talk Mode
Section titled “Push-to-Talk Mode”# Configure for noisy environmentsgemini_live = op('gemini_live')gemini_live.par.Turnmode = "push_to_talk"gemini_live.par.Systemprompt = "Respond when I press the talk button."gemini_live.par.Startconversation.pulse()
# Control talking manuallygemini_live.par.Pushtotalk = True # Start talkinggemini_live.par.Pushtotalk = False # Stop talking
Advanced Features
Section titled “Advanced Features”Session Management
Section titled “Session Management”The operator automatically manages conversation sessions with resumption capabilities:
# Session is automatically saved and can be resumedgemini_live = op('gemini_live')gemini_live.par.Enablesessionhistory = Truegemini_live.par.Savesession.pulse() # Manual savegemini_live.par.Listsessions.pulse() # List saved sessions
Audio Processing
Section titled “Audio Processing”Access different audio streams for various use cases:
# Access progressive audio (clears on interruption)store_output = op('gemini_live/store_output')current_audio = store_output['chan1']
# Access full session audio (preserves across interruptions)full_audio = op('gemini_live/full_audio')complete_session = full_audio['chan1']
Tool Result Monitoring
Section titled “Tool Result Monitoring”Monitor tool execution and results:
# Access tool execution historytool_history = op('gemini_live').op('tool_history')for i in range(1, tool_history.numRows): tool_name = tool_history[i, 'tool_name'].val status = tool_history[i, 'status'].val result = tool_history[i, 'result'].val print(f"Tool {tool_name}: {status} - {result}")
Troubleshooting
Section titled “Troubleshooting”Common Issues
Section titled “Common Issues”- No Audio Input: Check audio device configuration in Playback page
- API Key Errors: Verify Google API key has Gemini Live access
- Tool Execution Failures: Check tool operator extensions and GetTool() methods
- Connection Drops: Enable auto-reconnect and check network stability
Performance Optimization
Section titled “Performance Optimization”- Use appropriate audio send intervals (0.1s for responsive, 0.2s for efficiency)
- Configure VAD sensitivity for your environment
- Use non-blocking mode for long-running tools
- Monitor tool execution logs for optimization opportunities
Integration with Other LOPs
Section titled “Integration with Other LOPs”The Gemini Live operator integrates seamlessly with the broader LOPs ecosystem:
- Agent Operator: Share tool configurations and conversation data
- MCP Clients: Access external services and APIs during conversation
- File Operations: Read/write files based on conversation context
- Data Processing: Process conversation data with other TouchDesigner operators
This comprehensive tool integration makes Gemini Live a powerful hub for AI-driven TouchDesigner automation and interaction.