Gemini Live

The Gemini Live LOP provides real-time, bidirectional voice and video conversation capabilities using Google’s Gemini Live API. This operator enables natural voice interactions with AI models while supporting comprehensive tool integration, allowing the AI to execute TouchDesigner operations, access external services, and interact with the LOPs ecosystem.

  • Real-time Voice Conversation: Bidirectional audio streaming with automatic speech recognition and text-to-speech
  • Video Input Support: Send video frames to the AI for visual understanding
  • Comprehensive Tool Integration: Full support for LOPs tool system including MCP clients, external operators, and custom tools
  • Multiple Output Streams: Four distinct outputs for different data types and use cases
  • Advanced Turn Management: Support for Auto VAD, Push-to-Talk, and Hybrid modes
  • Session Management: Automatic session resumption and conversation history
  • Auto-Reconnection: Robust error handling with automatic reconnection capabilities
  • Google API Key with Gemini Live API access
  • google-genai Python package (can be installed via the operator)
  • Audio input device for voice interaction
  • Optional: Video input via frame_null TOP for visual interactions
  1. Install Dependencies: Use the “Install/Update google-genai” pulse parameter to install required packages
  2. API Key Setup: Enter your Google API key in the API Key parameter
  3. Audio Configuration: Configure your audio input device in the Playback page
  4. Tool Configuration: Set up external tools via the Tool sequence parameters (optional)
  • Audio Input: Connect audio stream to the operator for voice input
  • Video Input: Optional frame_null TOP inside the component for video frames
  • Text Input: Optional text_input DAT for text-based interactions

The Gemini Live operator provides four distinct output streams, each serving different use cases:

Primary conversation data table with real-time updates

  • Columns: role, message, id, timestamp
  • Content: Complete conversation history with both user and assistant messages
  • Update Mode: Configurable via “Conversation Table Update” parameter
    • Live Transcript: Updates in real-time as speech is transcribed
    • On Turn Complete: Updates only when each conversational turn is finished
  • Use Cases:
    • Conversation logging and analysis
    • Real-time subtitle display
    • Conversation state management
    • Integration with other LOPs that process conversation data
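
As a rough illustration of the subtitle use case, the sketch below works on plain Python lists shaped like the conversation table (a header row followed by role, message, id, timestamp rows). The helper names, the sample rows, and the exact role strings ("user", "assistant") are assumptions for illustration, not part of the operator's API. Inside TouchDesigner, rows of this shape could likely be built from the table DAT with something like `[[cell.val for cell in row] for row in dat.rows()]`.

```python
def rows_to_messages(rows):
    """Convert table rows (header first) into dicts keyed by column name."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def subtitle_line(messages, role="assistant"):
    """Return the most recent message text for the given role, or '' if none."""
    for msg in reversed(messages):
        if msg["role"] == role:
            return msg["message"]
    return ""


# Hypothetical sample data using the documented column names.
sample_rows = [
    ["role", "message", "id", "timestamp"],
    ["user", "What does a Noise TOP do?", "1", "12.0"],
    ["assistant", "It generates procedural noise patterns.", "2", "14.5"],
]

messages = rows_to_messages(sample_rows)
current_subtitle = subtitle_line(messages)
```

With the "Live Transcript" update mode, a helper like this would presumably be re-run whenever the table changes; with "On Turn Complete", once per finished turn.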

Real-time audio stream from the AI assistant

  • Format: 24kHz PCM audio stream
  • Content: AI-generated speech responses
  • CHOPs:
    • store_output: Progressive audio buffer (clears on interruption)
    • full_audio: Complete session audio accumulator (preserves across interruptions)
  • Use Cases:
    • Audio playback through TouchDesigner’s audio system
    • Audio processing and effects
    • Recording and archiving of AI responses
    • Integration with audio analysis tools
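
For the recording and archiving use case, a minimal sketch of writing 24 kHz mono samples to a 16-bit WAV file with Python's standard library follows. It assumes the samples are floats in the range -1.0 to 1.0 (for example, taken from a CHOP channel); that range, the helper names, and the sine-wave stand-in data are assumptions, not something the operator documents.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24000  # output format stated above: 24kHz PCM


def float_to_pcm16(samples):
    """Clamp floats to [-1, 1] and pack them as little-endian signed 16-bit PCM."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(round(s * 32767))) for s in clamped)


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit audio to a file path or a seekable file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(float_to_pcm16(samples))


# Hypothetical stand-in for CHOP samples: half a second of a 440 Hz sine wave.
samples = [
    math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
buffer = io.BytesIO()
write_wav(buffer, samples)
duration_seconds = len(samples) / SAMPLE_RATE
```

Passing a file path instead of the in-memory buffer writes the same data to disk.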

Conversation timing and playback control information

  • Content: Session timing data, turn durations, and playback position
  • Format: CHOP channels with timing information
  • Data Types:
    • Session duration
    • Current turn timing
    • Audio playback position
    • Turn completion markers
  • Use Cases:
    • Synchronizing visuals with conversation flow
    • Creating conversation timelines
    • Playback control interfaces
    • Performance monitoring and analytics

Comprehensive logging and session analytics

  • Content:
    • Tool execution logs and results
    • Session analytics and metrics
    • Error logs and debugging information
    • Performance statistics
  • Tables:
    • tool_history: Complete tool execution history
    • session_analytics: Session metrics and statistics
    • Logger output: Detailed operational logs
  • Use Cases:
    • Debugging and troubleshooting
    • Performance analysis
    • Tool usage monitoring
    • System integration diagnostics
GetTool enabled: this operator exposes 1 tool that allows Agent and Gemini Live LOPs to integrate with the LOPs tool ecosystem for real-time, AI-driven TouchDesigner operations during live conversations.

The Gemini Live operator supports the complete LOPs tool ecosystem, allowing AI to execute TouchDesigner operations during real-time conversations:

Use LOP Tools (Usetools) op('gemini_live').par.Usetools Toggle, default: False
External Op Tools (Tool) op('gemini_live').par.Tool Sequence, default: 0
Mode (Tool0mode) op('gemini_live').par.Tool0mode Menu, default: enabled
OP (Tool0op) op('gemini_live').par.Tool0op OP, default: "" (Empty String)
  • MCP Clients: Access to Model Context Protocol services
  • External Operators: Any TouchDesigner operator with GetTool() method
  • File Operations: Read/write files, process data
  • Network Services: API calls, web scraping, data retrieval
  • TouchDesigner Operations: Parameter control, node creation, data processing
  • Custom Tools: User-defined operations via Python extensions
  • Blocking: Tool executes synchronously, conversation pauses until completion
  • Non-blocking: Tool executes asynchronously, conversation continues during execution
  • Hybrid: Automatic mode selection based on tool characteristics
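
The difference between blocking and non-blocking execution can be pictured with a generic asyncio sketch. This is not the operator's internal implementation; the function names and the simulated tool are hypothetical, and the example only shows the ordering of events in each mode.

```python
import asyncio


async def slow_tool(log):
    """Hypothetical long-running tool, simulated with a short sleep."""
    await asyncio.sleep(0.01)
    log.append("tool done")


async def run_blocking(log):
    # Blocking: wait for the tool before the conversation moves on.
    await slow_tool(log)
    log.append("conversation continues")


async def run_non_blocking(log):
    # Non-blocking: schedule the tool, let the conversation continue,
    # and collect the result later.
    task = asyncio.create_task(slow_tool(log))
    log.append("conversation continues")
    await task


blocking_log = []
asyncio.run(run_blocking(blocking_log))

non_blocking_log = []
asyncio.run(run_non_blocking(non_blocking_log))
```

In the blocking run the tool result is logged first; in the non-blocking run the conversation entry is logged first and the tool result arrives afterwards.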
Model (Model) op('gemini_live').par.Model Menu, default: gemini-2.0-flash-live-001
Voice (Voice) op('gemini_live').par.Voice Menu, default: Aoede
System Prompt (Systemprompt) op('gemini_live').par.Systemprompt String, default: "" (Empty String)
Turn Management Mode (Turnmode) op('gemini_live').par.Turnmode Menu, default: auto_vad
Push to Talk (Pushtotalk) op('gemini_live').par.Pushtotalk Toggle, default: False
Conversation Table Update (Conversationupdate) op('gemini_live').par.Conversationupdate Menu, default: live
Allow Model to Stop Conversation (Allowmodelstop) op('gemini_live').par.Allowmodelstop Toggle, default: False
Output Text (out4) (Outputtext) op('gemini_live').par.Outputtext Toggle, default: False
Enable Google Search (built in) (Enablegrounding) op('gemini_live').par.Enablegrounding Toggle, default: False
Enable Image Input (Enableimage) op('gemini_live').par.Enableimage Toggle, default: False
Image Send Mode (Imagemode) op('gemini_live').par.Imagemode Menu, default: pulse
Stream Interval (sec) (Streaminterval) op('gemini_live').par.Streaminterval Float, default: 0.0
Resolution (Imageresolution) op('gemini_live').par.Imageresolution Menu, default: use_top
Enable Auto-Reconnect (Enableautoreconnect) op('gemini_live').par.Enableautoreconnect Toggle, default: False
Reconnect Delay (sec) (Reconnectdelay) op('gemini_live').par.Reconnectdelay Float, default: 0.0
Max Reconnect Attempts (Maxreconnectattempts) op('gemini_live').par.Maxreconnectattempts Int, default: 0
Configure VAD (Enablevadconfig) op('gemini_live').par.Enablevadconfig Toggle, default: False
Audio Send Interval (sec) (Audiosendinterval) op('gemini_live').par.Audiosendinterval Float, default: 0.0
# Start a simple voice conversation
gemini_live = op('gemini_live')
gemini_live.par.Systemprompt = "You are a helpful assistant for TouchDesigner users."
gemini_live.par.Voice = "Aoede"
gemini_live.par.Turnmode = "auto_vad"
gemini_live.par.Startconversation.pulse()

# Enable tools for AI to interact with TouchDesigner
gemini_live = op('gemini_live')
gemini_live.par.Usetools = True
gemini_live.par.Tool0op = op('mcp_client') # MCP client for external services
gemini_live.par.Tool0mode = "enabled"
gemini_live.par.Systemprompt = "You can help with TouchDesigner and access external tools."
gemini_live.par.Startconversation.pulse()

# Enable video input for visual understanding
gemini_live = op('gemini_live')
gemini_live.par.Enableimage = True
gemini_live.par.Imagemode = "stream"
gemini_live.par.Streaminterval = 2.0 # Send frame every 2 seconds
gemini_live.par.Imageresolution = "512x512"
gemini_live.par.Systemprompt = "You can see what I'm showing you. Describe what you see."
gemini_live.par.Startconversation.pulse()

# Configure for noisy environments
gemini_live = op('gemini_live')
gemini_live.par.Turnmode = "push_to_talk"
gemini_live.par.Systemprompt = "Respond when I press the talk button."
gemini_live.par.Startconversation.pulse()
# Control talking manually
gemini_live.par.Pushtotalk = True # Start talking
gemini_live.par.Pushtotalk = False # Stop talking

The operator automatically manages conversation sessions with resumption capabilities:

# Session is automatically saved and can be resumed
gemini_live = op('gemini_live')
gemini_live.par.Enablesessionhistory = True
gemini_live.par.Savesession.pulse() # Manual save
gemini_live.par.Listsessions.pulse() # List saved sessions

Access different audio streams for various use cases:

# Access progressive audio (clears on interruption)
store_output = op('gemini_live/store_output')
current_audio = store_output['chan1']
# Access full session audio (preserves across interruptions)
full_audio = op('gemini_live/full_audio')
complete_session = full_audio['chan1']

Monitor tool execution and results:

# Access tool execution history
tool_history = op('gemini_live').op('tool_history')
for i in range(1, tool_history.numRows):  # skip the header row
    tool_name = tool_history[i, 'tool_name'].val
    status = tool_history[i, 'status'].val
    result = tool_history[i, 'result'].val
    print(f"Tool {tool_name}: {status} - {result}")
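
Once the history rows are available as plain lists, they can be summarized for tool usage monitoring. The sketch below assumes the tool_name and status columns used in the loop above; the helper name, the sample tool names, and the status strings are hypothetical.

```python
from collections import Counter


def summarize_tool_history(rows):
    """Count (tool_name, status) pairs from table rows with a header row first."""
    header, *body = rows
    name_col = header.index("tool_name")
    status_col = header.index("status")
    return Counter((row[name_col], row[status_col]) for row in body)


# Hypothetical rows shaped like the tool_history table.
sample_history = [
    ["tool_name", "status", "result"],
    ["set_parameter", "success", "ok"],
    ["set_parameter", "error", "operator not found"],
    ["read_file", "success", "128 bytes"],
    ["set_parameter", "success", "ok"],
]

summary = summarize_tool_history(sample_history)
for (name, status), count in sorted(summary.items()):
    print(f"{name} [{status}]: {count}")
```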
  1. No Audio Input: Check audio device configuration in Playback page
  2. API Key Errors: Verify Google API key has Gemini Live access
  3. Tool Execution Failures: Check tool operator extensions and GetTool() methods
  4. Connection Drops: Enable auto-reconnect and check network stability
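
To make the connection-drop advice concrete, here is a generic bounded-retry sketch showing how a fixed delay and a maximum attempt count typically interact. It is not the operator's reconnection code; `connect_with_retries` and the simulated `flaky_connect` are hypothetical names, and the mapping to the Reconnect Delay and Max Reconnect Attempts parameters is an assumption.

```python
import time


def connect_with_retries(connect, delay_sec, max_attempts, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is used up.

    Returns (result, attempts_used). Re-raises the last ConnectionError
    when every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return connect(), attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(delay_sec)  # wait before the next attempt


# Simulated connection that fails twice, then succeeds.
calls = []


def flaky_connect():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("simulated drop")
    return "connected"


waits = []  # record delays instead of actually sleeping
result, attempts_used = connect_with_retries(
    flaky_connect, delay_sec=1.5, max_attempts=5, sleep=waits.append
)
```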
  • Choose the audio send interval to suit the interaction: about 0.1s for responsiveness, about 0.2s for efficiency
  • Configure VAD sensitivity for your environment
  • Use non-blocking mode for long-running tools
  • Monitor tool execution logs for optimization opportunities
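
The trade-off behind the audio send interval can be estimated with simple arithmetic: a shorter interval means smaller chunks sent more often. The sketch below assumes 16-bit mono input at 16 kHz, which is an assumption (this page only states the 24kHz output format); the helper names are hypothetical.

```python
INPUT_SAMPLE_RATE = 16000  # assumed input rate, not stated on this page
BYTES_PER_SAMPLE = 2       # assumed 16-bit PCM


def chunk_size(interval_sec, sample_rate_hz=INPUT_SAMPLE_RATE,
               bytes_per_sample=BYTES_PER_SAMPLE):
    """Return (samples, bytes) contained in one chunk of the given duration."""
    samples = round(interval_sec * sample_rate_hz)
    return samples, samples * bytes_per_sample


def sends_per_minute(interval_sec):
    """Return how many chunks are sent per minute at the given interval."""
    return round(60 / interval_sec)


for interval in (0.1, 0.2):
    samples, size = chunk_size(interval)
    print(f"{interval}s: {samples} samples, {size} bytes, "
          f"{sends_per_minute(interval)} sends/min")
```

Under these assumptions, doubling the interval halves the number of sends while adding up to 0.1s of latency before each chunk leaves the operator.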

The Gemini Live operator integrates seamlessly with the broader LOPs ecosystem:

  • Agent Operator: Share tool configurations and conversation data
  • MCP Clients: Access external services and APIs during conversation
  • File Operations: Read/write files based on conversation context
  • Data Processing: Process conversation data with other TouchDesigner operators

This comprehensive tool integration makes Gemini Live a powerful hub for AI-driven TouchDesigner automation and interaction.