Gemini Live

v2.3.1

The Gemini Live LOP provides real-time, bidirectional voice and video conversation capabilities using Google’s Gemini Live API. This operator enables natural voice interactions with AI models while supporting comprehensive tool integration, allowing the AI to execute TouchDesigner operations, access external services, and interact with the LOPs ecosystem.

Key Features

Real-time Voice Conversation: Bidirectional audio streaming with automatic speech recognition and text-to-speech
Video Input Support: Send video frames to the AI for visual understanding
Comprehensive Tool Integration: Full support for LOPs tool system including MCP clients, external operators, and custom tools
Multiple Output Streams: Four distinct outputs for different data types and use cases
Advanced Turn Management: Support for Auto VAD, Push-to-Talk, and Hybrid modes
Session Management: Automatic session resumption and conversation history
Auto-Reconnection: Robust error handling with automatic reconnection capabilities

Requirements

Google API Key with Gemini Live API access
google-genai Python package (can be installed via the operator)
Audio input device for voice interaction
Optional: Video input via frame_null TOP for visual interactions

Installation

Install Dependencies: Use the “Install/Update google-genai” pulse parameter to install required packages
API Key Setup: Enter your Google API key in the API Key parameter
Audio Configuration: Configure your audio input device on the Playback page
Tool Configuration: Set up external tools via the Tool sequence parameters (optional)
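
The API key step can also be scripted so the key is not typed into the project by hand. The following is a minimal sketch, assuming the operator is reachable as op('gemini_live') and that the key has been exported in an environment variable; the variable name GEMINI_API_KEY and the helper name apply_api_key are illustrative choices, not part of the operator.

```python
import os


def apply_api_key(gemini, env=os.environ, var_name="GEMINI_API_KEY"):
    """Copy an API key from an environment variable into the Apikey parameter.

    gemini   -- the Gemini Live operator, e.g. op('gemini_live')
    var_name -- hypothetical variable name; use whatever you export
    Returns True if a key was found and applied, False otherwise.
    """
    key = env.get(var_name, "").strip()
    if not key:
        return False
    gemini.par.Apikey = key
    return True


# Inside TouchDesigner (assumed operator path):
# apply_api_key(op('gemini_live'))
```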

I/O Configuration

Inputs

Audio Input: Connect audio stream to the operator for voice input
Video Input: Optional frame_null TOP inside the component for video frames
Text Input: Optional text_input DAT for text-based interactions

Outputs

The Gemini Live operator provides four distinct output streams, each serving different use cases:

1. Conversation Output (out1)

Primary conversation data table with real-time updates

Columns: role, message, id, timestamp
Content: Complete conversation history with both user and assistant messages
Update Mode: Configurable via “Conversation Table Update” parameter
- Live Transcript: Updates in real-time as speech is transcribed
- On Turn Complete: Updates only when each conversational turn is finished
Use Cases:
- Conversation logging and analysis
- Real-time subtitle display
- Conversation state management
- Integration with other LOPs that process conversation data
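
As an illustration of working with the conversation table, the sketch below turns table rows into dictionaries keyed by the documented columns (role, message, id, timestamp). It assumes the first row is a header row and that rows have already been read into plain strings, for example with [[c.val for c in r] for r in some_dat.rows()] on a DAT wired to out1. The role strings "user" and "assistant" are assumptions based on the description above.

```python
def rows_to_messages(rows):
    """Convert table rows (header row first) into a list of dicts."""
    if not rows:
        return []
    header = [str(cell) for cell in rows[0]]
    return [dict(zip(header, (str(cell) for cell in row))) for row in rows[1:]]


def messages_by_role(rows, role):
    """Return only the message text spoken by the given role."""
    return [m["message"] for m in rows_to_messages(rows) if m.get("role") == role]
```

A subtitle display, for instance, might call messages_by_role(rows, "assistant") and show only the last entry.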

2. Voice Output (out2)

Real-time audio stream from the AI assistant

Format: 24kHz PCM audio stream
Content: AI-generated speech responses
CHOPs:
- store_output: Progressive audio buffer (clears on interruption)
- full_audio: Complete session audio accumulator (preserves across interruptions)
Use Cases:
- Audio playback through TouchDesigner’s audio system
- Audio processing and effects
- Recording and archiving of AI responses
- Integration with audio analysis tools
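
For recording and archiving, the samples can be written to a WAV file at the documented 24 kHz rate. This is a rough sketch, assuming mono float samples in the range -1.0 to 1.0 (as CHOP channels normally hold) and 16-bit output; inside TouchDesigner the samples might come from something like op('gemini_live').op('full_audio')[0].vals, which assumes the CHOP lives inside the operator under that name.

```python
import io
import struct
import wave

SAMPLE_RATE_HZ = 24000  # output rate stated in this section


def floats_to_wav_bytes(samples, sample_rate=SAMPLE_RATE_HZ):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    # Clip out-of-range values so they cannot overflow the int16 range.
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    pcm = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written to disk with a normal binary file write.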

3. Timer/Playback Position (out3)

Conversation timing and playback control information

Content: Session timing data, turn durations, and playback position
Format: CHOP channels with timing information
Data Types:
- Session duration
- Current turn timing
- Audio playback position
- Turn completion markers
Use Cases:
- Synchronizing visuals with conversation flow
- Creating conversation timelines
- Playback control interfaces
- Performance monitoring and analytics

4. Logs and Analytics (out4)

Comprehensive logging and session analytics

Content:
- Tool execution logs and results
- Session analytics and metrics
- Error logs and debugging information
- Performance statistics
Tables:
- tool_history: Complete tool execution history
- session_analytics: Session metrics and statistics
- Logger output: Detailed operational logs
Use Cases:
- Debugging and troubleshooting
- Performance analysis
- Tool usage monitoring
- System integration diagnostics

Parameters

Page: Gemini Live

Status (Status) op('gemini_live').par.Status Str

Default: None

Live Connection (Statusconnected) op('gemini_live').par.Statusconnected Toggle

Default: None

Conversation Status (Statusconversationactive) op('gemini_live').par.Statusconversationactive Toggle

Default: None

Start (Start) op('gemini_live').par.Start Pulse

Default: None

Pause (Pause) op('gemini_live').par.Pause Pulse

Default: None

Stop (Stop) op('gemini_live').par.Stop Pulse

Default: None

System Prompt (Systemprompt) op('gemini_live').par.Systemprompt Str

Default: None

Push to Talk (Pushtotalk) op('gemini_live').par.Pushtotalk Toggle

Default: None

Send Image (Sendimage) op('gemini_live').par.Sendimage Pulse

Default: None

TOP (Top) op('gemini_live').par.Top TOP

Default: None

Send Text (Sendtext) op('gemini_live').par.Sendtext Pulse

Default: None

DAT (Dat) op('gemini_live').par.Dat DAT

Default: None

Page: Tools

Gemini Live Tools Header

Allow Model to Stop Conversation (Allowmodelstop) op('gemini_live').par.Allowmodelstop Toggle

Default: None

Output Text (out4) (Outputtext) op('gemini_live').par.Outputtext Toggle

Default: None

Enable Google Search (built in) (Enablegrounding) op('gemini_live').par.Enablegrounding Toggle

Default: None

Use LOP Tools Header

Use LOP Tools (Usetools) op('gemini_live').par.Usetools Toggle

Default: None

External Op Tools (Tool) op('gemini_live').par.Tool Sequence

Default: None

OP (Tool0op) op('gemini_live').par.Tool0op OP

Default: None

Page: Image

Enable Image Input (Enableimage) op('gemini_live').par.Enableimage Toggle

Default: None

Stream Interval (sec) (Streaminterval) op('gemini_live').par.Streaminterval Float

Default: 1.0
Range: 0.1 to 10

Custom Width (Customwidth) op('gemini_live').par.Customwidth Int

Default: 512
Range: 64 to 2048

Custom Height (Customheight) op('gemini_live').par.Customheight Int

Default: 512
Range: 64 to 2048

Page: Playback

Audio Device Settings Header

Active (Audioactive) op('gemini_live').par.Audioactive Toggle

Default: True

Volume (Volume) op('gemini_live').par.Volume Float

Default: 1.0
Range: 0 to 1

Clear Audio Buffers (Clearaudio) op('gemini_live').par.Clearaudio Pulse

Default: None

Page: History

Enable Session History (Enablesessionhistory) op('gemini_live').par.Enablesessionhistory Toggle

Default: None

Save Session (Savesession) op('gemini_live').par.Savesession Pulse

Default: None

Load Session (Loadsession) op('gemini_live').par.Loadsession Pulse

Default: None

List Sessions (Listsessions) op('gemini_live').par.Listsessions Pulse

Default: None

List All Sessions (Listallsessions) op('gemini_live').par.Listallsessions Toggle

Default: None

Page: Config

API Key (Apikey) op('gemini_live').par.Apikey Str

Default: None

Install/Update google-genai (Installgooglegenai) op('gemini_live').par.Installgooglegenai Pulse

Default: None

Enable User Transcription (Enableusertranscription) op('gemini_live').par.Enableusertranscription Toggle

Default: None

Enable Session Resumption (Enablesessionresumption) op('gemini_live').par.Enablesessionresumption Toggle

Default: None

Enable Context Compression (Enablecontextcompression) op('gemini_live').par.Enablecontextcompression Toggle

Default: None

Audio Send Interval (sec) (Audiosendinterval) op('gemini_live').par.Audiosendinterval Float

Default: 0.1
Range: 0.05 to 0.5

Configure VAD (Enablevadconfig) op('gemini_live').par.Enablevadconfig Toggle

Default: None

VAD Prefix Padding (ms) (Prefixpaddingms) op('gemini_live').par.Prefixpaddingms Int

Default: 50
Range: 0 to 500

VAD Silence Duration (ms) (Silencedurationms) op('gemini_live').par.Silencedurationms Int

Default: 1000
Range: 100 to 5000

Language Code (BCP-47) (Languagecode) op('gemini_live').par.Languagecode Str

Default: None

Enable Auto-Reconnect (Enableautoreconnect) op('gemini_live').par.Enableautoreconnect Toggle

Default: None

Reconnect Delay (sec) (Reconnectdelay) op('gemini_live').par.Reconnectdelay Float

Default: 3.0
Range: 1 to 30

Max Reconnect Attempts (Maxreconnectattempts) op('gemini_live').par.Maxreconnectattempts Int

Default: 3
Range: 1 to 10

Current Reconnect Attempts (Reconnectattempts) op('gemini_live').par.Reconnectattempts Int

Default: None

Page: About

ChatTD Operator (Chattd) op('gemini_live').par.Chattd OP

Default: None

Bypass (Bypass) op('gemini_live').par.Bypass Toggle

Default: None

Show Built-in Parameters (Showbuiltin) op('gemini_live').par.Showbuiltin Toggle

Default: None

Version (Version) op('gemini_live').par.Version Str

Default: None

Last Updated (Lastupdated) op('gemini_live').par.Lastupdated Str

Default: None

Creator (Creator) op('gemini_live').par.Creator Str

Default: None

Website (Website) op('gemini_live').par.Website Str

Default: None

Usage Examples

Basic Voice Conversation

Set the System Prompt parameter to define the AI’s personality.
Select a Voice from the dropdown menu.
Set the Turn Management Mode to Auto VAD.
Pulse the Start parameter.
Speak into your microphone. The conversation will be displayed in the conversation_dat table.
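
The scriptable part of these steps can be sketched in Python using the parameter names from the Parameters section. The Voice and Turn Management Mode settings are left to the UI here because their parameter names are not listed on this page; the helper name start_voice_conversation and the operator path are assumptions.

```python
def start_voice_conversation(gemini, system_prompt):
    """Set the system prompt, then pulse Start to open the conversation.

    gemini -- the Gemini Live operator, e.g. op('gemini_live')
    """
    gemini.par.Systemprompt = system_prompt
    gemini.par.Start.pulse()


# Inside TouchDesigner (assumed operator path):
# start_voice_conversation(op('gemini_live'), 'You are a concise studio assistant.')
```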

Using Tools

Turn on the Use LOP Tools toggle on the Tools page.
Connect a tool operator (e.g., a tool_dat with a Python script) to the External Op Tools parameter.
Start a conversation and ask the AI to perform a task that requires the tool.
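
The first two steps might be scripted as follows. This is a sketch only: it fills the first block of the External Op Tools sequence (Tool0op), and the tool path passed in is a hypothetical example.

```python
def attach_tool(gemini, tool_op_path):
    """Enable LOP tools and point the first tool slot at an operator.

    gemini       -- the Gemini Live operator, e.g. op('gemini_live')
    tool_op_path -- path to a tool operator (hypothetical, e.g. 'tool_dat1')
    """
    gemini.par.Usetools = True
    gemini.par.Tool0op = tool_op_path


# Inside TouchDesigner (assumed paths):
# attach_tool(op('gemini_live'), 'tool_dat1')
```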

Video-Enhanced Conversation

Turn on the Enable Image Input toggle on the Image page.
Assign a TOP to the TOP parameter on the Gemini Live page.
Set the Image Send Mode to Stream and adjust the Stream Interval.
Start a conversation and ask the AI about what it sees.
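
A possible scripted version of the image setup is shown below. It clamps the interval to the documented 0.1 to 10 second range before applying it. The Image Send Mode is not set because its parameter name is not listed on this page, and the TOP path is a hypothetical example.

```python
def configure_image_stream(gemini, top_path, interval_s=1.0):
    """Enable image input, assign the source TOP, and set the stream interval.

    Returns the interval actually applied after clamping.
    """
    interval = max(0.1, min(10.0, float(interval_s)))  # documented range
    gemini.par.Enableimage = True
    gemini.par.Top = top_path
    gemini.par.Streaminterval = interval
    return interval


# Inside TouchDesigner (assumed paths):
# configure_image_stream(op('gemini_live'), 'videodevin1', interval_s=2.0)
```

Shorter intervals send more frames per minute (60 divided by the interval), so a longer interval is the cheaper choice when the scene changes slowly.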

Push-to-Talk Mode

Set the Turn Management Mode to Push-to-Talk (push_to_talk).
Start the conversation.
Use the Push to Talk toggle to control when your audio is sent to the AI.
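
One way to drive the toggle from a physical button or key is a CHOP Execute DAT. The sketch below assumes the watched channel goes from 0 to 1 while the button is held and that the operator path is 'gemini_live'; the helper name set_push_to_talk is illustrative. The callbacks rely on TouchDesigner's built-in op(), so they only run inside TouchDesigner.

```python
def set_push_to_talk(gemini, pressed):
    """Mirror a button state onto the Push to Talk toggle."""
    gemini.par.Pushtotalk = bool(pressed)


# CHOP Execute DAT callbacks. TouchDesigner calls these and supplies op().
def onOffToOn(channel, sampleIndex, val, prev):
    # Button pressed: start sending microphone audio.
    set_push_to_talk(op('gemini_live'), True)
    return


def onOnToOff(channel, sampleIndex, val, prev):
    # Button released: stop sending audio.
    set_push_to_talk(op('gemini_live'), False)
    return
```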

Advanced Features

Session Management

The operator automatically manages conversation sessions with resumption capabilities:

Turn on the Enable Session History toggle on the History page.
Pulse Save Session to manually save the current conversation.
Use List Sessions and Load Session to resume a previous conversation.
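
Saving can also be triggered from a script, for example at the end of a show. A minimal sketch, assuming the operator is passed in as op('gemini_live'); the helper name is illustrative.

```python
def save_session(gemini):
    """Make sure session history is enabled, then pulse Save Session."""
    gemini.par.Enablesessionhistory = True
    gemini.par.Savesession.pulse()


# Inside TouchDesigner (assumed operator path):
# save_session(op('gemini_live'))
```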

Audio Processing

Access different audio streams for various use cases:

store_output CHOP: Progressive audio buffer (clears on interruption).
full_audio CHOP: Complete session audio accumulator (preserves across interruptions).

Tool Result Monitoring

Monitor tool execution and results in the tool_history table inside the operator.

Troubleshooting

Common Issues

No Audio Input: Check the audio device configuration on the Playback page
API Key Errors: Verify Google API key has Gemini Live access
Tool Execution Failures: Check tool operator extensions and GetTool() methods
Connection Drops: Enable auto-reconnect and check network stability
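
For connection drops, the reconnect parameters on the Config page can be set together. The sketch below clamps values to the documented ranges and returns a rough worst-case wait, assuming a fixed delay before every attempt; whether the operator applies any backoff is not stated here, so treat that figure as an estimate.

```python
def configure_reconnect(gemini, delay_s=3.0, max_attempts=3):
    """Enable auto-reconnect and return the estimated worst-case wait in seconds."""
    delay = max(1.0, min(30.0, float(delay_s)))      # documented range: 1 to 30
    attempts = max(1, min(10, int(max_attempts)))    # documented range: 1 to 10
    gemini.par.Enableautoreconnect = True
    gemini.par.Reconnectdelay = delay
    gemini.par.Maxreconnectattempts = attempts
    # Assumption: the same delay is used before each attempt.
    return delay * attempts


# Inside TouchDesigner (assumed operator path):
# configure_reconnect(op('gemini_live'), delay_s=5, max_attempts=5)
```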

Performance Optimization

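Tune Audio Send Interval on the Config page: 0.1 s favors responsiveness, 0.2 s favors efficiency (the parameter accepts 0.05 to 0.5 s)
Turn on Configure VAD and adjust VAD Prefix Padding and VAD Silence Duration for your environment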
Use non-blocking mode for long-running tools
Monitor tool execution logs for optimization opportunities
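
The audio and VAD settings above could be applied in one call. This is a sketch under the assumption that the operator is passed in as op('gemini_live'); values are clamped to the ranges listed on the Config page, and the return value is simply how many audio messages per second the chosen interval implies.

```python
def tune_audio(gemini, send_interval_s=0.1, silence_ms=1000, prefix_ms=50):
    """Apply send interval and VAD settings; return audio sends per second."""
    interval = max(0.05, min(0.5, float(send_interval_s)))  # 0.05 to 0.5 s
    gemini.par.Audiosendinterval = interval
    gemini.par.Enablevadconfig = True
    gemini.par.Silencedurationms = max(100, min(5000, int(silence_ms)))
    gemini.par.Prefixpaddingms = max(0, min(500, int(prefix_ms)))
    return 1.0 / interval


# Inside TouchDesigner (assumed operator path):
# tune_audio(op('gemini_live'), send_interval_s=0.2, silence_ms=1500)
```

A longer silence duration makes the model wait longer before treating a pause as the end of a turn, which can help in noisy rooms or with speakers who pause often.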

Integration with Other LOPs

The Gemini Live operator integrates seamlessly with the broader LOPs ecosystem:

Agent Operator: Share tool configurations and conversation data
MCP Clients: Access external services and APIs during conversation
File Operations: Read/write files based on conversation context
Data Processing: Process conversation data with other TouchDesigner operators

Together, these integrations let Gemini Live act as a hub for AI-driven TouchDesigner automation and interaction.