Gemini Image Gen Operator

Overview

The Gemini Image Gen LOP generates images with Google’s Gemini models through LiteLLM, using the experimental gemini-2.0-flash-exp-image-generation model. It takes a text prompt (and optionally an input image or context from a Context Grabber), saves the generated image to a specified directory, and logs each job in a history table.

Requirements

Python Packages:
- litellm: For interacting with the Gemini API.
- Pillow (PIL): For image processing.
- opencv-python (cv2): For image conversion.
- numpy: Required by OpenCV.
Install these packages via the ChatTD Python Manager.
ChatTD Operator: Required for dependency management (package installation) and asynchronous task execution. Ensure the ChatTD Operator parameter on the ‘About’ page points to your configured ChatTD instance.
Gemini API Key: A valid API key from Google AI Studio is required. Obtain one and enter it into the Gemini API Key parameter.

Input/Output

Inputs

Input DAT (input_prompt, optional): If Prompt Source is set to input_dat, this table is used to construct the prompt. It should contain rows with role (‘user’ or ‘assistant’) and message columns.
Input Image TOP (optional): Connected via the Input Image (Optional) parameter. Used for image-to-image tasks (if supported by the model/prompting).
Context Grabber COMP (optional): Connected via the Context Grabber (Optional) parameter. Allows adding context (text and images) from another operator to the prompt.
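
The role/message layout expected from the Input DAT can be illustrated with a small standalone sketch. It is not the operator's actual implementation: the function name rows_to_messages, the "content" key, and the rule of skipping invalid rows are assumptions chosen to mirror the common chat-message format.

```python
# Hypothetical sketch: turn role/message table rows into a chat-style message list.
# The operator's real parsing logic is not documented here; this only shows the layout.

VALID_ROLES = {"user", "assistant"}


def rows_to_messages(rows):
    """Convert a table (first row = header) into a list of message dicts.

    Rows with an unknown role or an empty message are skipped (an assumption).
    """
    header, *body = rows
    role_col = header.index("role")
    msg_col = header.index("message")

    messages = []
    for row in body:
        role = row[role_col].strip().lower()
        text = row[msg_col].strip()
        if role not in VALID_ROLES or not text:
            continue
        messages.append({"role": role, "content": text})
    return messages


example_rows = [
    ["role", "message"],
    ["user", "A watercolor painting of a lighthouse at dusk"],
    ["assistant", "Here is a first version."],
    ["user", "Make the sky more dramatic"],
]
example_messages = rows_to_messages(example_rows)
```

Inside TouchDesigner, the same row lists could be read from a Table DAT cell by cell; that wiring is left out so the sketch runs on its own.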

Outputs

Image Files: Generated images are saved as PNG files in the directory specified by Output Directory (or a default location within the ChatTD environment).
Metadata Files: JSON files containing details about each generation job (prompt, timestamp, model, paths) are saved alongside the images.
History DAT (history_dat): An internal table logging each generation attempt, including job ID, prompt, timestamp, status, model used, and paths to the generated image and metadata files.
Viewer TOP (image_viewer): Displays the image selected by the Display Image Index parameter.
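
As a rough illustration of the metadata files described above, the sketch below writes a JSON record next to an image path. The key names (prompt, timestamp, model, image_path) and the helper write_metadata are assumptions based on the listed fields, not the operator's confirmed schema.

```python
# Hypothetical sketch of a per-job metadata file saved alongside a generated PNG.
import json
from datetime import datetime, timezone
from pathlib import Path


def write_metadata(image_path, prompt, model):
    """Write <image name>.json next to the image and return the metadata path."""
    image_path = Path(image_path)
    meta_path = image_path.with_suffix(".json")
    record = {
        "prompt": prompt,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "image_path": str(image_path),  # key names are assumptions
    }
    meta_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return meta_path
```

Reading such a file back with json.loads makes it straightforward to rebuild a history table outside TouchDesigner, for example when auditing past generations.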

Parameters

Page: Gemini

Gemini API Key (Apikey) op('gemini_image_gen').par.Apikey Str

Default:: API KEY LOADED

Get API Key (Getapikey) op('gemini_image_gen').par.Getapikey Pulse

Default:: None

Input Image (Optional) (Inputimage) op('gemini_image_gen').par.Inputimage TOP

Default:: None

Context Grabber (Optional) (Contextgrabber) op('gemini_image_gen').par.Contextgrabber COMP

Default:: None

Prompt (Prompt) op('gemini_image_gen').par.Prompt Str

Default:: None

Generate Image (Generate) op('gemini_image_gen').par.Generate Pulse

Default:: None

Output Directory (Outputdir) op('gemini_image_gen').par.Outputdir Folder

Default:: gemini_images_test

Status (Status) op('gemini_image_gen').par.Status Str

Default:: GeminiImageGen

Active (Active) op('gemini_image_gen').par.Active Toggle

Default:: 0
Options:: off, on

Display Image Index (Displayimage) op('gemini_image_gen').par.Displayimage Int

Default:: 1
Range:: 1 and up (no fixed maximum)
Slider Range:: 1 and up (no fixed maximum)

Auto-select Last Image (Setdisplay) op('gemini_image_gen').par.Setdisplay Toggle

Default:: 1
Options:: off, on

Generate on Input Change (Onin1) op('gemini_image_gen').par.Onin1 Toggle

Default:: 1
Options:: off, on

Page: About

Bypass (Bypass) op('gemini_image_gen').par.Bypass Toggle

Default:: 0
Options:: off, on

Show Built-in Parameters (Showbuiltin) op('gemini_image_gen').par.Showbuiltin Toggle

Default:: 0
Options:: off, on

Version (Version) op('gemini_image_gen').par.Version Str

Default:: 1.0.0

Last Updated (Lastupdated) op('gemini_image_gen').par.Lastupdated Str

Default:: 2025-05-02

Creator (Creator) op('gemini_image_gen').par.Creator Str

Default:: dotsimulate

Website (Website) op('gemini_image_gen').par.Website Str

Default:: https://dotsimulate.com

ChatTD Operator (Chattd) op('gemini_image_gen').par.Chattd OP

Default:: /dot_lops/ChatTD

Usage Examples

Basic Image Generation

1. Enter your Gemini API Key in the 'Gemini API Key' parameter.
2. Ensure 'Prompt Source' is set to 'parameter'.
3. Enter your desired prompt in the 'Prompt' parameter (e.g., "A photorealistic cat wearing sunglasses riding a skateboard").
4. Pulse the 'Generate Image' button.
5. Monitor the 'Status' parameter.
6. View the generated image in the operator viewer or the specified 'Output Directory'.
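
The steps above can also be driven from a script. The following is a minimal sketch that uses the parameter names listed on the Gemini page (Apikey, Prompt, Generate); the helper name request_image and the operator path in the usage comment are assumptions.

```python
# Minimal sketch: set the prompt and pulse Generate on a Gemini Image Gen operator.
# gen_op is expected to be the operator object, e.g. the result of op(...) in TouchDesigner.


def request_image(gen_op, prompt, api_key=None):
    """Set the prompt (and optionally the API key), then trigger a generation."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if api_key:
        gen_op.par.Apikey = api_key
    gen_op.par.Prompt = prompt
    gen_op.par.Generate.pulse()


# Inside TouchDesigner (the operator name is an assumption):
# request_image(op('gemini_image_gen'),
#               'A photorealistic cat wearing sunglasses riding a skateboard')
```

Because generation runs asynchronously, the call returns immediately; the Status parameter and the history table reflect the outcome once the job finishes.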

Generating from Input DAT

1. Set 'Prompt Source' to 'input_dat'.
2. Create a Table DAT with columns 'role' and 'message'.
3. Add rows with roles 'user' or 'assistant' and your prompt message(s).
4. Connect this DAT to the first input of the GeminiImageGen operator.
5. Ensure 'Generate on Input Change' is enabled if you want automatic generation, otherwise pulse 'Generate Image'.
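
Steps 2 and 3 can be scripted as well. This sketch fills a table with a header row and role/message rows using the Table DAT methods clear() and appendRow(); the helper name fill_prompt_table and the DAT name in the usage comment are assumptions.

```python
# Sketch: populate a Table DAT with the role/message layout used by 'input_dat'.
# table is expected to behave like a TouchDesigner Table DAT (clear, appendRow).

ALLOWED_ROLES = ("user", "assistant")


def fill_prompt_table(table, turns):
    """Replace the table contents with a header row plus one row per (role, message)."""
    # Validate every turn first so the table is never left half-written.
    for role, _message in turns:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
    table.clear()
    table.appendRow(["role", "message"])
    for role, message in turns:
        table.appendRow([role, message])


# Inside TouchDesigner (the DAT name is an assumption):
# fill_prompt_table(op('prompt_table'), [('user', 'A neon city street in the rain')])
```

If 'Generate on Input Change' is enabled, rewriting the table this way may trigger a new generation, so it can be worth disabling the toggle while building longer conversations.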

Using an Input Image

1. Connect a TOP containing your input image to the 'Input Image (Optional)' parameter.
2. Craft your 'Prompt' to instruct the model on how to use the input image (e.g., "Edit this image to make the sky purple", "Describe this image in detail"). Specific prompt techniques depend on the model's capabilities.
3. Pulse 'Generate Image'.

Using a Context Grabber

1. Connect a configured Context Grabber operator to the 'Context Grabber (Optional)' parameter.
2. The text and images collected by the Context Grabber will be automatically included in the prompt sent to Gemini.
3. Enter a main instruction in the 'Prompt' parameter if needed.
4. Pulse 'Generate Image'.

Technical Notes

API Key: Your Gemini API key is stored securely in a configuration file within your ChatTD environment or retrieved from the ChatTD Key Manager.
Dependencies: Requires litellm, Pillow, opencv-python, and numpy. Use ChatTD’s Python Manager to install these.
File Saving: Images are saved as PNG files. Metadata is saved as JSON.
Asynchronous Operation: Image generation happens asynchronously via ChatTD’s TDAsyncIO, preventing TouchDesigner from freezing.
Response Handling: The operator extracts the base64 image data from the API response. If the response contains text alongside the image, this text is stored in the response_text column of the history_dat table.
Input Image Encoding: Input TOPs are converted to base64-encoded JPEG data URIs before being sent to the API.
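
The encoding notes above can be made concrete with a standard-library sketch. It shows how JPEG bytes become a base64 data URI and how base64 image data is decoded back to bytes; the function names are hypothetical, and the conversion from a TOP to JPEG bytes (handled by the operator with OpenCV/Pillow) is deliberately left out.

```python
# Sketch: base64 data URI round trip using only the standard library.
import base64


def jpeg_bytes_to_data_uri(jpeg_bytes):
    """Wrap already-encoded JPEG bytes in a base64 data URI."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"


def data_uri_to_bytes(data_uri):
    """Decode the payload of a base64 data URI back into raw bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


# Placeholder bytes stand in for a real JPEG produced from a TOP.
sample = b"\xff\xd8\xff\xe0 placeholder"
uri = jpeg_bytes_to_data_uri(sample)
restored = data_uri_to_bytes(uri)
```

The same decoding step applies to base64 image data returned by the API before it is written to disk as a PNG, although the exact response layout depends on the model and on LiteLLM.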

Related Operators

ChatTD: Provides core services like dependency management, API key management, and asynchronous task execution required by this operator.
Context Grabber: Can be used to provide additional text and image context to the generation prompt.