Skip to content

Gemini Image Gen Operator

The Gemini Image Gen LOP allows you to generate images using Google’s Gemini models, specifically leveraging the experimental gemini-2.0-flash-exp-image-generation model accessed through LiteLLM. It takes a text prompt (and optionally an input image or context from a Context Grabber) and generates an image, saving it to a specified directory and logging the process in a history table.

Gemini Image Gen UI

  • Python Packages:
    • litellm: For interacting with the Gemini API.
    • Pillow (PIL): For image processing.
    • opencv-python (cv2): For image conversion.
    • numpy: Required by OpenCV.
    • Install these via the ChatTD Python Manager.
  • ChatTD Operator: Required for dependency management (package installation) and asynchronous task execution. Ensure the ChatTD Operator parameter on the ‘About’ page points to your configured ChatTD instance.
  • Gemini API Key: A valid API key from Google AI Studio is required. Obtain one and enter it into the Gemini API Key parameter.
  • Input DAT (input_prompt, optional): If Prompt Source is set to input_dat, this table is used to construct the prompt. It should contain rows with role (‘user’ or ‘assistant’) and message columns.
  • Input Image TOP (optional): Connected via the Input Image (Optional) parameter. Used for image-to-image tasks (if supported by the model/prompting).
  • Context Grabber COMP (optional): Connected via the Context Grabber (Optional) parameter. Allows adding context (text and images) from another operator to the prompt.
  • Image Files: Generated images are saved as PNG files in the directory specified by Output Directory (or a default location within the ChatTD environment).
  • Metadata Files: JSON files containing details about each generation job (prompt, timestamp, model, paths) are saved alongside the images.
  • History DAT (history_dat): An internal table logging each generation attempt, including job ID, prompt, timestamp, status, model used, and paths to the generated image and metadata files.
  • Viewer TOP (image_viewer): Displays the image selected by the Display Image Index parameter.
Gemini API Key (Apikey) op('gemini_image_gen').par.Apikey Str
Default:
API KEY LOADED
Get API Key (Getapikey) op('gemini_image_gen').par.Getapikey Pulse
Default:
None
Model (Model) op('gemini_image_gen').par.Model Menu
Default:
gemini/gemini-2.0-flash-exp-image-generation
Options:
gemini/gemini-2.0-flash-exp-image-generation
Prompt Source (Promptsource) op('gemini_image_gen').par.Promptsource Menu
Default:
parameter
Options:
parameter, input_dat
Input Image (Optional) (Inputimage) op('gemini_image_gen').par.Inputimage TOP
Default:
None
Context Grabber (Optional) (Contextgrabber) op('gemini_image_gen').par.Contextgrabber COMP
Default:
None
Prompt (Prompt) op('gemini_image_gen').par.Prompt Str
Default:
None
Generate Image (Generate) op('gemini_image_gen').par.Generate Pulse
Default:
None
Output Directory (Outputdir) op('gemini_image_gen').par.Outputdir Folder
Default:
gemini_images_test
Status (Status) op('gemini_image_gen').par.Status Str
Default:
GeminiImageGen
Active (Active) op('gemini_image_gen').par.Active Toggle
Default:
0
Options:
off, on
Display Image Index (Displayimage) op('gemini_image_gen').par.Displayimage Int
Default:
1
Range:
1 to N/A
Slider Range:
1 to N/A
Auto-select Last Image (Setdisplay) op('gemini_image_gen').par.Setdisplay Toggle
Default:
1
Options:
off, on
Generate on Input Change (Onin1) op('gemini_image_gen').par.Onin1 Toggle
Default:
1
Options:
off, on
Bypass (Bypass) op('gemini_image_gen').par.Bypass Toggle
Default:
0
Options:
off, on
Show Built-in Parameters (Showbuiltin) op('gemini_image_gen').par.Showbuiltin Toggle
Default:
0
Options:
off, on
Version (Version) op('gemini_image_gen').par.Version Str
Default:
1.0.0
Last Updated (Lastupdated) op('gemini_image_gen').par.Lastupdated Str
Default:
2025-05-02
Creator (Creator) op('gemini_image_gen').par.Creator Str
Default:
dotsimulate
Website (Website) op('gemini_image_gen').par.Website Str
Default:
https://dotsimulate.com
ChatTD Operator (Chattd) op('gemini_image_gen').par.Chattd OP
Default:
/dot_lops/ChatTD
1. Enter your Gemini API Key in the 'Gemini API Key' parameter.
2. Ensure 'Prompt Source' is set to 'parameter'.
3. Enter your desired prompt in the 'Prompt' parameter (e.g., "A photorealistic cat wearing sunglasses riding a skateboard").
4. Pulse the 'Generate Image' button.
5. Monitor the 'Status' parameter.
6. View the generated image in the operator viewer or the specified 'Output Directory'.
1. Set 'Prompt Source' to 'input_dat'.
2. Create a Table DAT with columns 'role' and 'message'.
3. Add rows with roles 'user' or 'assistant' and your prompt message(s).
4. Connect this DAT to the first input of the GeminiImageGen operator.
5. Ensure 'Generate on Input Change' is enabled if you want automatic generation, otherwise pulse 'Generate Image'.
1. Connect a TOP containing your input image to the 'Input Image (Optional)' parameter.
2. Craft your 'Prompt' to instruct the model on how to use the input image (e.g., "Edit this image to make the sky purple", "Describe this image in detail"). Specific prompt techniques depend on the model's capabilities.
3. Pulse 'Generate Image'.
1. Connect a configured Context Grabber operator to the 'Context Grabber (Optional)' parameter.
2. The text and images collected by the Context Grabber will be automatically included in the prompt sent to Gemini.
3. Enter a main instruction in the 'Prompt' parameter if needed.
4. Pulse 'Generate Image'.
  • API Key: Your Gemini API key is stored securely in a configuration file within your ChatTD environment or retrieved from the ChatTD Key Manager.
  • Dependencies: Requires litellm, Pillow, opencv-python, and numpy. Use ChatTD’s Python Manager to install these.
  • File Saving: Images are saved as PNG files. Metadata is saved as JSON.
  • Asynchronous Operation: Image generation happens asynchronously via ChatTD’s TDAsyncIO, preventing TouchDesigner from freezing.
  • Response Handling: The operator extracts the base64 image data from the API response. If the response contains text alongside the image, this text is stored in the response_text column of the history_dat table.
  • Input Image Encoding: Input TOPs are converted to base64-encoded JPEG data URIs before being sent to the API.
  • ChatTD: Provides core services like dependency management, API key management, and asynchronous task execution required by this operator.
  • Context Grabber: Can be used to provide additional text and image context to the generation prompt.