
Caption

The Caption LOP allows you to generate text descriptions (captions) for images using large language models (LLMs). It takes an image TOP and optional conversation history DAT as input, sends the image and prompt to a configured LLM, and outputs the updated conversation and the generated caption separately.

Caption Operator

  • Input 1 (DAT, optional): Conversation history (table with 'role', 'message', 'id', 'timestamp' columns).
  • Input 2 (TOP): Image to be captioned.
  • Output 1 (DAT): Conversation history with the latest user prompt and assistant response appended.
  • Output 2 (DAT): Generated caption text only.
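The conversation DAT is a simple four-column table. As a hedged illustration of that shape (plain Python standing in for the DAT; the column names come from the input description above, while the id and timestamp formats are assumptions):

```python
# Hypothetical sketch of the conversation table (Input 1 / Output 1).
# Columns match the description above: role, message, id, timestamp.
import time
import uuid

HEADER = ["role", "message", "id", "timestamp"]

def append_row(rows, role, message):
    """Append one conversation entry; the id/timestamp formats are assumptions."""
    rows.append([role, message, str(uuid.uuid4()), str(int(time.time()))])
    return rows

rows = [HEADER]
append_row(rows, "user", "Describe this image.")
append_row(rows, "assistant", "A red cube on a grey floor.")

for r in rows:
    print(r[0], "|", r[1])
```

Each captioning pass appends one user row (the prompt) and one assistant row (the caption) in this fashion.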
Caption (Caption) op('caption').par.Caption Pulse

Triggers the captioning process.

Default:
None
Streaming (Streaming) op('caption').par.Streaming Toggle

Enable to process the response as a stream.

Default:
off
Options:
off, on
Active (Active) op('caption').par.Active Toggle

Indicates if the operator is currently processing a request. Read-only.

Default:
off
Options:
off, on
Additional Prompt (Prompt) op('caption').par.Prompt Str

Additional instructions or context to guide the captioning model.

Default:
None
Add Prompt As User Message (Adduser) op('caption').par.Adduser Toggle

Includes the 'Additional Prompt' in the conversation history sent to the model and stored in the output conversation.

Default:
off
Options:
off, on
Append Previous Conversation (Appendconversation) op('caption').par.Appendconversation Toggle

Appends the previous conversation from Input 1 to the current request and output conversation.

Default:
off
Options:
off, on
Include Conversation Input (Includeinput) op('caption').par.Includeinput Toggle

Includes the conversation history from Input 1 in the request sent to the model.

Default:
on
Options:
off, on
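The three history toggles above interact when the request is assembled. A minimal sketch of how they plausibly combine, in plain Python (the function name and arguments are illustrative, not the operator's actual implementation):

```python
# Hedged sketch of how the history toggles could combine when building
# the request. build_request_messages is illustrative only.

def build_request_messages(history, prompt,
                           include_input=True,
                           add_user=False):
    """Assemble the message list sent to the model.

    history        -- rows from Input 1 as (role, message) tuples
    include_input  -- 'Include Conversation Input' toggle
    add_user       -- 'Add Prompt As User Message' toggle
    """
    messages = []
    if include_input:
        messages.extend({"role": r, "content": m} for r, m in history)
    if add_user and prompt:
        messages.append({"role": "user", "content": prompt})
    return messages

history = [("user", "Hello"), ("assistant", "Hi!")]
msgs = build_request_messages(history, "Describe the lighting.",
                              include_input=True, add_user=True)
print(len(msgs))  # 3
```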
Add Pretext to Assistant (Addpretext) op('caption').par.Addpretext Toggle

Adds the 'Pretext' parameter content to the beginning of the assistant's response.

Default:
off
Options:
off, on
Pretext (Pretext) op('caption').par.Pretext Str

The predefined text to add to the assistant's response if 'Add Pretext to Assistant' is enabled.

Default:
[based on user image]

Understanding Model Selection

Operators utilizing LLMs (LOPs) offer flexible ways to configure the AI model used:

  • ChatTD Model (Default): By default, LOPs inherit model settings (API Server and Model) from the central ChatTD component. You can configure ChatTD via the "Controls" section in the Operator Create Dialog or its parameter page.
  • Custom Model: Select this option in "Use Model From" to override the ChatTD settings and specify the API Server and AI Model directly within this operator.
  • Controller Model: Choose this to have the LOP inherit its API Server and AI Model parameters from another operator (like a different Agent or any LOP with model parameters) specified in the Controller [ Model ] parameter. This allows centralizing model control.

The Search toggle filters the AI Model dropdown based on keywords entered in Model Search. The Show Model Info toggle (if available) displays detailed information about the selected model directly in the operator's viewer, including cost and token limits.
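The three-way model lookup described above can be sketched in plain Python. The menu values (`chattd_model`, `custom_model`, `controller_model`) come from the parameter reference below; the dataclass and function names are assumptions for this sketch, not the operator's internal API:

```python
# Illustrative resolution of the 'Use Model From' (Modelselection) menu.
from dataclasses import dataclass

@dataclass
class ModelSettings:
    api_server: str
    model: str

def resolve_model(selection, own, chattd, controller=None):
    """Pick (api_server, model) based on the Modelselection menu value."""
    if selection == "custom_model":
        return own                      # this operator's own parameters
    if selection == "controller_model" and controller is not None:
        return controller               # inherited from another LOP
    return chattd                       # default: central ChatTD settings

chattd = ModelSettings("gemini", "gemma3:4b")
own = ModelSettings("openai", "gpt-4o")
print(resolve_model("chattd_model", own, chattd).model)   # gemma3:4b
print(resolve_model("custom_model", own, chattd).model)   # gpt-4o
```

Falling back to the ChatTD settings when no controller is given mirrors the default behavior described above.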

Output Settings Header
Max Tokens (Maxtokens) op('caption').par.Maxtokens Int

The maximum number of tokens the model should generate.

Default:
256
Temperature (Temperature) op('caption').par.Temperature Float

Controls randomness in the response. Lower values are more deterministic.

Default:
0
Model Selection Header
Use Model From (Modelselection) op('caption').par.Modelselection Menu

Choose where the model configuration comes from.

Default:
chattd_model
Options:
chattd_model, custom_model, controller_model
Controller [ Model ] (Modelcontroller) op('caption').par.Modelcontroller OP

Operator providing model settings when 'Use Model From' is set to controller_model.

Default:
None
Select API Server (Apiserver) op('caption').par.Apiserver StrMenu

Select the LiteLLM provider (API server).

Default:
gemini
Menu Options:
  • openrouter (openrouter)
  • openai (openai)
  • groq (groq)
  • gemini (gemini)
  • ollama (ollama)
  • lmstudio (lmstudio)
  • custom (custom)
AI Model (Model) op('caption').par.Model StrMenu

Specific model to request. Available options depend on the selected provider.

Default:
gemma3:4b
Menu Options:
  • gemma3:4b (gemma3:4b)
Search (Search) op('caption').par.Search Toggle

Enable dynamic model search based on a pattern.

Default:
off
Options:
off, on
Model Search (Modelsearch) op('caption').par.Modelsearch Str

Pattern to filter models when Search is enabled.

Default:
"" (Empty String)
Bypass (Bypass) op('caption').par.Bypass Toggle

If enabled, bypasses the captioning process: no request is sent to the model and the output DATs remain empty.

Default:
off
Options:
off, on
Show Built-in Parameters (Showbuiltin) op('caption').par.Showbuiltin Toggle

Shows or hides the standard TouchDesigner built-in parameters.

Default:
off
Options:
off, on
Version (Version) op('caption').par.Version Str

The version number of the operator.

Default:
1.0.0
Last Updated (Lastupdated) op('caption').par.Lastupdated Str

The date the operator was last updated.

Default:
2024-11-09
Creator (Creator) op('caption').par.Creator Str

The creator of the operator.

Default:
dotsimulate
Website (Website) op('caption').par.Website Str

The website of the creator.

Default:
https://dotsimulate.com
ChatTD Operator (Chattd) op('caption').par.Chattd OP

Specifies the path to the ChatTD operator used for handling API calls.

Default:
/dot_lops/ChatTD
Dependencies

  • Requires a working TouchDesigner environment.
  • Requires the dot_chat_util library, TDStoreTools, and TDFunctions.
  • Requires the ChatTD operator (specified in the Chattd parameter) to be properly configured with API keys and model access.

The SimpleCaptionEXT provides the following key methods accessible via op('your_caption_op').ext.SimpleCaptionEXT:

  • get_model_selection(): Determines the api_server and model based on the Modelselection parameter. Returns (api_server, model).
  • Caption(): The core method triggered by the Caption pulse parameter. Assembles the request, calls ChatTD.Customapicall, and manages the process.
  • HandleStreamingResponse(response, full_response=None, callbackInfo=None): Callback method used when Streaming is enabled. Processes response chunks.
  • HandleResponse(response, full_response=None, callbackInfo=None): Callback method used when Streaming is disabled. Processes the complete response.
  • ErrorCustomapicall(error_response, full_response=None): Callback method for handling errors during the API call.
  • ResetOp(): Clears internal tables (conversation_dat, history_dat, output_dat), resets Active state, and clears the Prompt parameter.
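As a hedged illustration of what the two response callbacks likely do, here is a plain-Python accumulator: streamed chunks are appended one at a time and the full caption is rebuilt incrementally. The class and its behavior are assumptions based on the method descriptions above; the real callbacks receive response objects from ChatTD:

```python
# Hypothetical sketch of streaming-callback accumulation. Plain strings
# stand in for the response chunks ChatTD would deliver.

class StreamAccumulator:
    def __init__(self, pretext=""):
        # 'Add Pretext to Assistant' would seed the response with Pretext
        self.parts = [pretext] if pretext else []

    def handle_chunk(self, chunk):
        """Analogous to HandleStreamingResponse: append each chunk."""
        self.parts.append(chunk)
        return "".join(self.parts)      # current partial caption

    def finish(self):
        """Analogous to HandleResponse for the complete text."""
        return "".join(self.parts)

acc = StreamAccumulator(pretext="[based on user image] ")
for chunk in ["A red ", "cube on ", "a grey floor."]:
    acc.handle_chunk(chunk)
print(acc.finish())
```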

Refer to the SimpleCaptionEXT code within the component for detailed implementation.