Caption
Overview
The Caption LOP allows you to generate text descriptions (captions) for images using large language models (LLMs). It takes an image TOP and optional conversation history DAT as input, sends the image and prompt to a configured LLM, and outputs the updated conversation and the generated caption separately.
Inputs/Outputs
- Input 1 (DAT, optional): Conversation history (table with ‘role’, ‘message’, ‘id’, ‘timestamp’ columns).
- Input 2 (TOP): Image to be captioned.
- Output 1 (DAT): Conversation history with the latest user prompt and assistant response appended.
- Output 2 (DAT): Generated caption text only.
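The conversation table expected on Input 1 can be sketched as follows. This is a minimal illustration, not the component's own code: the helper names `make_row` and `write_history` are hypothetical, and the `id` and `timestamp` formats (a UUID hex string and a Unix time string) are assumptions, since the source only names the columns. The table argument is anything exposing `clear()` and `appendRow()`, which a TouchDesigner Table DAT does.

```python
import time
import uuid

# Column names taken from the Input 1 description.
COLUMNS = ['role', 'message', 'id', 'timestamp']


def make_row(role, message):
    """Build one conversation row. The id and timestamp formats are assumptions."""
    return [role, message, uuid.uuid4().hex, str(time.time())]


def write_history(table_dat, turns):
    """Replace the table contents with a header row plus one row per (role, message) turn."""
    table_dat.clear()
    table_dat.appendRow(COLUMNS)
    for role, message in turns:
        table_dat.appendRow(make_row(role, message))
```

Inside TouchDesigner this could be called as `write_history(op('history_table'), [('user', 'Describe this image.')])`, where `history_table` is a hypothetical Table DAT wired into Input 1.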
Parameters
Page: Caption
op('caption').par.Caption
Pulse Triggers the captioning process.
- Default:
None
op('caption').par.Streaming
Toggle Enable to process the response as a stream.
- Default:
off
- Options:
- off, on
op('caption').par.Active
Toggle Indicates if the operator is currently processing a request. Read-only.
- Default:
off
- Options:
- off, on
op('caption').par.Prompt
Str Additional instructions or context to guide the captioning model.
- Default:
None
op('caption').par.Adduser
Toggle Includes the 'Additional Prompt' in the conversation history sent to the model and stored in the output conversation.
- Default:
off
- Options:
- off, on
op('caption').par.Appendconversation
Toggle Appends the previous conversation from Input 1 to the current request and output conversation.
- Default:
off
- Options:
- off, on
op('caption').par.Includeinput
Toggle Includes the conversation history from Input 1 in the request sent to the model.
- Default:
on
- Options:
- off, on
op('caption').par.Addpretext
Toggle Adds the 'Pretext' parameter content to the beginning of the assistant's response.
- Default:
off
- Options:
- off, on
op('caption').par.Pretext
Str The predefined text to add to the assistant's response if 'Add Pretext to Assistant' is enabled.
- Default:
[based on user image]
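The Caption page parameters above can be driven from a script. The sketch below is one possible way to do it, assuming the parameter names listed on this page; the helper name `request_caption` and its arguments are hypothetical. It takes the operator as an argument so that it can be called with `op('caption')` inside TouchDesigner.

```python
def request_caption(caption_op, prompt, pretext=None, streaming=False):
    """Configure the Caption page and pulse Caption.

    Returns False without pulsing if a request is already in flight.
    """
    par = caption_op.par
    if par.Active.eval():          # Active is read-only and on while processing
        return False
    par.Prompt = prompt
    par.Adduser = True             # keep the prompt in the output conversation
    par.Streaming = streaming
    par.Addpretext = pretext is not None
    if pretext is not None:
        par.Pretext = pretext
    par.Caption.pulse()            # trigger the captioning process
    return True
```

Checking `Active` first is a design choice of this sketch, intended to avoid pulsing while an earlier request is still being processed.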
Page: Model
Understanding Model Selection
Operators utilizing LLMs (LOPs) offer flexible ways to configure the AI model used:
- ChatTD Model (Default): By default, LOPs inherit model settings (API Server and Model) from the central ChatTD component. You can configure ChatTD via the "Controls" section in the Operator Create Dialog or its parameter page.
- Custom Model: Select this option in "Use Model From" to override the ChatTD settings and specify the API Server and AI Model directly within this operator.
- Controller Model: Choose this to have the LOP inherit its API Server and AI Model parameters from another operator (like a different Agent or any LOP with model parameters) specified in the Controller [ Model ] parameter. This allows centralizing model control.
The Search toggle filters the AI Model dropdown based on keywords entered in Model Search. The Show Model Info toggle (if available) displays detailed information about the selected model directly in the operator's viewer, including cost and token limits.
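The three "Use Model From" options can be read as a simple lookup that returns an API server and model pair. The sketch below illustrates that reading only; it is not the component's implementation. The function name `resolve_model`, the dictionary shape, and the mode names `chattd_model` and `custom_model` are assumptions (only `controller_model` appears on this page), and the server and model strings in any call are placeholders.

```python
def resolve_model(mode, chattd, custom, controller=None):
    """Return (api_server, model) for the chosen 'Use Model From' mode.

    Each source is a dict with 'api_server' and 'model' keys (hypothetical shape).
    """
    if mode == 'custom_model':
        source = custom                # settings stored on this operator
    elif mode == 'controller_model':
        if controller is None:
            raise ValueError('controller_model needs a controller operator')
        source = controller            # settings taken from another LOP
    else:
        source = chattd                # default: inherit from ChatTD
    return source['api_server'], source['model']
```

Falling back to the ChatTD settings for any unrecognized mode mirrors the documented default, but that fallback is a choice made for this sketch.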
Available LLM Models + Providers Resources
The following links point to API key pages or documentation for the supported providers. For a complete and up-to-date list, see the LiteLLM provider docs.
op('caption').par.Maxtokens
Int The maximum number of tokens the model should generate.
- Default:
256
op('caption').par.Temperature
Float Controls randomness in the response. Lower values are more deterministic.
- Default:
0
op('caption').par.Modelcontroller
OP Operator providing model settings when 'Use Model From' is set to controller_model.
- Default:
None
op('caption').par.Search
Toggle Enable dynamic model search based on a pattern.
- Default:
off
- Options:
- off, on
op('caption').par.Modelsearch
Str Pattern to filter models when Search is enabled.
- Default:
"" (Empty String)
Page: About
op('caption').par.Bypass
Toggle If enabled, bypasses the captioning process. Input 2 (TOP) is passed through to Output 2 (DAT will be empty).
- Default:
off
- Options:
- off, on
op('caption').par.Showbuiltin
Toggle Shows or hides the standard TouchDesigner built-in parameters.
- Default:
off
- Options:
- off, on
op('caption').par.Version
Str The version number of the operator.
- Default:
1.0.0
op('caption').par.Lastupdated
Str The date the operator was last updated.
- Default:
2024-11-09
op('caption').par.Creator
Str The creator of the operator.
- Default:
dotsimulate
op('caption').par.Website
Str The website of the creator.
- Default:
https://dotsimulate.com
op('caption').par.Chattd
OP Specifies the path to the ChatTD operator used for handling API calls.
- Default:
/dot_lops/ChatTD
Requirements
- Requires a working TouchDesigner environment.
- Requires the dot_chat_util library, TDStoreTools, and TDFunctions.
- Requires the ChatTD operator (specified in the Chattd parameter) to be properly configured with API keys and model access.
API & Extension Methods
The SimpleCaptionEXT provides the following key methods accessible via op('your_caption_op').ext.SimpleCaptionEXT:
- get_model_selection(): Determines the api_server and model based on the Modelselection parameter. Returns (api_server, model).
- Caption(): The core method triggered by the Caption pulse parameter. Assembles the request, calls ChatTD.Customapicall, and manages the process.
- HandleStreamingResponse(response, full_response=None, callbackInfo=None): Callback method used when Streaming is enabled. Processes response chunks.
- HandleResponse(response, full_response=None, callbackInfo=None): Callback method used when Streaming is disabled. Processes the complete response.
- ErrorCustomapicall(error_response, full_response=None): Callback method for handling errors during the API call.
- ResetOp(): Clears internal tables (conversation_dat, history_dat, output_dat), resets Active state, and clears the Prompt parameter.
Refer to the SimpleCaptionEXT code within the component for detailed implementation.
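The extension methods listed above can also be called directly from a script. The following is a minimal sketch assuming the method names and return value documented here; the helper name `recaption` is hypothetical. Because ResetOp() clears the Prompt parameter, the sketch sets the prompt again before calling Caption().

```python
def recaption(caption_op, prompt):
    """Reset the operator, restore the prompt, and start a new caption request.

    Returns the (api_server, model) pair reported by get_model_selection().
    """
    ext = caption_op.ext.SimpleCaptionEXT
    ext.ResetOp()                      # clears internal tables, Active, and Prompt
    caption_op.par.Prompt = prompt     # restore the prompt that ResetOp cleared
    api_server, model = ext.get_model_selection()
    ext.Caption()                      # same entry point as the Caption pulse
    return api_server, model
```

Inside TouchDesigner this could be invoked as `recaption(op('your_caption_op'), 'Describe the image in one sentence.')`.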