Florence-2 Operator
Overview
The Florence-2 LOP provides an interface to Microsoft’s Florence-2 vision foundation model. It supports a wide range of vision tasks, including detailed image captioning, object detection (region proposal), OCR (optical character recognition), referring expression segmentation, and more. The operator requires the separate SideCar server to be running, because model loading and inference happen inside the SideCar process and use its resources (potentially including a dedicated GPU).
Requirements
- SideCar Server: The SideCar server application must be running. See the SideCar Guide for setup instructions.
- SideCar Dependencies: The Python environment used by the SideCar server needs the following packages installed:
  - torch>=2.1.1 (CUDA version recommended)
  - transformers
  - timm
  - einops
- ChatTD Operator: Required for asynchronous communication with the SideCar server and for logging. Ensure the ChatTD Operator parameter on the ‘About’ page points to your configured ChatTD instance.
Input/Output
Inputs
- Input TOP (`in_top`): Connect the image (TOP) you want to process here.
Outputs
- Output Text DAT (`output_dat`): Contains the primary text result from the selected Florence-2 task (e.g., the generated caption or the OCR text).
- Conversation DAT (`conversation_dat`): Stores the latest interaction, typically the input prompt (if used) and the assistant’s (Florence-2’s) response.
- History DAT (`history_dat`): Appends a log entry for each successful processing task, storing the role, message, model used, and timestamp.
Parameters
Page: Florence2
- Load Model (Load): `op('florence').par.Load`; Pulse; default None
- Process Image (Process): `op('florence').par.Process`; Pulse; default None
- Reset (Reset): `op('florence').par.Reset`; Pulse; default None
- Active (Active): `op('florence').par.Active`; Toggle; default Off
- Status (Status): `op('florence').par.Status`; String; default None
- Input Prompt (Prompt): `op('florence').par.Prompt`; String; default None
- Max Tokens (Maxtokens): `op('florence').par.Maxtokens`; Int; default 512; range 1 and up (no maximum); slider range 64 to 4096
- Num Beams (Numbeams): `op('florence').par.Numbeams`; Int; default 3; range 1 and up (no maximum); slider range 1 to 10
- Do Sample (Dosample): `op('florence').par.Dosample`; Toggle; default On
- Random Seed (Seed): `op('florence').par.Seed`; Int; default 42; range -1 and up (no maximum); slider range 0 to 10000
- Fill Region Masks (Fillmask): `op('florence').par.Fillmask`; Toggle; default On
- Mask Selection (Maskselect): `op('florence').par.Maskselect`; String; default None
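Max Tokens, Num Beams, Do Sample, and Random Seed look like standard Hugging Face text-generation settings. If they are forwarded to `model.generate()` as `max_new_tokens`, `num_beams`, and `do_sample`, with the seed applied beforehand, the mapping could look like the sketch below. The function name and the treatment of a seed of -1 as "pick a random seed" are assumptions, not documented operator behavior.

```python
import random


def build_generate_kwargs(maxtokens=512, numbeams=3, dosample=True, seed=42):
    """Map the operator's parameter values to Hugging Face generate() keywords.

    Hypothetical sketch: assumes Seed = -1 means "choose a fresh random seed".
    Returns (kwargs, seed_to_apply).
    """
    if maxtokens < 1 or numbeams < 1:
        raise ValueError("Maxtokens and Numbeams must be at least 1")
    effective_seed = random.randint(0, 2**31 - 1) if seed == -1 else seed
    kwargs = {
        "max_new_tokens": maxtokens,  # upper bound on generated tokens
        "num_beams": numbeams,        # beam search width; 1 disables beam search
        "do_sample": dosample,        # sample instead of picking the top token
    }
    return kwargs, effective_seed
```

On the SideCar side, the seed would then be applied with `transformers.set_seed(seed)` before calling `model.generate(..., **kwargs)`.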
Page: About
- Bypass (Bypass): `op('florence').par.Bypass`; Toggle; default Off
- Show Built-in Parameters (Showbuiltin): `op('florence').par.Showbuiltin`; Toggle; default Off
- Version (Version): `op('florence').par.Version`; String; default 1.0.0
- Last Updated (Lastupdated): `op('florence').par.Lastupdated`; String; default 2024-11-09
- Creator (Creator): `op('florence').par.Creator`; String; default dotsimulate
- Website (Website): `op('florence').par.Website`; String; default https://dotsimulate.com
- ChatTD Operator (Chattd): `op('florence').par.Chattd`; OP; default /dot_lops/ChatTD
Usage Examples
Image Captioning
1. Ensure the SideCar server is running.
2. Connect an image TOP to the Florence-2 input.
3. Select a model (e.g., 'microsoft/Florence-2-large') from the `Florence Model` menu.
4. Pulse the `Load Model` parameter and wait for the `Status` parameter to indicate the model is ready (the first load may take a while).
5. Set the `Task` parameter to 'more_detailed_caption'.
6. Pulse the `Process Image` parameter.
7. Monitor the `Status` parameter. The generated caption appears in the `output_dat` DAT.
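Steps 5 and 6 can also be scripted from Python inside TouchDesigner. The sketch below is hypothetical. It assumes the `Task` menu is exposed as `par.Task`, which the parameter list does not show. It takes the operator as an argument, so you can call it as `run_task(op('florence'))` from a Text DAT or a callback. It does not wait for `Load Model` to finish, so run it only after `Status` reports that the model is ready.

```python
def run_task(florence, task="more_detailed_caption", prompt=None):
    """Set the Florence-2 task (and optional Input Prompt), then pulse Process Image.

    `florence` is the Florence-2 LOP, e.g. op('florence') in TouchDesigner.
    Hypothetical: assumes the Task menu is exposed as par.Task.
    """
    # Menu parameters accept the menu item's name as a string.
    florence.par.Task = task
    if prompt is not None:
        # Only needed for prompt-driven tasks such as referring_expression_segmentation.
        florence.par.Prompt = prompt
    # Equivalent to clicking the Process Image pulse button.
    florence.par.Process.pulse()
```

Taking the operator as an argument, rather than hard-coding `op('florence')`, lets the same helper drive several Florence-2 instances.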
Optical Character Recognition (OCR)
1. Ensure SideCar is running and the desired model is loaded (pulse `Load Model`).
2. Connect an image TOP containing text to the input.
3. Set the `Task` parameter to 'ocr'.
4. Pulse `Process Image`.
5. The extracted text appears in the `output_dat` DAT.
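Each `Task` value presumably selects one of Florence-2's task tokens. The sketch below follows the pattern used in the Florence-2 examples on Hugging Face, where the prompt is the task token with any user text appended for tasks that take one. The operator's real task names and prompt handling may differ. `TASK_TOKENS`, `NEEDS_TEXT`, and `build_prompt` are hypothetical, and `docvqa` is omitted from this sketch.

```python
# Hypothetical mapping from Task menu values to Florence-2 task tokens.
TASK_TOKENS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "od": "<OD>",
    "region_proposal": "<REGION_PROPOSAL>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "caption_to_phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_expression_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "open_vocabulary_detection": "<OPEN_VOCABULARY_DETECTION>",
}

# Tasks that only make sense with user text (the Input Prompt) after the token.
NEEDS_TEXT = {
    "caption_to_phrase_grounding",
    "referring_expression_segmentation",
    "open_vocabulary_detection",
}


def build_prompt(task, user_text=""):
    """Return the Florence-2 prompt string for a task; raises KeyError for unknown tasks."""
    token = TASK_TOKENS[task]
    text = user_text.strip()
    if task in NEEDS_TEXT:
        if not text:
            raise ValueError(f"task '{task}' needs an Input Prompt")
        return token + text
    # Tasks that take no text ignore the prompt in this sketch.
    return token
```

With this design, a missing prompt fails early with a clear message instead of sending a bare token that the model cannot answer meaningfully.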
Object Detection (Region Proposal)
1. Ensure SideCar is running and the model is loaded.
2. Connect an image TOP.
3. Set the `Task` parameter to 'region_proposal'.
4. Pulse `Process Image`.
5. The results (bounding boxes and labels) appear in the `output_dat` DAT, often as structured text or JSON. Depending on internal settings, visualizations may also appear in the node viewer.
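Suppose `output_dat` holds JSON shaped like the dictionary that Florence-2's `post_process_generation` returns for `<REGION_PROPOSAL>` or `<OD>`. That dictionary contains a `bboxes` list of `[x1, y1, x2, y2]` pixel boxes and a parallel `labels` list, and it may be nested under the task token. Under that assumption, you can convert the boxes to normalized 0 to 1 coordinates for use in TouchDesigner, as in the sketch below. The exact text format the operator writes may differ, and `parse_regions` is a hypothetical helper.

```python
import json


def parse_regions(text, width, height, flip_y=True):
    """Convert Florence-2 box output (assumed JSON) into normalized 0-1 boxes.

    Accepts either {"bboxes": [...], "labels": [...]} or the same dict nested
    under a task token key such as "<REGION_PROPOSAL>".
    """
    data = json.loads(text)
    # post_process_generation nests its result under the task token.
    if len(data) == 1:
        (inner,) = data.values()
        if isinstance(inner, dict):
            data = inner
    boxes = []
    for (x1, y1, x2, y2), label in zip(data["bboxes"], data["labels"]):
        u1, u2 = x1 / width, x2 / width
        v1, v2 = y1 / height, y2 / height
        if flip_y:
            # Pixel boxes use a top-left origin; TouchDesigner UVs start bottom-left.
            v1, v2 = 1.0 - v2, 1.0 - v1
        boxes.append({"label": label, "u1": u1, "v1": v1, "u2": u2, "v2": v2})
    return boxes
```

In TouchDesigner, you would pass the output DAT's `.text` together with the input TOP's `width` and `height`.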
Technical Notes
- SideCar Dependency: This operator is critically dependent on the SideCar server. All model loading and inference occur in the SideCar process.
- Resource Intensive: Florence-2 models, especially the larger variants, require significant computational resources, primarily GPU VRAM. Ensure the machine running SideCar meets the requirements for the selected model.
- Asynchronous Operation: Communication with the SideCar server (loading models, processing images) is handled asynchronously via ChatTD’s TDAsyncIO, preventing TouchDesigner from freezing.
- Task-Specific Prompts: Some tasks, such as `docvqa` or `referring_expression_segmentation`, require an appropriate `Input Prompt` to function correctly.
- Precision & Attention: The `Precision` and `Attention Mechanism` parameters affect performance and resource usage on the SideCar server. `fp16`/`bf16` and `flash_attention_2` (if installed and supported) can offer significant speedups.
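On the SideCar side, precision and attention choices typically become arguments to Hugging Face `from_pretrained`. The sketch below assumes menu values `fp32`, `fp16`, and `bf16`, plus an attention value of `flash_attention_2` or `eager`. It falls back to `eager` when the `flash_attn` package is not installed. `build_load_kwargs` is hypothetical and is not the SideCar's actual code.

```python
import importlib.util

# Hypothetical Precision menu values mapped to torch dtype attribute names.
DTYPE_NAMES = {"fp32": "float32", "fp16": "float16", "bf16": "bfloat16"}


def build_load_kwargs(precision="fp16", attention="flash_attention_2"):
    """Return (dtype_name, kwargs) for a Florence-2 from_pretrained call (sketch)."""
    if precision not in DTYPE_NAMES:
        raise ValueError(f"unknown precision: {precision!r}")
    if attention == "flash_attention_2" and importlib.util.find_spec("flash_attn") is None:
        # flash_attention_2 needs the separate flash-attn package; fall back to
        # the plain PyTorch ("eager") attention implementation.
        attention = "eager"
    kwargs = {
        "trust_remote_code": True,  # Florence-2 ships its own modeling code
        "attn_implementation": attention,
    }
    return DTYPE_NAMES[precision], kwargs
```

You can turn the returned dtype name into a torch dtype with `getattr(torch, dtype_name)`. Pass it as `torch_dtype` to `AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large", ...)` together with the returned keyword arguments.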
Related Operators
- SideCar: The backend service required for this operator to function.
- ChatTD: Provides core services like asynchronous task execution and logging.
- OCR Operator: Another operator focused specifically on OCR, potentially using different backends (like EasyOCR or PaddleOCR via SideCar).