OCR Operator

Overview

The OCR LOP extracts text from images using Optical Character Recognition. It leverages the SideCar server to run either the EasyOCR or PaddleOCR library for the actual text detection and recognition. This allows for offline OCR processing (after initial model setup in SideCar) and offloads the computational work from the main TouchDesigner process.


Requirements

SideCar Server: The SideCar server application must be running. See the SideCar Guide for setup instructions.
SideCar Dependencies: The Python environment used by the SideCar server needs the relevant OCR library installed:
- For EasyOCR: easyocr
- For PaddleOCR: paddleocr, paddlepaddle-gpu (or paddlepaddle for CPU)
ChatTD Operator: Required for asynchronous communication with the SideCar server and logging. Ensure the ChatTD Operator parameter on the ‘About’ page points to your configured ChatTD instance.
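
As a quick sanity check, the following sketch can be run with the Python interpreter of the SideCar environment to see whether the OCR libraries listed above can be found. It is not part of SideCar or the OCR operator. The helper name `missing_modules` and the `ENGINE_MODULES` mapping are illustrative, and `paddle` is the import name provided by the `paddlepaddle` packages.

```python
import importlib.util

# Import names for each engine. "paddle" is the module installed by
# paddlepaddle / paddlepaddle-gpu. This mapping is illustrative only.
ENGINE_MODULES = {
    "EasyOCR": ["easyocr"],
    "PaddleOCR": ["paddleocr", "paddle"],
}


def missing_modules(engine):
    """Return the modules for `engine` that cannot be found in this environment."""
    return [
        name
        for name in ENGINE_MODULES[engine]
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    for engine in ENGINE_MODULES:
        missing = missing_modules(engine)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"{engine}: {status}")
```

The check only confirms that the packages are importable. It does not verify that models have been downloaded or that a GPU is available.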

Input/Output

Inputs

Input TOP (Inputtop parameter): Connect the image (TOP) from which you want to extract text.

Outputs

Results Table (results_table DAT): An internal table containing detailed results for each detected text block, including the text, confidence score, and bounding box information (u, v, width, height).
Output Text (Outputtext parameter): A string parameter displaying the combined extracted text (if Combine Text is On) or the text from the first detected block.
Viewer TOP / Output 2 (out2_top): Displays the input image with bounding boxes drawn around the detected text regions. The appearance (color, line width) can be customized.
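
If the bounding box values need to be used outside TouchDesigner, they may have to be converted to pixel coordinates. The sketch below assumes that `u`, `v`, `width`, and `height` are normalized to the 0-1 range and that `(u, v)` is the bottom-left corner of the box with the origin at the bottom-left of the image. Neither assumption is confirmed by this page, so check a few rows of `results_table` before relying on it. The function name `bbox_to_pixels` is illustrative.

```python
def bbox_to_pixels(u, v, width, height, image_width, image_height):
    """Convert a normalized box to integer pixel coordinates.

    Assumes (u, v) is the bottom-left corner in 0-1 units with a
    bottom-left origin. Returns (left, top, right, bottom) using a
    top-left pixel origin, as most image libraries expect.
    """
    left = round(u * image_width)
    right = round((u + width) * image_width)
    # Flip the vertical axis: distance from the bottom becomes distance from the top.
    top = round(image_height - (v + height) * image_height)
    bottom = round(image_height - v * image_height)
    return left, top, right, bottom


# A box covering the middle half of the width on a 1280x720 image.
box = bbox_to_pixels(0.25, 0.5, 0.5, 0.25, 1280, 720)
```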

Parameters

Page: OCR

Process (Process) op('ocr').par.Process Pulse

Default: None

Input TOP (Inputtop) op('ocr').par.Inputtop TOP

Default: None

Status (Status) op('ocr').par.Status String

Default: None

Active (Active) op('ocr').par.Active Toggle

Default: Off

Min Confidence (Minconfidence) op('ocr').par.Minconfidence Float

Default: 0.4
Range: 0 to 1
Slider Range: 0 to 1
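
The effect of a confidence threshold can be illustrated with a small sketch. This is not the operator's implementation. The sample results are made up, the `confidence` key follows the result fields described on this page, and treating the threshold as inclusive is an assumption.

```python
def filter_by_confidence(results, min_confidence=0.4):
    """Keep only results whose confidence is at least `min_confidence`."""
    return [r for r in results if r["confidence"] >= min_confidence]


# Made-up sample data for illustration.
sample = [
    {"text": "EXIT", "confidence": 0.93},
    {"text": "3XlT", "confidence": 0.35},
    {"text": "Gate 4", "confidence": 0.58},
]

kept = filter_by_confidence(sample, 0.4)
kept_texts = [r["text"] for r in kept]
```

Lowering the threshold admits more uncertain detections, which can recover faint text at the cost of more misreads.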

Image Scale (Imagescale) op('ocr').par.Imagescale Float

Default: 1
Range: 0.1 to 1
Slider Range: 0.1 to 1
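
To get a feel for how much work a given scale saves, the arithmetic can be sketched as follows. It assumes the scale is applied uniformly to both dimensions before the image is sent for recognition, which this page does not state explicitly. The function name `scaled_size` is illustrative.

```python
def scaled_size(width, height, scale):
    """Return (width, height) after uniform scaling, at least 1 px each."""
    if not 0.1 <= scale <= 1.0:
        raise ValueError("scale must be between 0.1 and 1")
    return max(1, round(width * scale)), max(1, round(height * scale))


# Halving both dimensions of a 1920x1080 image leaves a quarter of the pixels.
w, h = scaled_size(1920, 1080, 0.5)
pixel_fraction = (w * h) / (1920 * 1080)
```

Because the pixel count falls with the square of the scale, small reductions can shorten processing noticeably, while very low values may make small text unreadable.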

Combine Text (Combinetext) op('ocr').par.Combinetext Toggle

Default: On
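
The behavior described for Output Text (combined text when Combine Text is On, otherwise the first detected block) can be sketched like this. The function name `output_text` and the single-space separator are assumptions, not the operator's actual code.

```python
def output_text(texts, combine=True, separator=" "):
    """Return the combined text, or only the first block when combine is off."""
    if not texts:
        return ""
    return separator.join(texts) if combine else texts[0]


blocks = ["OPEN", "24", "HOURS"]
combined = output_text(blocks, combine=True)
first_only = output_text(blocks, combine=False)
```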

Output Text (Outputtext) op('ocr').par.Outputtext String

Default: None

Display / TOP Out2 Header

Line Width (Width) op('ocr').par.Width Float

Default: 2
Range: 0.1 minimum, no maximum listed
Slider Range: 1 to 10

Color R (Colorr) op('ocr').par.Colorr RGB

Default: 0.98

Color G (Colorg) op('ocr').par.Colorg RGB

Default: 0.52

Color B (Colorb) op('ocr').par.Colorb RGB

Default: 0.02

Page: Callbacks

Callbacks Header

Callback DAT (Callbackdat) op('ocr').par.Callbackdat DAT

Default: None

Edit Callbacks (Editcallbacksscript) op('ocr').par.Editcallbacksscript Pulse

Default: None

Create Callbacks (Createpulse) op('ocr').par.Createpulse Pulse

Default: None

onComplete (Oncomplete) op('ocr').par.Oncomplete Toggle

Default: On

Available Callbacks:

onComplete

Example Callback Structure:

def onComplete(info):
    # Called after OCR processing completes and results are received.
    # The info dictionary contains details like:
    # - op: The OCR operator instance
    # - text: List of detected text strings
    # - results: List of detailed result dictionaries (text, confidence, u, v, width, height)
    # - processing_time: Time taken by the SideCar server in seconds
    # - status: 'success' or 'error'
    # - error: Error message if status is 'error'

    if info.get('status') == 'success':
        num_results = len(info.get('results', []))
        combined = " ".join(info.get('text', []))
        print(f"OCR finished successfully: Found {num_results} text blocks.")
        # print(f"Combined text: {combined[:100]}...")  # Example: print first 100 chars

        # Example: Trigger another operator based on results
        # if num_results > 0:
        #     op('downstream_logic').par.Process.pulse()
    else:
        print(f"OCR Error: {info.get('error')}")
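
Before wiring a callback into a project, it can help to exercise it with hand-built `info` dictionaries. The sketch below is a variant of the callback that also returns its message so it can be checked outside TouchDesigner. The dictionaries are made-up stand-ins that follow the keys listed above; they are not real operator output.

```python
def onComplete(info):
    """Print and return a one-line summary of an OCR result."""
    if info.get("status") == "success":
        results = info.get("results", [])
        summary = f"OCR finished successfully: Found {len(results)} text blocks."
    else:
        summary = f"OCR Error: {info.get('error')}"
    print(summary)
    return summary


# Hand-built examples for testing; all values are invented.
fake_success = {
    "status": "success",
    "text": ["HELLO", "WORLD"],
    "results": [
        {"text": "HELLO", "confidence": 0.91, "u": 0.1, "v": 0.6, "width": 0.3, "height": 0.1},
        {"text": "WORLD", "confidence": 0.88, "u": 0.5, "v": 0.6, "width": 0.3, "height": 0.1},
    ],
    "processing_time": 0.42,
}
fake_error = {"status": "error", "error": "SideCar not reachable"}

success_message = onComplete(fake_success)
error_message = onComplete(fake_error)
```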

Page: About

Bypass (Bypass) op('ocr').par.Bypass Toggle

Default: Off

Show Built-in Parameters (Showbuiltin) op('ocr').par.Showbuiltin Toggle

Default: Off

Version (Version) op('ocr').par.Version String

Default: 1.0.0

Last Updated (Lastupdated) op('ocr').par.Lastupdated String

Default: 2025-01-27

Creator (Creator) op('ocr').par.Creator String

Default: dotsimulate

Website (Website) op('ocr').par.Website String

Default: https://dotsimulate.com

ChatTD Operator (Chattd) op('ocr').par.Chattd OP

Default: /dot_lops/ChatTD

Usage Examples

Basic OCR

1. Ensure the SideCar server is running and has the desired OCR library (`easyocr` or `paddleocr`) installed.
2. Connect an image TOP containing text to the `Input TOP` parameter.
3. Select the desired engine using the `Model Type` parameter (e.g., 'EasyOCR').
4. Adjust `Min Confidence` if needed (e.g., 0.3 to capture more uncertain text).
5. Pulse the `Process` parameter.
6. Monitor the `Status` parameter. Extracted text will appear in the `Output Text` parameter and detailed results in the internal `results_table` DAT.
7. The second TOP output will show the input image with detected text areas highlighted.
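
The steps above can also be driven from a script. The following is a minimal sketch using the parameter names documented on this page (`Inputtop`, `Minconfidence`, `Process`). The helper name `run_ocr` and the operator names in the usage comment are placeholders. The engine selection is left out because the script name of the `Model Type` parameter is not listed here.

```python
def run_ocr(ocr_op, input_top_path, min_confidence=None):
    """Point the OCR operator at a TOP, optionally set the threshold, and pulse Process."""
    ocr_op.par.Inputtop = input_top_path
    if min_confidence is not None:
        ocr_op.par.Minconfidence = min_confidence
    # Processing is asynchronous: results arrive later, so watch the
    # Status parameter or use the onComplete callback.
    ocr_op.par.Process.pulse()


# Inside TouchDesigner (operator names are placeholders):
# run_ocr(op('ocr'), 'moviefilein1', min_confidence=0.3)
```

Passing the operator in as an argument keeps the helper independent of any particular network path.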

Using Callbacks

1. Perform basic OCR setup as above.
2. Create a Text DAT (e.g., 'ocr_callbacks').
3. Paste the example `onComplete` function code into the DAT.
4. Set the `Callback DAT` parameter on the OCR operator to point to 'ocr_callbacks'.
5. Ensure the `onComplete` toggle is On.
6. Pulse `Process`.
7. When OCR finishes, the `onComplete` function in your DAT will execute, printing information to the Textport.

Technical Notes

SideCar Dependency: This operator relies entirely on the SideCar server for OCR processing. Ensure the SideCar is running and the selected Model Type library is installed in its environment.
Asynchronous Operation: Image data is sent to the SideCar, processed, and results are returned asynchronously via ChatTD, preventing TouchDesigner from freezing during processing.
Performance: Processing time depends on image size/complexity, the chosen Model Type, the Image Scale, and the performance of the machine running the SideCar. PaddleOCR often requires more resources (especially GPU) than EasyOCR.
Visualization: The second TOP output provides a visual representation of the detected text regions using bounding boxes.
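
The asynchronous pattern noted above can be illustrated in general terms with Python's standard library. This sketch is not how ChatTD or SideCar is implemented. It uses a thread pool and a sleeping stand-in function, `slow_ocr`, to show why the caller stays responsive while a slow job runs elsewhere.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_ocr(image_name):
    """Stand-in for a remote OCR call; sleeps instead of doing real work."""
    time.sleep(0.2)
    return {"status": "success", "text": [f"text from {image_name}"]}


received = []


def on_done(future):
    """Runs when the job finishes, similar in spirit to onComplete."""
    received.append(future.result())


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_ocr, "frame_001")
future.add_done_callback(on_done)

# The caller keeps doing its own work while the job runs.
frames_rendered = 0
while not future.done():
    frames_rendered += 1
    time.sleep(0.01)

# Waiting for shutdown guarantees the callback has finished.
executor.shutdown(wait=True)
print(f"Rendered {frames_rendered} frames while waiting; got {received[0]['text']}")
```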

Related Operators

SideCar: The backend service required for this operator to function.
ChatTD: Provides core services like asynchronous task execution and logging.
Florence-2 Operator: Can also perform OCR, potentially with different characteristics, using the Florence-2 model via SideCar.