BM25 Search

v0.1.0 New

The BM25 LOP performs relevance-based keyword search over TouchDesigner tables and RAG indices using the industry-standard Okapi BM25 ranking algorithm. It complements vector-based retrieval by excelling at exact term matching and technical content search.

🔧 GetTool Enabled 1 tool

This operator exposes one tool that allows Agent and Gemini Live LOPs to search indexed documents using BM25 keyword matching with configurable top-k and score thresholds.

When connected to an Agent LOP, the agent can call the bm25_search tool to find specific terms, technical content, and exact matches across your indexed documents. Enable ‘Allow Agent Control’ on the Agent page to let agents adjust BM25 parameters like k1 and b during search.

The operator supports three input modes, configured via ‘Input Mode’ on the Search page:

  • Source Table - Connect a DAT table output from any source operator (Source Docs, Source Crawl4ai, etc.)
  • RAG Index - Point to a RAG Index operator to search over its indexed chunks
  • Direct Table - Reference any table DAT directly

The operator provides three output tables:

  • Results Table - RAG-indexer compatible results with query, content, BM25 score, source path, metadata, and chunk context
  • Stats Table - Search statistics including result count, average score, and timing
  • Doc Index Table - Debug view of all indexed documents with chunk counts

To run a basic search:

  1. On the Search page, set ‘Input Mode’ to your preferred source type.
  2. If using ‘Source Table’, set the ‘Source Table DAT’ to your source operator’s output table.
  3. Enter a search query in the ‘Search Query’ field.
  4. Set ‘Top K Results’ to limit how many results are returned.
  5. Pulse ‘Execute Search’ to run the search.
  6. Results appear in the results_table DAT output.
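
The same steps can also be scripted from Python inside TouchDesigner. The following is a minimal sketch rather than an official API: it assumes the parameter names listed in the parameter reference (`Inputmode`, `Sourcetable`, `Query`, `Topk`, `Search`), and `run_bm25_search` is a hypothetical helper name that takes the operator as an argument.

```python
def run_bm25_search(bm25_op, query, top_k=5, source_table=None):
    """Configure a BM25 operator and trigger one search.

    bm25_op is the BM25 operator, e.g. op('bm25') inside TouchDesigner.
    The helper name and argument layout are illustrative assumptions.
    """
    if source_table is not None:
        # Steps 1-2: choose the input mode and point at the source table.
        bm25_op.par.Inputmode = 'source_table'
        bm25_op.par.Sourcetable = source_table
    # Steps 3-4: set the query text and cap the number of results.
    bm25_op.par.Query = query
    bm25_op.par.Topk = top_k
    # Step 5: pulse the Execute Search parameter.
    bm25_op.par.Search.pulse()
```

Inside TouchDesigner this might be called as `run_bm25_search(op('bm25'), 'feedback loop', top_k=5)`, after which the results can be read from the results_table DAT output.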

To search over a RAG index:

  1. Set ‘Input Mode’ to ‘RAG Index’.
  2. Set ‘RAG Index Source’ to your RAG Index operator.
  3. The BM25 operator will index the chunks from the RAG index, giving you keyword search alongside vector search.
  4. Enable ‘Auto Index’ to automatically rebuild the index when the source data changes.
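
A scripted version of this setup could look like the sketch below. It assumes the `Inputmode`, `Ragindex`, `Autoindex`, and `Dirtyindex` parameters from the parameter reference, and that the RAG Index operator can be assigned by path; `use_rag_index` is a hypothetical helper name.

```python
def use_rag_index(bm25_op, rag_index_path, auto_index=True):
    """Point a BM25 operator at a RAG Index operator (illustrative helper)."""
    # Menu option name taken from the Input Mode parameter reference.
    bm25_op.par.Inputmode = 'rag_index'
    # Assumption: the operator-type parameter accepts a path string.
    bm25_op.par.Ragindex = rag_index_path
    # Rebuild automatically when the source data changes.
    bm25_op.par.Autoindex = auto_index
    if not auto_index:
        # Without Auto Index, mark the index dirty once so that the
        # next search rebuilds it from the newly selected source.
        bm25_op.par.Dirtyindex.pulse()
```

With `auto_index=False`, later changes to the RAG index would need another pulse of ‘Dirty Index’ before they show up in search results.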

To tune BM25 ranking and chunking:

  1. On the BM25 page, adjust ‘K1 Parameter’ (default 1.5) to control term frequency saturation. Higher values give more weight to term frequency.
  2. Adjust ‘B Parameter’ (default 0.75) to control document length normalization. Set to 0 to ignore document length, or 1 for full normalization.
  3. Turn on ‘Enable Chunking’ and configure ‘Chunk Size’ and ‘Chunk Overlap’ to split large documents into smaller searchable pieces.
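
To make the roles of k1 and b concrete, here is a small standalone sketch of Okapi BM25 scoring in plain Python. It is not the operator's implementation: the whitespace tokenizer and the non-negative IDF variant are assumptions, since the page does not say which ones the operator uses.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs or 1.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        # b scales how strongly length relative to the average is penalized.
        length_norm = 1 - b + b * len(toks) / avg_len
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # k1 controls how quickly repeated occurrences stop adding score.
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "noise TOP generates noise patterns",
    "feedback TOP loops the previous frame back in",
    "noise",
]
print(bm25_scores("noise", docs))          # default k1 and b
print(bm25_scores("noise", docs, b=0.0))   # document length ignored
```

With the defaults, the one-word document ranks first for the query because length normalization favors short documents. With `b=0.0`, the document that mentions the term twice ranks first instead.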

A few usage tips:

  • Use BM25 alongside vector search (RAG Retriever) for hybrid retrieval - BM25 excels at exact keyword matches while vectors handle semantic similarity.
  • Enable ‘Auto Index’ for workflows where source data changes frequently. The operator tracks table changes and rebuilds the index automatically.
  • For large documents, enable chunking with overlap to ensure search terms near chunk boundaries are not missed.
  • Set a ‘Minimum Score’ threshold to filter out low-relevance results.
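
The chunking tip can be illustrated with a short standalone sketch of character-based chunking with overlap. This is an assumption about the general technique, not the operator's actual splitting logic. The default sizes are illustrative, although the parameter reference does describe both settings as character counts.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character chunks whose neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # The current chunk already reaches the end of the text.
            break
    return chunks

text = "abcdefghij"
print(chunk_text(text, chunk_size=4, overlap=0))
print(chunk_text(text, chunk_size=4, overlap=2))
```

In this toy input, the substring `def` straddles a chunk boundary when the overlap is 0, so no single chunk contains it. With an overlap of 2 it falls entirely inside one chunk and remains searchable.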

Parameters

Search Query (Query) op('bm25').par.Query Str

Text to search for using BM25

Default:
"" (Empty String)
Top K Results (Topk) op('bm25').par.Topk Int

Maximum number of results to return

Default:
0
Range:
1 to 100
Slider Range:
0 to 1
Minimum Score (Minscore) op('bm25').par.Minscore Float

Minimum BM25 score threshold

Default:
0.0
Range:
0 to 10
Slider Range:
0 to 1
Execute Search (Search) op('bm25').par.Search Pulse

Search the BM25 index

Default:
False
Clear Results (Clearresults) op('bm25').par.Clearresults Pulse

Clear all search results

Default:
False
Clear Index (Clearindex) op('bm25').par.Clearindex Pulse

Clear the BM25 index

Default:
False
Status (Status) op('bm25').par.Status Str

Current operation status

Default:
"" (Empty String)
Index Size (Indexsize) op('bm25').par.Indexsize Str

Number of documents/chunks indexed

Default:
"" (Empty String)
Last Search Time (Lastsearchtime) op('bm25').par.Lastsearchtime Str

Time taken for last search

Default:
"" (Empty String)
Auto Index (Autoindex) op('bm25').par.Autoindex Toggle

Automatically rebuild the index at search time if the source data has changed

Default:
False
Dirty Index (Dirtyindex) op('bm25').par.Dirtyindex Pulse

Mark index as dirty (will rebuild on next search)

Default:
False
Input Mode (Inputmode) op('bm25').par.Inputmode Menu

How to get input data for BM25 indexing

Default:
source_table
Options:
source_table, rag_index, direct_table
Source Table DAT (Sourcetable) op('bm25').par.Sourcetable DAT

DAT table output from source operator (when using source_table mode)

Default:
"" (Empty String)
RAG Index Source (Ragindex) op('bm25').par.Ragindex COMP

RAG index operator to search over (when using rag_index mode)

Default:
"" (Empty String)
Direct Table DAT (Directtable) op('bm25').par.Directtable DAT

Table DAT to search directly (when using direct_table mode)

Default:
"" (Empty String)
K1 Parameter (K1) op('bm25').par.K1 Float

BM25 saturation parameter (typically 1.2-2.0)

Default:
1.5
Range:
0.1 to 3
Slider Range:
0 to 1
B Parameter (B) op('bm25').par.B Float

BM25 length normalization parameter (typically 0.75)

Default:
0.75
Range:
0 to 1
Slider Range:
0 to 1
Enable Chunking (Enablechunking) op('bm25').par.Enablechunking Toggle

Split large content into smaller chunks

Default:
False
Chunk Size (Chunksize) op('bm25').par.Chunksize Int

Maximum characters per chunk

Default:
0
Range:
100 to 10000
Slider Range:
0 to 1
Chunk Overlap (Overlap) op('bm25').par.Overlap Int

Character overlap between chunks

Default:
0
Range:
0 to 1000
Slider Range:
0 to 1
Allow Agent Control (Allowagentcontrol) op('bm25').par.Allowagentcontrol Toggle

Allow agents to control search parameters

Default:
False
Agent Results to Table (Agentresultstable) op('bm25').par.Agentresultstable Toggle

Add agent search results to the results table

Default:
False