BM25 Search
The BM25 LOP performs relevance-based keyword search over TouchDesigner tables and RAG indices using the industry-standard Okapi BM25 ranking algorithm. It complements vector-based retrieval by excelling at exact term matching and technical content search.
Agent Tool Integration
This operator exposes one tool that allows Agent and Gemini Live LOPs to search indexed documents using BM25 keyword matching, with configurable top-k and score thresholds.
Use the Tool Debugger operator to inspect exact tool definitions, schemas, and parameters.
When connected to an Agent LOP, the agent can call the bm25_search tool to find specific terms, technical content, and exact matches across your indexed documents. Enable ‘Allow Agent Control’ on the Agent page to let agents adjust BM25 parameters like k1 and b during search.
Input/Output
Inputs
The operator supports three input modes, configured via ‘Input Mode’ on the Search page:
- Source Table - Connect a DAT table output from any source operator (Source Docs, Source Crawl4ai, etc.)
- RAG Index - Point to a RAG Index operator to search over its indexed chunks
- Direct Table - Reference any table DAT directly
Outputs
- Results Table - RAG-indexer compatible results with query, content, BM25 score, source path, metadata, and chunk context
- Stats Table - Search statistics including result count, average score, and timing
- Doc Index Table - Debug view of all indexed documents with chunk counts
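Because the outputs are table DATs, they can also be read from a script. The following is a minimal sketch, assuming the table's first row is a header row. The helper name `table_to_dicts` is hypothetical, and it relies only on the `rows()` method and the `.val` attribute of cells that TouchDesigner table DATs provide. The column names are whatever the operator actually emits; none are assumed here.

```python
def table_to_dicts(table_dat):
    """Convert a table DAT (first row = header) into a list of dicts.

    Assumption: `table_dat` behaves like a TouchDesigner table DAT, i.e.
    rows() returns a list of rows and every cell exposes a .val string.
    """
    rows = table_dat.rows()
    if not rows:
        return []
    header = [cell.val for cell in rows[0]]
    # One dict per data row, keyed by the header names.
    return [
        {name: cell.val for name, cell in zip(header, row)}
        for row in rows[1:]
    ]
```

Pass whichever DAT holds the results, stats, or doc index table in your network. All values come back as strings, so convert scores with `float()` before comparing them.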
Usage Examples
Basic Keyword Search
- On the Search page, set ‘Input Mode’ to your preferred source type.
- If using ‘Source Table’, set the ‘Source Table DAT’ to your source operator’s output table.
- Enter a search query in the ‘Search Query’ field.
- Set ‘Top K Results’ to limit how many results are returned.
- Pulse ‘Execute Search’ to run the search.
- Results appear in the results_table DAT output.
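The same steps can be driven from a script. This is a minimal sketch, not the operator's official API: the parameter names (`Query`, `Topk`, `Minscore`, `Search`) are taken from the Parameters reference on this page, the function name `run_bm25_search` is hypothetical, and `bm25_op` is assumed to be the operator you would get from `op('bm25')` in your own network.

```python
def run_bm25_search(bm25_op, query, top_k=5, min_score=0.0):
    """Set the search parameters on a BM25 operator and pulse Search.

    Assumption: `bm25_op` exposes TouchDesigner-style parameters, so
    values can be assigned through .par and pulse parameters have pulse().
    """
    bm25_op.par.Query = query
    bm25_op.par.Topk = top_k
    bm25_op.par.Minscore = min_score
    # Equivalent to pressing the search pulse button in the parameter dialog.
    bm25_op.par.Search.pulse()
```

Inside TouchDesigner this might be called as `run_bm25_search(op('bm25'), 'feedback loop', top_k=3)`. Whether the results table is filled before the call returns is not documented here, so check the ‘Status’ parameter before reading results.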
Using with a RAG Index
- Set ‘Input Mode’ to ‘RAG Index’.
- Set ‘RAG Index Source’ to your RAG Index operator.
- The BM25 operator will index the chunks from the RAG index, giving you keyword search alongside vector search.
- Enable ‘Auto Index’ to automatically rebuild the index when the source data changes.
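Once keyword and vector results are both available, they need to be merged into one ranking. One common way to do that is reciprocal rank fusion (RRF), sketched below. This is an illustrative technique, not something this page says the BM25 or RAG operators do internally. The function name, the use of plain document ids, and the constant `k=60` are all assumptions.

```python
def reciprocal_rank_fusion(keyword_ids, vector_ids, k=60):
    """Merge two ranked lists of document ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant.
    """
    scores = {}
    for ranked in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing BM25 hits `["a", "b", "c"]` with vector hits `["c", "a", "d"]` returns `["a", "c", "b", "d"]`: documents found by both searches rise above those found by only one.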
Tuning BM25 Parameters
- On the BM25 page, adjust ‘K1 Parameter’ (default 1.5) to control term frequency saturation. Higher values give more weight to term frequency.
- Adjust ‘B Parameter’ (default 0.75) to control document length normalization. Set to 0 to ignore document length, or 1 for full normalization.
- Enable ‘Enable Chunking’ and configure ‘Chunk Size’ and ‘Chunk Overlap’ to split large documents into smaller searchable pieces.
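To see what K1 and B do, it can help to look at the scoring formula itself. The sketch below implements textbook Okapi BM25 in plain Python. It is not the operator's source code: lowercase whitespace tokenization and the non-negative IDF variant are assumptions made for illustration, and the function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Assumptions: lowercase whitespace tokenization and the non-negative
    IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Length normalization: b=0 ignores length, b=1 normalizes fully.
        norm = 1.0 - b + b * (len(tokens) / avg_len) if avg_len else 1.0
        score = 0.0
        for term in query.lower().split():
            f = term_freq.get(term, 0)
            if f == 0:
                continue
            n = doc_freq[term]
            idf = math.log(1.0 + (n_docs - n + 0.5) / (n + 0.5))
            # k1 controls how quickly repeated occurrences saturate.
            score += idf * f * (k1 + 1.0) / (f + k1 * norm)
        scores.append(score)
    return scores
```

With `b=0`, two documents that contain a term equally often score the same regardless of length. With `b=0.75`, the shorter one scores higher. Raising `k1` lets additional occurrences of a term keep increasing the score for longer before it levels off.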
Best Practices
- Use BM25 alongside vector search (RAG Retriever) for hybrid retrieval - BM25 excels at exact keyword matches while vectors handle semantic similarity.
- Enable ‘Auto Index’ for workflows where source data changes frequently. The operator tracks table changes and rebuilds the index automatically.
- For large documents, enable chunking with overlap to ensure search terms near chunk boundaries are not missed.
- Set a ‘Minimum Score’ threshold to filter out low-relevance results.
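The chunking advice above can be illustrated with a short sketch. It assumes simple character-based splitting, which matches how ‘Chunk Size’ and ‘Chunk Overlap’ are described in the Parameters reference (characters per chunk, character overlap), but the operator's actual splitting strategy may differ, for example by respecting word boundaries. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so a term no longer
    than the overlap always appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` returns `['abcd', 'cdef', 'efgh', 'ghij']`. Without overlap, a search term such as `"de"` would be split between the first two chunks and match neither.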
Parameters
Search
op('bm25').par.Query Str Text to search for using BM25
- Default: "" (Empty String)
op('bm25').par.Topk Int Maximum number of results to return
- Default: 0
- Range: 1 to 100
- Slider Range: 0 to 1
op('bm25').par.Minscore Float Minimum BM25 score threshold
- Default: 0.0
- Range: 0 to 10
- Slider Range: 0 to 1
op('bm25').par.Search Pulse Search the BM25 index
- Default: False
op('bm25').par.Clearresults Pulse Clear all search results
- Default: False
op('bm25').par.Clearindex Pulse Clear the BM25 index
- Default: False
op('bm25').par.Status Str Current operation status
- Default: "" (Empty String)
op('bm25').par.Indexsize Str Number of documents/chunks indexed
- Default: "" (Empty String)
op('bm25').par.Lastsearchtime Str Time taken for last search
- Default: "" (Empty String)
op('bm25').par.Autoindex Toggle Automatically rebuild index when searching if data changed
- Default: False
op('bm25').par.Dirtyindex Pulse Mark index as dirty (will rebuild on next search)
- Default: False
op('bm25').par.Sourcetable DAT DAT table output from source operator (when using source_table mode)
- Default: "" (Empty String)
op('bm25').par.Ragindex COMP RAG index operator to search over (when using rag_index mode)
- Default: "" (Empty String)
op('bm25').par.Directtable DAT Table DAT to search directly (when using direct_table mode)
- Default: "" (Empty String)
op('bm25').par.K1 Float BM25 saturation parameter (typically 1.2-2.0)
- Default: 0.0
- Range: 0.1 to 3
- Slider Range: 0 to 1
op('bm25').par.B Float BM25 length normalization parameter (typically 0.75)
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('bm25').par.Enablechunking Toggle Split large content into smaller chunks
- Default: False
op('bm25').par.Chunksize Int Maximum characters per chunk
- Default: 0
- Range: 100 to 10000
- Slider Range: 0 to 1
op('bm25').par.Overlap Int Character overlap between chunks
- Default: 0
- Range: 0 to 1000
- Slider Range: 0 to 1
op('bm25').par.Allowagentcontrol Toggle Allow agents to control search parameters
- Default: False
op('bm25').par.Agentresultstable Toggle Add agent search results to the results table
- Default: False