Search Text

v1.0.0 (new)

search_text builds a fast BM25 text index over tables, folders, or an existing RAG index. Use it when you need lightweight lexical retrieval, exact phrase support, required/excluded terms, code-aware tokenization, or a dependency-free companion to semantic search.

What It Does

The operator reads the selected source, builds an in-memory BM25 index, and writes ranked hits to results_table plus search statistics to stats_table. It supports Source Table, RAG Index, Direct Table, and Folder input modes, optional chunking, field weighting, deduplication, and multi-query rank fusion.

Search syntax supports quoted phrases, +required terms, -excluded terms, and | separated multi-query searches when Multi-Query is enabled.
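
As a rough illustration of the syntax above, the following sketch separates a query into phrases, required terms, excluded terms, and plain terms. It is not the operator's parser: the names ParsedQuery, parse_query, and split_multi_query are hypothetical, and lowercasing the terms is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    phrases: list = field(default_factory=list)   # "quoted phrases"
    required: list = field(default_factory=list)  # +term
    excluded: list = field(default_factory=list)  # -term
    terms: list = field(default_factory=list)     # plain terms


# Optional +/- prefix, followed by either a "quoted phrase" or a bare word.
_TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query: str) -> ParsedQuery:
    """Sort each token of a single query into its bucket (hypothetical helper)."""
    parsed = ParsedQuery()
    for sign, phrase, word in _TOKEN.findall(query):
        text = (phrase or word).strip().lower()  # lowercasing is an assumption
        if not text:
            continue
        if sign == "+":
            parsed.required.append(text)
        elif sign == "-":
            parsed.excluded.append(text)
        elif phrase:
            parsed.phrases.append(text)
        else:
            parsed.terms.append(text)
    return parsed


def split_multi_query(query: str, multi_query: bool) -> list:
    """Split on | only when Multi-Query is enabled."""
    if not multi_query:
        return [query.strip()]
    return [part.strip() for part in query.split("|") if part.strip()]
```

For example, parse_query('"exact phrase" +bm25 -legacy index') yields one phrase, one required term, one excluded term, and one plain term.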

Typical Workflow

1. On the BM25 page, choose Input Mode and set the matching source: Source Table, RAG Index Source, Direct Table DAT, or Document Folder.
2. Choose a Search Preset and Tokenizer, then adjust K1 / B only when the preset is not enough.
3. Enable Chunking for long documents, or Field Weighting when selected table columns should influence ranking more strongly.
4. On the Search page, enter Query and set Top K Results, Minimum Score, Deduplicate, and Multi-Query as needed.
5. Pulse Search. With Auto Index enabled, the index builds or rebuilds before searching when the source has changed.
6. Inspect results_table, stats_table, and doc_index_table.
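
The search step of this workflow can also be scripted. The sketch below assumes a TouchDesigner Python environment and uses only the parameter names listed under Parameters (Query, Topk, Minscore, Multiquery, Search). The helper name run_search is hypothetical, and the operator is passed in rather than looked up so the function stays independent of any particular network path.

```python
def run_search(search_op, query, top_k=10, min_score=0.0, multi_query=False):
    """Set the Search-page parameters on a search_text operator, then pulse Search."""
    pars = search_op.par
    pars.Query = query
    pars.Topk = top_k
    pars.Minscore = min_score
    pars.Multiquery = multi_query
    # With Auto Index enabled, the operator rebuilds the index first if the source changed.
    pars.Search.pulse()


# Inside TouchDesigner only (op() is not available elsewhere):
# run_search(op('search_text'), '"exact phrase" +required -excluded', top_k=5)
```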

Inputs And Outputs

Inputs: No connector inputs. Select data through Source Table, RAG Index Source, Direct Table DAT, or Document Folder.
Output 1: Search results table.
Output 2: Search statistics table.

Agent Tool Use

search_text exposes GetTool() after the text index is ready. The main tool name comes from Tool Name, defaulting to search_data, and lets an agent run single-query or multi-query BM25 searches.

When Expose Fetch Tool is enabled, the operator also exposes a companion fetch tool, derived as {Tool Name}_fetch unless Fetch Tool Name is set. The fetch tool retrieves the full document for a returned doc_id, including reassembled chunks when chunking was used.
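
A minimal sketch of the two behaviors described above: deriving the fetch tool name and reassembling a chunked document by doc_id. The chunk record layout ("doc_id", "start", "text") is an assumption made for illustration; the operator's internal storage is not documented here.

```python
def fetch_tool_name(tool_name: str, fetch_tool_name: str = "") -> str:
    """Use the explicit Fetch Tool Name when set, else derive '<tool name>_fetch'."""
    return fetch_tool_name.strip() or f"{tool_name}_fetch"


def fetch_document(chunks: list, doc_id: str) -> str:
    """Rebuild one document from its chunks.

    Each chunk is a hypothetical record {"doc_id", "start", "text"}, where
    "start" is the chunk's character offset in the original document.
    """
    parts = sorted((c for c in chunks if c["doc_id"] == doc_id), key=lambda c: c["start"])
    if not parts:
        raise KeyError(f"unknown doc_id: {doc_id}")
    text = ""
    for chunk in parts:
        # Drop the characters already covered by the previous overlapping chunk.
        already_covered = len(text) - chunk["start"]
        text += chunk["text"][max(already_covered, 0):]
    return text
```

Because reassembly keys on doc_id, the lookup only works while document ids stay stable between indexing and fetching.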

Allow Agent Control lets the agent override search limits and BM25 scoring parameters for a call. Return Columns can keep agent results compact by returning selected source-table fields instead of full content.
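
The changelog notes that the search tool accepts separate query and queries fields with mutual-exclusion validation. A hedged sketch of such a check follows; the function name is hypothetical, and rejecting a call that supplies neither field is an assumption.

```python
def validate_search_args(args: dict) -> list:
    """Return the list of queries to run, requiring exactly one of query / queries."""
    has_query = bool(args.get("query"))
    has_queries = bool(args.get("queries"))
    if has_query == has_queries:
        # Either both fields were given or neither was.
        raise ValueError("provide exactly one of 'query' or 'queries'")
    return [args["query"]] if has_query else list(args["queries"])
```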

Works Well With

agent: Uses the search and fetch tools after the index is ready.
search_rag: Pairs lexical BM25 search with semantic vector retrieval.
graph: Provides direct keyword retrieval alongside relationship traversal.
source_dat: Supplies table data for Source Table or Direct Table modes.

Gotchas

Agents do not see the search tool until the index is ready.
Auto Index can rebuild on Search when input data changes. Disable it when you want explicit Dirty Index / Search control.
Field Weighting can overemphasize boosted columns on structured metadata tables; use it only when those columns should dominate ranking.
RAG Index mode requires a ready index source that exposes chunks the operator can read.
Fetch tool lookups depend on stable doc_id values; chunked documents are reassembled by original document id.

Parameters

Search

Status (Status) op('search_text').par.Status Str

Current text-search status.

Default:: "" (Empty String)

Index Size (Indexsize) op('search_text').par.Indexsize Str

Default:: "" (Empty String)

Last Search Time (Lastsearchtime) op('search_text').par.Lastsearchtime Str

Default:: "" (Empty String)

Search Header

Query (Query) op('search_text').par.Query Str

Search query. Supports "exact phrase", +required, and -excluded terms, plus | separated queries when Multi-Query is enabled.

Default:: "" (Empty String)

Search (Search) op('search_text').par.Search Pulse

Default:: False

Config Header

Top K Results (Topk) op('search_text').par.Topk Int

Maximum number of text-search results to return.

Default:: 10
Range:: 1 to 100

Minimum Score (Minscore) op('search_text').par.Minscore Float

Minimum relevance score required for a result to be kept.

Default:: 0.0
Range:: 0 to 10

Deduplicate (Dedup) op('search_text').par.Dedup Toggle

Remove duplicate hits from the same source document.

Default:: True
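
Top K Results, Minimum Score, and Deduplicate together decide which hits reach the results table. The sketch below shows one plausible way to combine them. The order of operations (threshold, then deduplication, then the top-k cut) is an assumption, and select_hits is a hypothetical name.

```python
def select_hits(hits, top_k=10, min_score=0.0, dedup=True):
    """Filter (doc_id, score) pairs; several hits may share a doc_id when chunking is on."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    seen = set()
    selected = []
    for doc_id, score in ranked:
        if score < min_score:
            break  # ranked is sorted, so every later hit is below the threshold too
        if dedup and doc_id in seen:
            continue  # keep only the best-scoring hit per source document
        seen.add(doc_id)
        selected.append((doc_id, score))
        if len(selected) == top_k:
            break
    return selected
```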

Multi-Query (Multiquery) op('search_text').par.Multiquery Toggle

Split the query on | and fuse the per-query rankings.

Default:: False
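
This page does not state which fusion method the operator uses. As one common possibility, the sketch below fuses per-query rankings with reciprocal rank fusion (RRF); the function name and the constant k = 60 are assumptions, not documented behavior.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked doc_id lists (each ordered best-first) into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Documents that rank well for several sub-queries rise above documents that rank first for only one.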

Auto Index (Autoindex) op('search_text').par.Autoindex Toggle

Automatically rebuild the index before searching when inputs have changed.

Default:: True

Actions Header

Clear Results (Clearresults) op('search_text').par.Clearresults Pulse

Default:: False

BM25

Source Header

Source Table (Sourcetable) op('search_text').par.Sourcetable DAT

Default:: "" (Empty String)

RAG Index Source (Ragindex) op('search_text').par.Ragindex COMP

Default:: "" (Empty String)

Direct Table DAT (Directtable) op('search_text').par.Directtable DAT

Default:: "" (Empty String)

Document Folder (Documentfolder) op('search_text').par.Documentfolder Folder

Default:: "" (Empty String)

File Pattern (Filepattern) op('search_text').par.Filepattern Str

Default:: *.txt *.md *.py
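
The default value suggests a space-separated list of glob patterns. Under that assumption, file matching could look like the sketch below; the helper name is hypothetical and case-insensitive matching is also an assumption.

```python
from fnmatch import fnmatch


def matches_file_pattern(filename: str, file_pattern: str = "*.txt *.md *.py") -> bool:
    """Return True when the filename matches any of the space-separated globs."""
    name = filename.lower()
    return any(fnmatch(name, pattern.lower()) for pattern in file_pattern.split())
```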

Preset Header

BM25 Parameters Header

K1 (K1) op('search_text').par.K1 Float

BM25 term-frequency saturation. Higher values reward repeated terms more.

Default:: 1.5
Range:: 0.1 to 3

B (B) op('search_text').par.B Float

BM25 length normalization. 0 disables length normalization; 1 applies it fully.

Default:: 0.75
Range:: 0 to 1
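
To show how K1 and B act on a score, here is a minimal sketch of the standard Okapi BM25 formula over pre-tokenized documents. It illustrates the textbook scoring function, not the operator's implementation; the IDF variant and the function name are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document (a list of terms) against the query terms."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # b = 0 ignores document length; b = 1 normalizes by it fully.
            norm = 1 - b + b * len(d) / avgdl
            # Higher k1 lets repeated terms keep adding to the score for longer.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```

With b = 0, two documents containing a term the same number of times score identically regardless of length; with b = 1, the shorter document scores higher.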

Chunking Header

Enable Chunking (Enablechunking) op('search_text').par.Enablechunking Toggle

Default:: False

Chunk Size (Chunksize) op('search_text').par.Chunksize Int

Maximum characters per indexed chunk when chunking is enabled.

Default:: 1000
Range:: 100 to 10000

Overlap (Overlap) op('search_text').par.Overlap Int

Characters shared between adjacent chunks when chunking is enabled.

Default:: 200
Range:: 0 to 1000
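
Chunk Size and Overlap describe a sliding character window. A minimal sketch of that idea follows, assuming plain character offsets with no sentence or token awareness. The function name is hypothetical. The sketch rejects an overlap that is not smaller than the chunk size; how the operator handles that combination is not documented here.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into (start, chunk) pairs of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts this far after the last
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```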

Field Weighting Header

Enable (Fieldweighting) op('search_text').par.Fieldweighting Toggle

Boost selected table columns while building the text index.

Default:: False

Boost Columns (Boostcolumns) op('search_text').par.Boostcolumns Str

Comma-separated column names to boost, such as name,title.

Default:: name,title

Boost Factor (Boostfactor) op('search_text').par.Boostfactor Float

Score multiplier for boosted columns during indexing.

Default:: 3.0
Range:: 1 to 10
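
One simple way to boost columns at index time is to weight the term counts that come from those columns. The sketch below assumes that approach along with whitespace tokenization; the operator's actual weighting scheme is not specified here, and the function name is hypothetical.

```python
from collections import Counter


def weighted_term_counts(row: dict, boost_columns: str = "name,title",
                         boost_factor: float = 3.0) -> Counter:
    """Count terms in one table row, multiplying counts from boosted columns."""
    boosted = {c.strip() for c in boost_columns.split(",") if c.strip()}
    counts = Counter()
    for column, value in row.items():
        weight = boost_factor if column in boosted else 1.0
        for token in str(value).lower().split():
            counts[token] += weight
    return counts
```

A term that appears once in a boosted column then counts as much as three occurrences in the body, which is why boosting can dominate ranking on metadata-heavy tables.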

Maintenance Header

Dirty Index (Dirtyindex) op('search_text').par.Dirtyindex Pulse

Mark the text index dirty so the next search rebuilds it.

Default:: False

Clear Index (Clearindex) op('search_text').par.Clearindex Pulse

Clear indexed documents and index state.

Default:: False

Reset (Reset) op('search_text').par.Reset Pulse

Clear results, index, and logs.

Default:: False

Tool

Tool Name (Toolname) op('search_text').par.Toolname Str

Unique tool name for agent integration.

Default:: search_data

Allow Agent Control (Allowagentcontrol) op('search_text').par.Allowagentcontrol Toggle

Default:: False

Tool Config Header

Tool Description (Tooldescription) op('search_text').par.Tooldescription DAT

Optional text DAT with custom tool description. Leave empty to use default.

Default:: "" (Empty String)

Describe Data Source (Describedatasource) op('search_text').par.Describedatasource Toggle

Append data source info to the tool description.

Default:: False

Results to Table (Agentresultstable) op('search_text').par.Agentresultstable Toggle

Default:: True

Fetch Tool Header

Return Columns (Returncolumns) op('search_text').par.Returncolumns Str

Comma-separated list of source-table columns to include in compact tool results. Leave empty to return full content.

Default:: "" (Empty String)
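
A minimal sketch of how a Return Columns value could trim a result row, assuming each hit carries its source-table row as a dict. The function name is hypothetical, and silently skipping unknown column names is an assumption.

```python
def compact_result(row: dict, return_columns: str = "") -> dict:
    """Keep only the listed columns; an empty list returns the full row."""
    columns = [c.strip() for c in return_columns.split(",") if c.strip()]
    if not columns:
        return dict(row)
    return {c: row[c] for c in columns if c in row}
```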

Expose Fetch Tool (Enablefetchtool) op('search_text').par.Enablefetchtool Toggle

Expose a companion fetch tool for retrieving full document content by doc_id.

Default:: True

Fetch Tool Name (Fetchtoolname) op('search_text').par.Fetchtoolname Str

Leave empty to derive the name automatically as "{Tool Name}_fetch".

Default:: "" (Empty String)

Changelog

v1.0.0 (2026-05-02)

reorganized parameter layout: Search page first with status/query/config/actions, BM25 page with source/preset/algorithm/chunking/weighting/maintenance
added Reset pulse and Status/Indexsize/Lastsearchtime readouts to Search page top
moved agent behavior pars to Tools page
updated compose.json pairs_with (rank_fusion to search_merge)
updated category to Retrievers
split query oneOf into separate query/queries tool fields
removed required constraint on query parameter
added mutual exclusion validation for query vs queries
Initial search_text structure