
Search Text

v1.0.0

search_text builds a fast BM25 text index over tables, folders, or an existing RAG index. Use it when you need lightweight lexical retrieval, exact phrase support, required/excluded terms, code-aware tokenization, or a dependency-free companion to semantic search.

The operator reads the selected source, builds an in-memory BM25 index, and writes ranked hits to results_table plus search statistics to stats_table. It supports Source Table, RAG Index, Direct Table, and Folder input modes, optional chunking, field weighting, deduplication, and multi-query rank fusion.

Search syntax supports quoted phrases, +required terms, -excluded terms, and |-separated multi-query searches when Multi-Query is enabled.
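As a rough illustration of that syntax, the sketch below splits a query string into ordinary terms, quoted phrases, +required terms, and -excluded terms, and adds a helper for the | split used in Multi-Query mode. It is a minimal sketch under assumptions: the ParsedQuery container, the function names, and the lowercasing are hypothetical and are not search_text's internal parser. This naive | split would also break a quoted phrase that contains |.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical container for the parts of one query string."""
    terms: list = field(default_factory=list)     # ordinary scored terms
    phrases: list = field(default_factory=list)   # "exact phrase" matches
    required: list = field(default_factory=list)  # +term: must appear
    excluded: list = field(default_factory=list)  # -term: must not appear


# Either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
_TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(text: str) -> ParsedQuery:
    q = ParsedQuery()
    for phrase, word in _TOKEN.findall(text):
        if phrase:
            q.phrases.append(phrase.lower())
        elif word.startswith('+') and len(word) > 1:
            q.required.append(word[1:].lower())
        elif word.startswith('-') and len(word) > 1:
            q.excluded.append(word[1:].lower())
        else:
            q.terms.append(word.lower())
    return q


def split_multi_query(text: str) -> list:
    # With Multi-Query enabled, each |-separated part is searched on its own.
    return [part.strip() for part in text.split('|') if part.strip()]
```

For example, parse_query('"exact phrase" +bm25 -vector ranking') yields one phrase, one required term, one excluded term, and one ordinary term.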

  1. On the BM25 page, choose Input Mode and set the matching source: Source Table, RAG Index Source, Direct Table DAT, or Document Folder.
  2. Choose a Search Preset and Tokenizer, then adjust K1 and B only when the preset does not fit your data.
  3. Enable Chunking for long documents, or Field Weighting when selected table columns should influence ranking more strongly.
  4. On the Search page, enter Query and set Top K Results, Minimum Score, Deduplicate, and Multi-Query as needed.
  5. Pulse Search. With Auto Index enabled, the index builds or rebuilds before searching when the source has changed.
  6. Inspect results_table, stats_table, and doc_index_table.
  • Inputs: No connector inputs. Select data through Source Table, RAG Index Source, Direct Table DAT, or Document Folder.
  • Output 1: Search results table.
  • Output 2: Search statistics table.
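The same workflow can be scripted from a TouchDesigner Python DAT. This is a minimal sketch assuming the parameter names listed on this page; configure_folder_search and run_search are hypothetical helpers, and comp would be op('search_text') in a real network. Because Search is a pulse, the sketch does not assume results are available in results_table the instant the pulse is sent.

```python
def configure_folder_search(comp, folder, patterns='*.txt *.md *.py'):
    """Point a search_text COMP at a document folder (hypothetical helper)."""
    comp.par.Inputmode = 'folder'      # menu parameters accept the option name
    comp.par.Documentfolder = folder
    comp.par.Filepattern = patterns


def run_search(comp, query, top_k=10, multi_query=False):
    """Set the Search-page parameters and fire the Search pulse (hypothetical helper)."""
    comp.par.Query = query
    comp.par.Topk = top_k
    comp.par.Multiquery = multi_query
    # With Auto Index on, the operator rebuilds a stale index before searching.
    comp.par.Search.pulse()
```

In a network, this might be called as configure_folder_search(op('search_text'), project.folder + '/docs') followed by run_search(op('search_text'), 'bm25 | tokenizer', multi_query=True).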

search_text exposes GetTool() after the text index is ready. The main tool name comes from Tool Name, defaulting to search_data, and lets an agent run single-query or multi-query BM25 searches.
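The changelog notes that the tool schema uses separate query and queries fields with mutual-exclusion validation. A sketch of that check, assuming the tool-call arguments arrive as a plain dict; validate_tool_args and its error messages are hypothetical:

```python
def validate_tool_args(args: dict) -> list:
    """Return the list of queries to run for one tool call.

    Exactly one of 'query' (a single string) or 'queries' (a list of
    strings) must be provided; supplying both or neither is rejected.
    """
    has_query = bool(args.get('query'))
    has_queries = bool(args.get('queries'))
    if has_query and has_queries:
        raise ValueError("Pass either 'query' or 'queries', not both.")
    if not (has_query or has_queries):
        raise ValueError("One of 'query' or 'queries' is required.")
    if has_query:
        return [args['query']]
    return list(args['queries'])
```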

When Expose Fetch Tool is enabled, the operator also exposes a companion fetch tool, derived as {Tool Name}_fetch unless Fetch Tool Name is set. The fetch tool retrieves the full document for a returned doc_id, including reassembled chunks when chunking was used.
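The fetch-tool naming rule and chunk reassembly can be sketched as follows. The naming mirrors the {Tool Name}_fetch rule above. reassemble assumes chunks were produced by a fixed-size character window in which each chunk repeats the previous chunk's last overlap characters; that is an assumption about search_text's chunker, not documented behavior, and both function names are hypothetical.

```python
def fetch_tool_name(tool_name: str, fetch_name: str = '') -> str:
    # An explicit Fetch Tool Name wins; otherwise derive "{Tool Name}_fetch".
    return fetch_name.strip() or f'{tool_name}_fetch'


def reassemble(chunks: list, overlap: int) -> str:
    """Rebuild one document from its chunks, given in original order."""
    if not chunks:
        return ''
    text = chunks[0]
    for chunk in chunks[1:]:
        # Skip the characters this chunk shares with the previous one.
        text += chunk[overlap:]
    return text
```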

Allow Agent Control lets the agent override search limits and BM25 scoring parameters for a call. Return Columns can keep agent results compact by returning selected source-table fields instead of full content.

  • agent: Uses the search and fetch tools after the index is ready.
  • search_rag: Pairs lexical BM25 search with semantic vector retrieval.
  • graph: Provides direct keyword retrieval alongside relationship traversal.
  • source_dat: Supplies table data for Source Table or Direct Table modes.
  • Agents do not see the search tool until the index is ready.
  • Auto Index can rebuild on Search when input data changes. Disable it when you want to control rebuilds explicitly with Dirty Index and Search.
  • Field Weighting can overemphasize boosted columns on structured metadata tables; use it only when those columns should dominate ranking.
  • RAG Index mode requires a ready index source that exposes chunks the operator can read.
  • Fetch tool lookups depend on stable doc_id values; chunked documents are reassembled by original document id.
Status (Status) op('search_text').par.Status Str

Current text-search status.

Default:
"" (Empty String)
Index Size (Indexsize) op('search_text').par.Indexsize Str
Default:
"" (Empty String)
Last Search Time (Lastsearchtime) op('search_text').par.Lastsearchtime Str
Default:
"" (Empty String)
Search Header
Query (Query) op('search_text').par.Query Str

Search query. Supports "exact phrase" matching, -excluded terms, and +required terms.

Default:
"" (Empty String)
Search (Search) op('search_text').par.Search Pulse
Default:
False
Config Header
Top K Results (Topk) op('search_text').par.Topk Int

Maximum number of text-search results to return.

Default:
10
Range:
1 to 100
Minimum Score (Minscore) op('search_text').par.Minscore Float

Minimum relevance score required for a result to be kept.

Default:
0.0
Range:
0 to 10
Deduplicate (Dedup) op('search_text').par.Dedup Toggle

Remove duplicate hits from the same source document.

Default:
True
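Top K Results, Minimum Score, and Deduplicate all act on the scored hit list. One plausible ordering is sketched below under assumptions: the (doc_id, chunk_id, score) tuple shape, the >= comparison against Minimum Score, and keeping the best chunk per document are all hypothetical.

```python
def postprocess(hits, top_k=10, min_score=0.0, dedup=True):
    """Apply Minimum Score, Deduplicate, and Top K Results to scored hits.

    hits: iterable of (doc_id, chunk_id, score) tuples (hypothetical shape).
    """
    # Minimum Score: drop weak hits, then rank best-first.
    ranked = sorted(
        (hit for hit in hits if hit[2] >= min_score),
        key=lambda hit: hit[2],
        reverse=True,
    )
    if dedup:
        # Deduplicate: keep only the highest-scoring hit per source document.
        seen = set()
        unique = []
        for hit in ranked:
            if hit[0] not in seen:
                seen.add(hit[0])
                unique.append(hit)
        ranked = unique
    # Top K Results: truncate last so K counts surviving, distinct hits.
    return ranked[:top_k]
```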
Multi-Query (Multiquery) op('search_text').par.Multiquery Toggle

Split the query on | and fuse the per-query rankings.

Default:
False
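This page does not specify the fusion method. Reciprocal Rank Fusion (RRF) is a common way to merge per-query rankings, so the sketch below uses it as a stand-in; treat both the method and the k=60 constant as assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc_ids into one (RRF).

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1, so documents ranked well by several
    sub-queries rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

RRF uses only rank positions, which avoids comparing raw BM25 scores across sub-queries whose score scales can differ.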
Auto Index (Autoindex) op('search_text').par.Autoindex Toggle

Automatically rebuild the index before searching when inputs have changed.

Default:
True
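Auto Index implies change detection on the source. A minimal sketch of that bookkeeping, assuming the source can be read as rows and hashed; IndexState and its fingerprinting scheme are hypothetical, and the builds counter stands in for real BM25 indexing.

```python
import hashlib


class IndexState:
    """Hypothetical dirty-flag and fingerprint bookkeeping behind Auto Index."""

    def __init__(self):
        self.fingerprint = None
        self.dirty = True   # nothing has been indexed yet
        self.builds = 0

    @staticmethod
    def fingerprint_rows(rows):
        h = hashlib.sha256()
        for row in rows:
            # Unit/record separators keep ['ab', 'c'] distinct from ['a', 'bc'].
            h.update('\x1f'.join(map(str, row)).encode('utf-8'))
            h.update(b'\x1e')
        return h.hexdigest()

    def mark_dirty(self):
        # Roughly what a Dirty Index pulse would do.
        self.dirty = True

    def ensure_index(self, rows, auto_index=True):
        """Rebuild if marked dirty, or if Auto Index sees changed input."""
        current = self.fingerprint_rows(rows)
        stale = self.dirty or (auto_index and current != self.fingerprint)
        if stale:
            self.builds += 1          # stand-in for the actual rebuild
            self.fingerprint = current
            self.dirty = False
        return stale
```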
Actions Header
Clear Results (Clearresults) op('search_text').par.Clearresults Pulse
Default:
False
Source Header
Input Mode (Inputmode) op('search_text').par.Inputmode Menu
Default:
source_table
Options:
source_table, rag_index, direct_table, folder
Source Table (Sourcetable) op('search_text').par.Sourcetable DAT
Default:
"" (Empty String)
RAG Index Source (Ragindex) op('search_text').par.Ragindex COMP
Default:
"" (Empty String)
Direct Table DAT (Directtable) op('search_text').par.Directtable DAT
Default:
"" (Empty String)
Document Folder (Documentfolder) op('search_text').par.Documentfolder Folder
Default:
"" (Empty String)
File Pattern (Filepattern) op('search_text').par.Filepattern Str
Default:
*.txt *.md *.py
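File Pattern holds space-separated glob patterns. A sketch of how such a list can select files with Python's fnmatch; matching_files is hypothetical, and it searches subfolders recursively, which this page does not confirm for search_text.

```python
from fnmatch import fnmatch
from pathlib import Path


def matching_files(folder, pattern='*.txt *.md *.py'):
    """List files under folder whose names match any space-separated glob."""
    globs = pattern.split()
    return sorted(
        path for path in Path(folder).rglob('*')
        if path.is_file() and any(fnmatch(path.name, g) for g in globs)
    )
```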
Preset Header
Search Preset (Searchpreset) op('search_text').par.Searchpreset Menu
Default:
general
Options:
custom, general, code, keyword, document
Tokenizer (Tokenizer) op('search_text').par.Tokenizer Menu
Default:
standard
Options:
standard, code, aggressive
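This page does not define what each tokenizer does. As an assumption, a standard tokenizer splits on non-alphanumeric characters, while a code-aware one keeps whole identifiers and also emits their snake_case and camelCase parts, so that getHTTPResponse can match a search for response. Both functions are hypothetical sketches, and the aggressive mode is not modeled.

```python
import re

_WORD = re.compile(r'[A-Za-z0-9]+')
_IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*|[0-9]+')
# Acronym run (HTTP), capitalized or lowercase word (Response, get), or digits.
_CAMEL = re.compile(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+')


def tokenize_standard(text):
    return [w.lower() for w in _WORD.findall(text)]


def tokenize_code(text):
    """Emit each identifier whole, then its snake_case/camelCase parts."""
    tokens = []
    for ident in _IDENT.findall(text):
        tokens.append(ident.lower())
        parts = [p.lower() for piece in ident.split('_') for p in _CAMEL.findall(piece)]
        if len(parts) > 1:
            tokens.extend(parts)
    return tokens
```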
BM25 Parameters Header
K1 (K1) op('search_text').par.K1 Float

BM25 term-frequency saturation. Higher values reward repeated terms more.

Default:
1.5
Range:
0.1 to 3
B (B) op('search_text').par.B Float

BM25 length normalization. 0 disables length normalization; 1 applies it fully.

Default:
0.75
Range:
0 to 1
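K1 and B enter the standard Okapi BM25 formula: each matching query term contributes IDF * f * (K1 + 1) / (f + K1 * (1 - B + B * dl / avgdl)), where f is the term's frequency in the document, dl is the document length, and avgdl is the average document length. The sketch below implements that formula over pre-tokenized documents. The Lucene-style IDF, log(1 + (N - n + 0.5) / (n + 0.5)), is an assumption; search_text may use a different IDF variant.

```python
import math
from collections import Counter


class BM25:
    """Minimal in-memory Okapi BM25 over pre-tokenized documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(doc) for doc in docs]
        self.lengths = [len(doc) for doc in docs]
        self.avgdl = sum(self.lengths) / len(docs) if docs else 0.0
        n = len(docs)
        df = Counter(term for tf in self.tfs for term in tf)
        # Lucene-style IDF (always positive); an assumption for this sketch.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query_terms, i):
        tf, dl = self.tfs[i], self.lengths[i]
        # B scales how strongly document length shifts the denominator.
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl) if self.avgdl else self.k1
        total = 0.0
        for term in query_terms:
            f = tf.get(term, 0)
            if f:
                # K1 controls how quickly repeated occurrences saturate.
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query_terms, top_k=10):
        scored = [(i, self.score(query_terms, i)) for i in range(len(self.tfs))]
        scored = [s for s in scored if s[1] > 0]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

With B = 0 the denominator no longer depends on dl, which is what "0 disables length normalization" means; a larger K1 lets repeated occurrences keep adding score before the term saturates.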
Chunking Header
Enable Chunking (Enablechunking) op('search_text').par.Enablechunking Toggle
Default:
False
Chunk Size (Chunksize) op('search_text').par.Chunksize Int

Maximum characters per indexed chunk when chunking is enabled.

Default:
1000
Range:
100 to 10000
Overlap (Overlap) op('search_text').par.Overlap Int

Characters shared between adjacent chunks when chunking is enabled.

Default:
200
Range:
0 to 1000
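Chunk Size and Overlap describe a sliding character window. A minimal sketch, assuming plain character slicing with a step of Chunk Size - Overlap (the real operator may prefer word or sentence boundaries); chunk_text is hypothetical. The parameter ranges allow Overlap to exceed Chunk Size, so the sketch rejects that case instead of looping forever.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunk_size-character windows sharing `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError('overlap must be smaller than chunk_size')
    if not text:
        return []
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window reached the end of the text
        start += step
    return chunks
```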
Field Weighting Header
Enable (Fieldweighting) op('search_text').par.Fieldweighting Toggle

Boost selected table columns while building the text index.

Default:
False
Boost Columns (Boostcolumns) op('search_text').par.Boostcolumns Str

Comma-separated column names to boost, such as name,title.

Default:
name,title
Boost Factor (Boostfactor) op('search_text').par.Boostfactor Float

Score multiplier for boosted columns during indexing.

Default:
3.0
Range:
1 to 10
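One way to apply Boost Factor at indexing time is to scale the term frequencies contributed by boosted columns, so those terms weigh more in BM25's term-frequency component. This is an assumption about the mechanism; weighted_term_counts and its row-as-dict input are hypothetical.

```python
from collections import Counter


def weighted_term_counts(row, tokenize, boost_columns='name,title', boost_factor=3.0):
    """Term frequencies for one table row, scaling terms from boosted columns."""
    boosted = {c.strip() for c in boost_columns.split(',') if c.strip()}
    counts = Counter()
    for column, value in row.items():
        weight = boost_factor if column in boosted else 1.0
        for term in tokenize(str(value)):
            counts[term] += weight
    return counts
```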
Maintenance Header
Dirty Index (Dirtyindex) op('search_text').par.Dirtyindex Pulse

Mark the text index dirty so the next search rebuilds it.

Default:
False
Clear Index (Clearindex) op('search_text').par.Clearindex Pulse

Clear indexed documents and index state.

Default:
False
Reset (Reset) op('search_text').par.Reset Pulse

Clear results, index, and logs.

Default:
False
Tool Name (Toolname) op('search_text').par.Toolname Str

Unique tool name for agent integration.

Default:
search_data
Allow Agent Control (Allowagentcontrol) op('search_text').par.Allowagentcontrol Toggle
Default:
False
Tool Config Header
Tool Description (Tooldescription) op('search_text').par.Tooldescription DAT

Optional text DAT with custom tool description. Leave empty to use default.

Default:
"" (Empty String)
Describe Data Source (Describedatasource) op('search_text').par.Describedatasource Toggle

Append data source info to the tool description.

Default:
False
Results to Table (Agentresultstable) op('search_text').par.Agentresultstable Toggle
Default:
True
Fetch Tool Header
Return Columns (Returncolumns) op('search_text').par.Returncolumns Str

Comma-separated list of source-table columns to include in compact tool results. Leave empty to return full content.

Default:
"" (Empty String)
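Return Columns trims what the agent receives. A sketch assuming each hit carries its source-table row as a dict; compact_result and the doc_id, score, and content keys are hypothetical, not the operator's documented response format.

```python
def compact_result(row, doc_id, score, return_columns=''):
    """Shape one hit for an agent tool response.

    An empty return_columns keeps the full content; otherwise only the
    listed source-table columns (comma-separated) are returned.
    """
    columns = [c.strip() for c in return_columns.split(',') if c.strip()]
    result = {'doc_id': doc_id, 'score': score}
    if columns:
        result.update({c: row[c] for c in columns if c in row})
    else:
        result['content'] = ' '.join(str(v) for v in row.values())
    return result
```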
Expose Fetch Tool (Enablefetchtool) op('search_text').par.Enablefetchtool Toggle

Expose a companion fetch tool for retrieving full document content by doc_id.

Default:
True
Fetch Tool Name (Fetchtoolname) op('search_text').par.Fetchtoolname Str

Leave empty to auto-derive as "{Toolname}_fetch".

Default:
"" (Empty String)

# v1.0.0 (2026-05-02)

  • Reorganized parameter layout: Search page first (status, query, config, actions), then BM25 page (source, preset, algorithm, chunking, weighting, maintenance).
  • Added Reset pulse and Status, Indexsize, and Lastsearchtime readouts to the top of the Search page.
  • Moved agent behavior parameters to the Tools page.
  • Updated compose.json pairs_with (rank_fusion to search_merge).
  • Updated category to Retrievers.
  • Split the query oneOf into separate query and queries tool fields.
  • Removed the required constraint on the query parameter.
  • Added mutual-exclusion validation for query vs. queries.
  • Initial search_text structure.