RAG Index

v2.1.0 (Updated)

The RAG Index operator builds vector store indices from your documents. Feed it a folder of files or a structured document table, choose an embedding model, and it produces a searchable index that downstream operators like RAG Retriever can query. All embedding and vector storage is handled by the embedding_sidecar service over HTTP — the operator does not run any vector or embedding code locally. Indices can be saved to disk and reloaded across sessions.

Requirements:
  • SideCar: The LOPs SideCar must be running with the embedding_sidecar service. The operator starts the sidecar automatically when needed.
  • Embedding Provider: A running Ollama instance with an embedding model pulled (e.g. ollama pull nomic-embed-text). The OpenAI option is also available in the Embedding Model menu.
  • Input 1 (optional): A Table DAT with columns doc_id, filename, content, metadata. Used when Input Mode is set to Doc Table or Auto Detect.

Outputs: none wired. The operator maintains internal tables (documents, index info, stats) and holds the index on the embedding sidecar for connected RAG Retriever operators to query.

To build an index from a folder:
  1. On the Index page, set Input Mode to “Folder”.
  2. Set Document Folder to the path containing your files.
  3. Set File Pattern to match your documents (e.g. *.txt *.md *.py). Separate multiple patterns with spaces.
  4. Choose an Embedding Model — “Local (Ollama)” for local processing or “OpenAI” for cloud embeddings.
  5. If using Ollama, pick a model from the Ollama Model menu (nomic-embed-text, mxbai-embed-large, or all-minilm).
  6. Optionally adjust Chunk Size and Chunk Overlap to control how documents are split.
  7. Give the index a name in Index Name or let the operator generate one automatically.
  8. Pulse Create Index. The Current Status and Progress fields update in real time as documents are sent to the embedding sidecar for processing.
To build an index from a wired document table:
  1. Wire a Table DAT into the operator’s first input. The table must have doc_id, filename, content, and metadata columns (metadata as JSON strings).
  2. Set Input Mode to “Doc Table” (or leave on “Auto Detect” — it will detect the wired input automatically).
  3. Choose your embedding model and pulse Create Index.
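The rows for the wired input can be assembled in script. A sketch of building data in the expected doc_id/filename/content/metadata shape, with metadata serialized as a JSON string — `make_doc_rows` is a hypothetical helper, not part of the operator:

```python
import json

def make_doc_rows(docs):
    """Build a header row plus one row per document; metadata is a JSON string."""
    rows = [["doc_id", "filename", "content", "metadata"]]
    for i, doc in enumerate(docs):
        rows.append([
            str(i),
            doc["filename"],
            doc["content"],
            json.dumps(doc.get("metadata", {})),  # stored as a JSON string
        ])
    return rows

rows = make_doc_rows([
    {"filename": "notes.md", "content": "TouchDesigner RAG notes",
     "metadata": {"tag": "demo"}},
])
```

In TouchDesigner you would write these rows into a Table DAT (e.g. via `appendRow`) and wire that DAT into the operator's first input.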
To save and reload an index:
  1. Enable Sync to File to persist index data to disk during creation.
  2. Set Index Folder to your preferred save location. If left blank, the operator saves to project/index/{index_name}/ automatically.
  3. Pulse Save Index at any time to manually save the current index state. A config.json file is saved alongside the vector data with embedding settings and index statistics.
  4. To reload a saved index, set Index Folder to the directory containing your saved index and pulse Load Index. The operator restores embedding model settings from the saved config automatically.
  5. Enable Load on Start to automatically reload the index when the TouchDesigner project opens (requires Sync to File to be enabled).

Pulse Clear All to remove the current index and all internal tables, both locally and on the embedding sidecar. This resets the operator to a clean state for rebuilding.

If index creation is taking too long, pulse Stop Index Creation to cancel. Note that the server-side embedding operation may still complete.

Tips:
  • Start with default chunk settings and adjust based on retrieval quality. Smaller chunks give more precise results but increase index size.
  • Use local embeddings (Ollama) for privacy-sensitive data or offline workflows. OpenAI embeddings tend to produce higher quality results for general text.
  • Name your indices using the Index Name field before creating — this makes saved folders easier to identify and prevents auto-generated names.
  • Save to file for any index you want to persist. In-memory indices on the embedding sidecar are lost when the SideCar stops.
  • Check the stats table after index creation for a detailed breakdown of document counts, chunk counts, token estimates, and file types processed.
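The first tip's trade-off comes from how Chunk Size and Chunk Overlap interact: each chunk shares its tail with the head of the next. An illustrative character-based splitter (the operator's real splitter may be token- or sentence-aware):

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size chunks where consecutive chunks share
    `chunk_overlap` characters. Illustrative only."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

# chunk_text("abcdefghij", 4, 2) -> ["abcd", "cdef", "efgh", "ghij"]
```

Smaller chunks localize matches more precisely, but each document produces more chunks (and so more vectors), which is why the index grows.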
Troubleshooting:
  • “Embedding server not available”: the SideCar must be running with the embedding_sidecar service. It should start automatically; check the SideCar operator if the issue persists.
  • Index creation stalls or errors: Check the Logger for detailed messages. Common causes are an unreachable Ollama server or missing API keys for OpenAI embeddings.
  • “No documents to process”: Verify your Document Folder path and File Pattern, or check that your wired input table has data rows beyond the header.
  • Embedding model errors: Ensure Ollama is running (ollama serve) and the selected model is pulled (ollama pull nomic-embed-text).
  • Load fails with “Index folder not found”: Confirm the Index Folder path points to a directory that was previously saved by this operator, containing a config.json and the vector store data.
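For the Ollama-related failures above, a quick reachability probe can confirm whether the server is answering before digging into the operator. 127.0.0.1:11434 is Ollama's default address; note the operator itself reaches Ollama through the embedding sidecar, so this check is only a diagnostic:

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url="http://127.0.0.1:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url within the timeout."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200  # Ollama's root endpoint replies 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False, start the server with `ollama serve` and confirm the model is pulled (`ollama pull nomic-embed-text`) before retrying index creation.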
Parameters:
  • Create Index (Createindex): op('rag_index').par.Createindex, Pulse. Default: False
  • Index Name (Indexname): op('rag_index').par.Indexname, Str. Default: "" (empty string)
  • Input Mode (Inputmode): op('rag_index').par.Inputmode, Menu. Default: auto. Options: auto, doctable, folder
  • Document Folder (Documentfolder): op('rag_index').par.Documentfolder, Folder. Default: "" (empty string)
  • File Pattern (Filepattern): op('rag_index').par.Filepattern, Str. Default: "" (empty string)
  • Current Status (Status): op('rag_index').par.Status, Str. Default: "" (empty string)
  • Active Index (Activeindex): op('rag_index').par.Activeindex, Toggle. Default: False
  • Embedding Model (Embedmodel): op('rag_index').par.Embedmodel, Menu. Default: local. Options: local, openai
  • Ollama Model (Ollamamodel): op('rag_index').par.Ollamamodel, StrMenu. Default: "" (empty string). Menu options: nomic-embed-text, mxbai-embed-large, all-minilm
  • Chunk Size (Chunksize): op('rag_index').par.Chunksize, Int. Default: 0. Range: 0 to 1. Slider range: 0 to 1
  • Chunk Overlap (Chunkoverlap): op('rag_index').par.Chunkoverlap, Int. Default: 0. Range: 0 to 1. Slider range: 0 to 1
  • Sync to File (Savetofile): op('rag_index').par.Savetofile, Toggle. Default: False
  • Index Folder (Indexfolder): op('rag_index').par.Indexfolder, Folder. Default: "" (empty string)
  • Save Index (Saveindex): op('rag_index').par.Saveindex, Pulse. Default: False
  • Load Index (Loadindex): op('rag_index').par.Loadindex, Pulse. Default: False
  • Load on Start (Loadonstart): op('rag_index').par.Loadonstart, Toggle. Default: False
  • Clear All (Clearall): op('rag_index').par.Clearall, Pulse. Default: False
  • Progress (Progress): op('rag_index').par.Progress, Float. Default: 0.0. Range: 0 to 1. Slider range: 0 to 1
  • Stop Index Creation (Stopindex): op('rag_index').par.Stopindex, Pulse. Default: False
Version History:
v2.1.0 (2026-03-16)
  • Added RAG index creation with embedding_sidecar integration
  • Implemented document processing from tables and folders
  • Added index persistence and configuration saving
v2.0.0 (2026-03-02)
  • Refactor to HTTP sidecar client, remove llama-index dependency
  • All vector operations via embedding_server over HTTP
  • Add collection name sanitization
  • Add sidecar field to manifest
v1.1.2 (2026-03-01)
  • Replace torch import check with importlib.metadata for TD 32050+ compatibility
  • Initial commit
v1.1.1 (2025-08-03)
  • Fixed missing Ollama embeddings integration by adding llama-index-embeddings-ollama to installation packages
  • Added llama-index-embeddings-huggingface and llama-index-embeddings-openai to ensure all embedding types work properly
  • Resolved "No module named 'llama_index.embeddings.ollama'" error during embedding model initialization
v1.1.0 (2025-07-30)
  • Added IndexActive tdu.Dependency for reactive state tracking
  • Switched to Ollama for local embeddings to fix numpy conflicts
  • Added Ollamamodel parameter with nomic-embed-text, mxbai-embed-large, all-minilm options
  • Enhanced config saving/loading to include Ollama model selection
  • Improved error handling and logging
v1.0.0 (2024-11-06)
  • Initial release