Source Docs Operator

The Source Docs LOP (formerly DocumentParser) parses local documents into standardized index tables within TouchDesigner. HTML/HTM and Python files are supported initially, and the design is extensible to other formats. It extracts content and metadata from documents found within a specified folder structure, making it suitable for indexing local documentation, code, or text files for RAG systems.

Note: Requires the beautifulsoup4 Python library for HTML parsing.

Parameters are organized into pages.

Document Folder (Documentfolder) op('source_docs').par.Documentfolder Folder
Default:
None
Folder Depth (Folderdepth) op('source_docs').par.Folderdepth Int
Default:
3
Range:
1 to 20
File Pattern (Filepattern) op('source_docs').par.Filepattern Str
Default:
*.htm *.html *.py
Parse All Documents (Parseall) op('source_docs').par.Parseall Pulse
Default:
None
Stop Parsing (Stopparsing) op('source_docs').par.Stopparsing Pulse
Default:
None
Current Status (Status) op('source_docs').par.Status Str
Default:
Ready
Progress (Progress) op('source_docs').par.Progress Float
Default:
0
Clear Index Table (Clear) op('source_docs').par.Clear Pulse
Default:
None
Caution: Viewing large index tables can be slow Header
Max Stall Time (s) (Maxstalltime) op('source_docs').par.Maxstalltime Float
Default:
2
Range:
0.1 to 10
Preview File (Previewfile) op('source_docs').par.Previewfile File
Default:
None
Parse Single File (Parsefile) op('source_docs').par.Parsefile Pulse
Default:
None
Analyze Document Structure (Analyzedoc) op('source_docs').par.Analyzedoc Pulse
Default:
None
Display Mode (Display) op('source_docs').par.Display Menu
Default:
index
Options:
index, content
Select Doc for Content View (Selectdoc) op('source_docs').par.Selectdoc Int
Default:
0
Selected Filename (Displayfile) op('source_docs').par.Displayfile Str
Default:
"" (Empty String)
Include Sections (HTML Only) Header
Auto Analyze on Preview Change (Autoupdate) op('source_docs').par.Autoupdate Toggle
Default:
Off
Include Unmatched Sections (Includemissing) op('source_docs').par.Includemissing Toggle
Default:
On
ChatTD (Chattd) op('source_docs').par.Chattd OP
Default:
/dot_lops/ChatTD
Helper Popups (Popups) op('source_docs').par.Popups Toggle
Default:
On
Show Built In Pars (Showbuiltin) op('source_docs').par.Showbuiltin Toggle
Default:
Off
Bypass (Bypass) op('source_docs').par.Bypass Toggle
Default:
Off
Available Callbacks:
  • onParseStart
  • onParseComplete
  • onFileParsed
  • onFileSkipped
  • onAnalyzeComplete
  • onError
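As an illustration of how these callbacks might be used, the sketch below collects parse events into a list so they can be inspected after a run. The single `info` argument is an assumption and is not confirmed by this page; check the callback template that ships with the operator for the actual signatures. The `parse_log` list is a hypothetical helper.

```python
# Hypothetical event log; not part of the operator.
parse_log = []


def onFileParsed(info):
    # ASSUMPTION: the operator passes one argument describing the event.
    parse_log.append(("parsed", info))


def onFileSkipped(info):
    # ASSUMPTION: same single-argument signature as above.
    parse_log.append(("skipped", info))


def onError(info):
    # ASSUMPTION: same single-argument signature as above.
    parse_log.append(("error", info))


def count_events(kind):
    """Count logged events of one kind: 'parsed', 'skipped', or 'error'."""
    return sum(1 for event, _ in parse_log if event == kind)
```

After a parse finishes, `count_events('skipped')` gives a quick check on whether the file pattern is excluding more files than intended.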
To parse a folder of HTML files:
1. Set 'Document Folder' to the root folder containing HTML files.
2. Set 'File Pattern' (e.g., `*.html *.htm`).
3. Adjust 'Folder Depth' as needed.
4. Pulse 'Parse All Documents'. Monitor 'Progress' and 'Status'.
5. Results appear in the output `index_table` DAT.
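The same steps can be scripted. The following is a minimal sketch that uses the parameter names listed above; the function names `start_folder_parse` and `parse_state` are hypothetical, and the operator is passed in as an argument so the helpers do not depend on a particular network path.

```python
def start_folder_parse(source_docs, folder, pattern="*.html *.htm", depth=3):
    """Configure the Source Docs operator and start a full parse.

    `source_docs` is the operator itself, e.g. op('source_docs')
    when called from inside TouchDesigner.
    """
    # The parameter list gives Folder Depth a range of 1 to 20.
    if not 1 <= depth <= 20:
        raise ValueError("depth must be between 1 and 20")
    source_docs.par.Documentfolder = folder
    source_docs.par.Filepattern = pattern
    source_docs.par.Folderdepth = depth
    source_docs.par.Parseall.pulse()


def parse_state(source_docs):
    """Return the current (Status, Progress) values of the operator."""
    return source_docs.par.Status.eval(), source_docs.par.Progress.eval()


# Inside TouchDesigner (the folder path is a placeholder):
# start_folder_parse(op('source_docs'), 'path/to/html_docs')
# print(parse_state(op('source_docs')))
```

Because 'Parse All Documents' runs asynchronously, `parse_state` is meant to be called again later (for example from a Timer or Execute callback) rather than in a blocking loop.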
To parse a single Python file:
1. Set 'Preview File' to the target `.py` file.
2. Pulse 'Parse Single File'.
3. The parsed code (as text) will be added to the `index_table`.
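Scripted, the single-file case is only two parameter operations. This is a sketch under the same assumptions as the parameter list above; `parse_single_file` is a hypothetical helper name.

```python
def parse_single_file(source_docs, path):
    """Point 'Preview File' at `path` and pulse 'Parse Single File'.

    `source_docs` is the operator, e.g. op('source_docs') in TouchDesigner.
    """
    source_docs.par.Previewfile = path
    source_docs.par.Parsefile.pulse()
    return path


# Inside TouchDesigner (the file path is a placeholder):
# parse_single_file(op('source_docs'), 'path/to/module.py')
```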
To choose which HTML sections are indexed:
1. Set 'Preview File' to a representative HTML file.
2. Pulse 'Analyze Document Structure' on the Single/Preview page.
3. Go to the 'DocConfig' page. Toggle the dynamically generated parameters (e.g., `Include Maincontent`, `Include Sidebar`) to select desired sections.
4. (Optional) Toggle 'Include Unmatched Sections' based on preference.
5. Pulse 'Parse Single File' to test the configuration.
6. If satisfied, use 'Parse All Documents' to apply the config to all matching files.
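The section selection can also be driven from a script. The sketch below assumes that the dynamically generated toggles have script names beginning with `Include` (for example `Includemaincontent`); this page only shows their labels, so verify the actual names on the 'DocConfig' page after analysis. The helper names are hypothetical.

```python
def analyze_structure(source_docs, html_path):
    """Set 'Preview File' and pulse 'Analyze Document Structure'."""
    source_docs.par.Previewfile = html_path
    source_docs.par.Analyzedoc.pulse()


def set_sections(source_docs, wanted):
    """Switch generated section toggles on or off.

    `wanted` is a collection of parameter script names to enable.
    Returns the names that were switched on.
    ASSUMPTION: generated toggles are named 'Include<Section>'.
    """
    enabled = []
    for p in source_docs.pars("Include*"):
        # 'Includemissing' is the static 'Include Unmatched Sections'
        # toggle, so it is left untouched here.
        if p.name == "Includemissing":
            continue
        on = p.name in wanted
        p.val = on
        if on:
            enabled.append(p.name)
    return enabled


# Inside TouchDesigner (path and section names are placeholders):
# analyze_structure(op('source_docs'), 'path/to/page.html')
# set_sections(op('source_docs'), {'Includemaincontent'})
```

The generated parameters may not exist until the analysis has finished, so `set_sections` is best called after the analysis completes rather than immediately after the pulse.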
Notes:
  • Parsing multiple documents (Parse All Documents) runs asynchronously via ChatTD.
  • HTML parsing uses BeautifulSoup4 to extract text content. Structure analysis generates CSS selectors to identify sections.
  • Python file parsing currently extracts the entire code content as text.
  • The output index_table is formatted for direct use with the Rag Index LOP.
  • Large numbers of files or very large files can take time to process.
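Since large folders take time, it can help to estimate how many files a given 'File Pattern' and 'Folder Depth' would select before pulsing 'Parse All Documents'. The sketch below is a standalone approximation, not the operator's own matching logic: it assumes the pattern is a space-separated list of globs and that a depth of 1 means only the root folder. `find_documents` is a hypothetical helper.

```python
import fnmatch
import os


def find_documents(root, patterns="*.htm *.html *.py", max_depth=3):
    """Return sorted file paths under `root` matching any glob in `patterns`.

    ASSUMPTION: depth 1 is the root folder itself, depth 2 its direct
    subfolders, and so on.
    """
    globs = patterns.split()
    root = os.path.abspath(root)
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 1 if rel == os.curdir else rel.count(os.sep) + 2
        if depth >= max_depth:
            dirnames[:] = []  # stop os.walk from descending further
        for name in filenames:
            if any(fnmatch.fnmatch(name, g) for g in globs):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Example (the folder path is a placeholder):
# print(len(find_documents('path/to/docs', '*.html *.htm', max_depth=3)))
```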