Source Docs Operator
Overview
The Source Docs LOP (formerly DocumentParser) parses local documents into standardized index tables within TouchDesigner. It initially supports HTML/HTM and Python files, and the set of formats is extensible. It extracts content and metadata from documents found in a specified folder structure, which makes it suitable for indexing local documentation, code, or text files for RAG systems.
Note: Requires the beautifulsoup4 Python library for HTML parsing.
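A quick way to confirm that the dependency is visible to TouchDesigner's Python is to look it up without importing it. This is a minimal sketch using only the standard library; `has_beautifulsoup` is a hypothetical helper name, and how you install packages for TouchDesigner depends on your setup.

```python
import importlib.util


def has_beautifulsoup():
    """Return True if the bs4 package (installed as beautifulsoup4) can be imported."""
    return importlib.util.find_spec("bs4") is not None
```

Running `print(has_beautifulsoup())` in the Textport shows whether HTML parsing can work before you pulse any parse.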
Parameters
Parameters are organized into pages.
Document Folder (Documentfolder)
op('source_docs').par.Documentfolder
Folder - Default: None
Folder Depth (Folderdepth)
op('source_docs').par.Folderdepth
Int - Default: 3
Range: 1 to 20
File Pattern (Filepattern)
op('source_docs').par.Filepattern
Str - Default: *.htm *.html *.py
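To make the interaction of Document Folder, Folder Depth, and File Pattern concrete, here is a plain-Python sketch of how matching files could be collected. It is not the operator's own code: it assumes patterns are space-separated globs matched against file names, and that a depth of 1 means only files directly inside the folder. `find_documents` is a hypothetical name.

```python
import fnmatch
import os


def find_documents(root, file_pattern="*.htm *.html *.py", depth=3):
    """Return sorted paths under `root` whose file names match any pattern.

    Assumption: depth=1 covers only files directly in `root`, depth=2 adds
    its immediate subfolders, and so on.
    """
    patterns = file_pattern.split()  # e.g. ["*.htm", "*.html", "*.py"]
    root = os.path.abspath(root)
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 1 if rel == "." else rel.count(os.sep) + 2
        if level >= depth:
            dirnames[:] = []  # prune: do not descend past `depth` levels
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Pruning `dirnames` in place is the standard way to stop `os.walk` from descending, so deep trees are never scanned beyond the requested depth.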
Parse All Documents (Parseall)
op('source_docs').par.Parseall
Pulse - Default: None
Stop Parsing (Stopparsing)
op('source_docs').par.Stopparsing
Pulse - Default: None
Current Status (Status)
op('source_docs').par.Status
Str - Default: Ready
Progress (Progress)
op('source_docs').par.Progress
Float - Default: 0
Clear Index Table (Clear)
op('source_docs').par.Clear
Pulse - Default: None
Caution: Viewing large index tables can be slow (header)
Max Stall Time (s) (Maxstalltime)
op('source_docs').par.Maxstalltime
Float - Default: 2
Range: 0.1 to 10
Preview File (Previewfile)
op('source_docs').par.Previewfile
File - Default: None
Parse Single File (Parsefile)
op('source_docs').par.Parsefile
Pulse - Default: None
Analyze Document Structure (Analyzedoc)
op('source_docs').par.Analyzedoc
Pulse - Default: None
Select Doc for Content View (Selectdoc)
op('source_docs').par.Selectdoc
Int - Default: 0
Selected Filename (Displayfile)
op('source_docs').par.Displayfile
Str - Default: "" (Empty String)
Include Sections (HTML Only) (header)
Auto Analyze on Preview Change (Autoupdate)
op('source_docs').par.Autoupdate
Toggle - Default: Off
Include Unmatched Sections (Includemissing)
op('source_docs').par.Includemissing
Toggle - Default: On
ChatTD (Chattd)
op('source_docs').par.Chattd
OP - Default: /dot_lops/ChatTD
Helper Popups (Popups)
op('source_docs').par.Popups
Toggle - Default: On
Show Built In Pars (Showbuiltin)
op('source_docs').par.Showbuiltin
Toggle - Default: Off
Bypass (Bypass)
op('source_docs').par.Bypass
Toggle - Default: Off
Callbacks
Available Callbacks:
- onParseStart
- onParseComplete
- onFileParsed
- onFileSkipped
- onAnalyzeComplete
- onError
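A minimal sketch of callback handlers that log each event to the Textport. It assumes each callback is called with a single `info` dictionary; that signature is an assumption, so check the operator's callback template for the real one. The `file` and `error` keys and the `describe_event` helper are hypothetical; print `info` once to see what the operator actually sends.

```python
def describe_event(event, info):
    """Build a one-line log message from a callback's info dict."""
    # 'file' and 'error' are hypothetical keys; inspect `info` for the real ones.
    detail = info.get("file") or info.get("error") or ""
    return f"[source_docs] {event} {detail}".rstrip()


def onParseStart(info):
    print(describe_event("parse started", info))


def onFileParsed(info):
    print(describe_event("parsed", info))


def onFileSkipped(info):
    print(describe_event("skipped", info))


def onParseComplete(info):
    print(describe_event("parse complete", info))


def onAnalyzeComplete(info):
    print(describe_event("analysis complete", info))


def onError(info):
    print(describe_event("ERROR", info))
```

Since Parse All Documents runs asynchronously, onParseComplete is a natural place to trigger follow-up work such as rebuilding a RAG index.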
Usage Examples
Parsing All HTML Files in a Folder
1. Set 'Document Folder' to the root folder containing the HTML files.
2. Set 'File Pattern' (e.g., `*.html *.htm`).
3. Adjust 'Folder Depth' as needed.
4. Pulse 'Parse All Documents'. Monitor 'Progress' and 'Status'.
5. Results appear in the output `index_table` DAT.
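The steps above can also be driven from a script. This sketch assumes it runs inside TouchDesigner and receives the operator (for example `op('source_docs')`); `parse_folder` is a hypothetical helper name, and the parameter names are the ones listed under Parameters. Because the full parse runs asynchronously, the index table is still being filled when the function returns.

```python
def parse_folder(source_docs, folder, patterns="*.html *.htm", depth=3):
    """Configure a Source Docs LOP for `folder` and start a full parse.

    `source_docs` is the operator, e.g. op('source_docs').
    """
    source_docs.par.Documentfolder = folder  # Document Folder
    source_docs.par.Filepattern = patterns   # space-separated patterns
    source_docs.par.Folderdepth = depth      # documented range: 1 to 20
    source_docs.par.Parseall.pulse()         # Parse All Documents
```

For example, `parse_folder(op('source_docs'), project.folder + '/docs')`, then read `op('source_docs').par.Status.eval()` or wait for the onParseComplete callback.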
Parsing a Single Python File
1. Set 'Preview File' to the target `.py` file.
2. Pulse 'Parse Single File'.
3. The parsed code (as text) is added to the `index_table`.
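A scripted version of these steps, as a sketch that assumes it runs inside TouchDesigner and is passed the operator. `parse_single_file` is a hypothetical helper; the existence check is an addition so that a mistyped path fails loudly instead of being pulsed.

```python
import os


def parse_single_file(source_docs, path):
    """Point Preview File at `path` and pulse Parse Single File."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    source_docs.par.Previewfile = path   # Preview File
    source_docs.par.Parsefile.pulse()    # Parse Single File
```

Usage inside TouchDesigner: `parse_single_file(op('source_docs'), project.folder + '/scripts/tool.py')`, where the path is only an example.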
Customizing HTML Section Parsing
1. Set 'Preview File' to a representative HTML file.
2. Pulse 'Analyze Document Structure' on the Single/Preview page.
3. Go to the 'DocConfig' page. Toggle the dynamically generated parameters (e.g., `Include Maincontent`, `Include Sidebar`) to select the sections you want.
4. (Optional) Toggle 'Include Unmatched Sections' as preferred.
5. Pulse 'Parse Single File' to test the configuration.
6. If satisfied, use 'Parse All Documents' to apply the configuration to all matching files.
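Step 3 can be scripted once analysis has finished (for example from onAnalyzeComplete). This sketch uses TouchDesigner's `customPars`, `Par.page`, `Par.style`, `Par.label`, and `Par.val`. It assumes the generated toggles live on a page named `DocConfig` and that Auto Analyze and Include Unmatched Sections may share that page, so those two are left alone. `set_sections` is a hypothetical helper name.

```python
# Toggles that are not generated section switches; never changed here.
FIXED_TOGGLES = {"Autoupdate", "Includemissing"}


def set_sections(source_docs, wanted_labels, page_name="DocConfig"):
    """Enable generated section toggles whose labels are in `wanted_labels`.

    Every other generated toggle on `page_name` is switched off.
    Returns the labels that were enabled, in parameter order.
    """
    enabled = []
    for par in source_docs.customPars:
        if par.page.name != page_name or par.style != "Toggle":
            continue
        if par.name in FIXED_TOGGLES:
            continue
        on = par.label in wanted_labels
        par.val = on
        if on:
            enabled.append(par.label)
    return enabled
```

For example, `set_sections(op('source_docs'), {'Include Maincontent'})` keeps only the main content, after which you can pulse 'Parse Single File' to check the result.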
Technical Notes
- Parsing multiple documents (Parse All Documents) runs asynchronously via ChatTD.
- HTML parsing uses BeautifulSoup4 to extract text content. Structure analysis generates CSS selectors to identify sections.
- Python file parsing currently extracts the entire code content as text.
- The output index_table is formatted for direct use with the Rag Index LOP.
- Large numbers of files or very large files can take time to process.
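To illustrate the idea of keeping text only from selected sections, here is a simplified sketch. It is not the operator's implementation: the operator uses BeautifulSoup4 and generated CSS selectors, while this version swaps in the standard library's `html.parser` so it runs without extra packages and identifies sections by element `id` only. `SectionTextExtractor` and `extract_sections` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SectionTextExtractor(HTMLParser):
    """Collect text inside elements whose id is in `wanted_ids`.

    Text inside <script> and <style> is skipped. Assumes reasonably
    well-formed markup (every non-void element is closed).
    """

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.depth = 0   # nesting depth inside a wanted section (0 = outside)
        self.skip = 0    # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style"):
            self.skip += 1
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") in self.wanted_ids:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip and data.strip():
            self.chunks.append(data.strip())


def extract_sections(html, wanted_ids):
    """Return the visible text of the wanted sections, joined by spaces."""
    parser = SectionTextExtractor(wanted_ids)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

With BeautifulSoup4 the same selection is usually written as `soup.select(selector)` followed by `get_text()`, which also copes with sloppy markup that this depth counter does not.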