Save Sources
The Save Sources LOP is a RAG utility operator that saves each row of an input table to its own Markdown file. It can derive filenames from source URLs, fall back to a filename column or `doc_id`, and provides overwrite control and progress tracking. This makes it well suited to exporting scraped content, processed documents, or any other tabular data to an organized folder of files.
Requirements
- Input Table: A table DAT with the required columns `doc_id` and `content`
- Output Folder: A valid directory path for saving files
- Optional Columns: `source_path` for URL-based filenames, and a custom filename column
Input
- Input Table DAT: Table containing the data to save as files
  - Required Columns: `doc_id`, `content`
  - Optional Columns: `source_path`, custom filename column
Output
- Markdown Files: Individual `.md` files saved to the specified output folder
- Progress Tracking: Real-time status and progress information
- File Statistics: Count of successfully saved files
Parameters
Save Config
op('save_sources').par.Outputfolder (folder)
The directory where Markdown files will be saved.
- Default: None
op('save_sources').par.Filenameprefix (str)
Optional prefix added to the beginning of each saved filename.
- Default: None
op('save_sources').par.Filenamecolumn (str)
Column used for filenames when URL-based naming is disabled or unavailable. If this is empty or the column is not found, `doc_id` is used.
- Default: `filename`
op('save_sources').par.Overwrite (toggle)
If enabled, existing Markdown files with the same name are overwritten.
- Default: false
op('save_sources').par.Savemarkdown (pulse)
Starts saving content from the input DAT to Markdown files.
- Default: None
op('save_sources').par.Clearstatus (pulse)
Resets the Status, Progress, and Files Saved values.
- Default: None
op('save_sources').par.Status (str)
Current operation status and progress information.
- Default: None
op('save_sources').par.Progress (float)
Percentage completion of the current save operation.
- Default: None
op('save_sources').par.Filessaved (int)
Number of files successfully saved in the current operation.
- Default: None
op('save_sources').par.Useurlasfilename (toggle)
If enabled, attempts to create a safe filename from the URL in the `source_path` column.
- Default: false
op('save_sources').par.Bypass (toggle)
Bypasses the operator.
- Default: false
op('save_sources').par.Showbuiltin (toggle)
Shows the built-in TouchDesigner parameters.
- Default: false
op('save_sources').par.Version (str)
Current version of the operator.
- Default: None
op('save_sources').par.Lastupdated (str)
Date of the last update.
- Default: None
op('save_sources').par.Creator (str)
Operator creator.
- Default: None
op('save_sources').par.Website (str)
Related website or documentation.
- Default: None
op('save_sources').par.Chattd (op)
Reference to the ChatTD operator used for configuration.
- Default: None
Filename Generation Strategy
The operator chooses each filename using a three-tier fallback:
1. URL-Based Filenames (Primary)
When “Use URL for Filename” is enabled and a `source_path` column exists:
- Parses URLs to extract meaningful path components
- Sanitizes special characters and path separators
- Handles query parameters for generic pages
- Removes common file extensions (.html, .php, etc.)
- Truncates the result to at most 100 characters
2. Fallback Column (Secondary)
If URL generation fails or is disabled:
- Uses the specified “Filename Column”
- Applies basic sanitization for filesystem safety
- Uses the `filename` column unless another column is specified (this is the parameter's default value)
3. Document ID (Final Fallback)
If both methods above fail:
- Uses the `doc_id` column value
- Produces unique filenames as long as the `doc_id` values are unique
- Provides a fallback that works for any table with the required columns
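The fallback order above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the operator's actual code: the extension list, the set of "generic" page names (`index`, `default`, `home`), and the cleanup rules are guesses chosen so that the URL examples later on this page come out as documented. `clean`, `url_to_filename`, and `choose_filename` are hypothetical names.

```python
import posixpath
import re
from urllib.parse import urlparse

# Assumed values; the operator's real lists may differ
COMMON_EXTENSIONS = {'.html', '.htm', '.php', '.asp', '.aspx'}
GENERIC_PAGES = {'index', 'default', 'home'}
MAX_LENGTH = 100


def clean(name):
    """Turn separators and query punctuation into '_', drop unsafe characters, cap length."""
    name = re.sub(r'[/\\=&]', '_', name)
    name = re.sub(r'[<>:"|?*]', '', name)
    return name.strip('_')[:MAX_LENGTH]  # also trims leftover leading/trailing underscores


def url_to_filename(url):
    """Tier 1: build a name from the URL path; return None if nothing usable remains."""
    parsed = urlparse(url)
    path = parsed.path.strip('/')
    if not path:
        return None
    base, ext = posixpath.splitext(path)
    if parsed.query and posixpath.basename(base).lower() in GENERIC_PAGES:
        # Generic pages such as index.php?id=123 keep extension and query so they stay distinct
        parts = [base, ext.lstrip('.'), parsed.query]
        name = '_'.join(p for p in parts if p)
    elif ext.lower() in COMMON_EXTENSIONS:
        name = base  # drop .html, .php, ...
    else:
        name = path
    return clean(name) or None


def choose_filename(row, use_url=False, filename_column='filename', prefix=''):
    """Apply the three tiers in order, then add the prefix and the .md extension."""
    name = None
    if use_url and row.get('source_path'):
        name = url_to_filename(row['source_path'])          # tier 1: URL
    if not name and filename_column and row.get(filename_column):
        name = clean(str(row[filename_column]))             # tier 2: filename column
    if not name:
        name = clean(str(row['doc_id']))                    # tier 3: doc_id
    return f'{prefix}{name}.md'
```

Because each tier yields `None` or an empty string when it fails, the next tier takes over automatically; the `doc_id` tier always produces a name as long as `doc_id` is non-empty.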
Basic Setup
- Prepare Input Table: Ensure your table DAT has `doc_id` and `content` columns
- Set Output Folder: Choose or create a directory for the saved files
- Configure Filename Strategy: Choose URL-based, column-based, or doc_id naming
- Set Overwrite Policy: Decide whether to overwrite existing files
- Save Files: Click “Save Markdown Files” to begin the process
Example Table Structure
| doc_id | content | source_path | filename |
|---|---|---|---|
| document_001 | # My Document... | https://example.com/doc1 | custom_name |
| document_002 | ## Another Document... | https://site.com/page2 | another_file |
| document_003 | ### Third Document... | | manual_name |
Advanced Configuration
URL-Based Naming Examples
- `https://example.com/articles/machine-learning` → `articles_machine-learning.md`
- `https://site.com/docs/tutorial.html` → `docs_tutorial.md`
- `https://blog.com/index.php?id=123` → `index_php_id_123.md`
Custom Prefixes
- Prefix `project_` → `project_articles_machine-learning.md`
- Prefix `2024_` → `2024_docs_tutorial.md`
Integration Examples
With Source Operators
```python
# Chain with a source operator
source_crawl = op('source_crawl4ai')  # its output table is assumed to be wired into save_sources' input
save_sources = op('save_sources')

# Configure Save Sources to use the crawled data
save_sources.par.Outputfolder = 'project/scraped_content'
save_sources.par.Useurlasfilename = True
save_sources.par.Filenameprefix = 'scraped_'
save_sources.par.Overwrite = False

# Save the crawled content
save_sources.par.Savemarkdown.pulse()
```
With RAG Index
```python
# Save sources before indexing
save_sources = op('save_sources')
rag_index = op('rag_index')

# Configure the output folder
save_sources.par.Outputfolder = 'knowledge_base/documents'
save_sources.par.Useurlasfilename = True

# Save the files first
save_sources.par.Savemarkdown.pulse()

# Then index the saved files
# (configure rag_index to read from the same folder)
```
Batch Processing Workflow
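```python
# Process multiple source tables one at a time
source_tables = ['web_scrape_results', 'document_imports', 'api_responses']
save_sources = op('save_sources')

def save_table(index=0):
    if index >= len(source_tables):
        return  # all tables processed
    table_name = source_tables[index]

    # Configure for this source
    save_sources.par.Outputfolder = f'output/{table_name}'
    save_sources.par.Filenameprefix = f'{table_name}_'

    # Copy the source table into the operator's input table DAT
    save_sources.op('input_table').copy(op(table_name))

    # Reset the previous run's status, start saving, then check back in a few frames
    save_sources.par.Clearstatus.pulse()
    save_sources.par.Savemarkdown.pulse()
    run('args[0](args[1])', wait_for, index, delayFrames=5)

def wait_for(index):
    status = save_sources.par.Status.eval()
    if 'Completed' in status or 'Error' in status:
        save_table(index + 1)
    else:
        # Poll again later; a time.sleep() loop would block TouchDesigner's
        # main thread, so the status could never update
        run('args[0](args[1])', wait_for, index, delayFrames=5)

save_table()
```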
File Management Features
Overwrite Protection
- Enabled: Replaces existing files that have the same names
- Disabled: Skips files that already exist (default)
- Use Case: Incremental updates without data loss
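The overwrite behaviour described above amounts to a per-file check before writing. A minimal standard-library sketch, assuming UTF-8 output; `write_markdown` and its `'saved'`/`'skipped'` return values are hypothetical and not part of the operator.

```python
from pathlib import Path


def write_markdown(folder, filename, content, overwrite=False):
    """Write one Markdown file and return 'saved' or 'skipped' (hypothetical helper)."""
    path = Path(folder) / filename
    if path.exists() and not overwrite:
        # Overwrite disabled: leave the existing file untouched
        return 'skipped'
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding='utf-8')
    return 'saved'
```

Skipped files leave earlier output intact, which is what makes incremental runs safe when overwrite is off.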
Progress Tracking
- Real-time Status: Current operation phase and details
- Progress Percentage: Completion percentage (0-100%)
- Files Saved Counter: Number of successfully saved files
- Error Reporting: Detailed error messages for troubleshooting
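The Status, Progress, and Files Saved values fit together roughly as follows. This hypothetical `SaveProgress` class only illustrates the bookkeeping and reproduces the status strings listed under Status Messages; whether the operator counts skipped files toward Progress is not documented, so this sketch simply counts every processed row.

```python
class SaveProgress:
    """Hypothetical tracker mirroring the Status, Progress, and Filessaved parameters."""

    def __init__(self, total):
        self.total = total        # number of rows to process
        self.processed = 0        # rows handled so far (saved, skipped, or failed)
        self.saved = 0            # rows actually written to disk
        self.status = 'Starting…'

    @property
    def progress(self):
        """Percentage of rows processed, 0-100."""
        if self.total == 0:
            return 100.0
        return 100.0 * self.processed / self.total

    def record(self, saved):
        """Call once per row; pass saved=False for skipped or failed rows."""
        self.processed += 1
        if saved:
            self.saved += 1
        self.status = 'Saving files…'

    def finish(self):
        self.status = f'Completed: Saved {self.saved}/{self.total} files'

    def fail(self, message):
        self.status = f'Error: {message}'
```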
Filename Sanitization
- Path Separators: Converts `/` and `\` to `_`
- Special Characters: Removes `<>:"/\|?*`
- Unicode Support: Handles international characters safely
- Length Limits: Truncates overly long filenames
- Extension Management: Adds the `.md` extension automatically
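The sanitization rules listed above can be approximated as follows. This sketch is an assumption about how the rules combine, not the operator's code: it normalizes Unicode to NFC and otherwise keeps non-ASCII letters (one reasonable reading of "handles international characters safely"), also strips ASCII control characters (an extra assumption), and reuses the 100-character limit from the URL tier before adding the extension.

```python
import re
import unicodedata

MAX_LENGTH = 100  # assumed limit, matching the URL tier's truncation length
# Characters listed in the rules above, plus ASCII control characters (an assumption)
UNSAFE = re.compile(r'[<>:"|?*\x00-\x1f]')


def sanitize_filename(name, extension='.md'):
    """Hypothetical helper applying the documented sanitization rules in order."""
    name = unicodedata.normalize('NFC', name)         # compose accented characters consistently
    name = name.replace('/', '_').replace('\\', '_')  # path separators become underscores
    name = UNSAFE.sub('', name)                       # remove characters unsafe in filenames
    name = name[:MAX_LENGTH]                          # cap the length before adding the extension
    if not name.lower().endswith(extension):
        name += extension                             # add .md only if it is not already there
    return name
```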
Use Cases
Web Scraping Export
- Export scraped web content to an organized file structure
- Use URL-based filenames for intuitive organization
- Maintain source traceability through filenames
Document Processing Pipeline
- Save processed documents from various sources
- Apply consistent naming conventions
- Prepare files for further analysis or indexing
Knowledge Base Creation
- Export curated content collections
- Organize by topic, source, or date
- Create searchable file archives
Content Migration
- Export content from databases or APIs
- Convert to Markdown for version control
- Maintain metadata through filename conventions
Best Practices
Table Preparation
- Required Columns: Always include `doc_id` and `content`
- Clean Data: Remove or escape problematic characters in content
- Consistent IDs: Use meaningful, unique document IDs
- URL Validation: Ensure `source_path` contains valid URLs if you use URL-based naming
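A pre-flight check along these lines can catch table problems before saving. The sketch takes plain Python lists (a header row followed by data rows, every cell a string), so converting a table DAT into that shape is left to the caller; `check_table` is a hypothetical helper, and treating only `http`/`https` URLs as valid is an assumption.

```python
from urllib.parse import urlparse

REQUIRED_COLUMNS = ('doc_id', 'content')


def check_table(rows):
    """Return a list of problems found in rows = [header, *data_rows] (all cells strings)."""
    if not rows:
        return ['table is empty']
    header, data = rows[0], rows[1:]
    missing = [f'missing required column: {name}' for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return missing
    col = {name: i for i, name in enumerate(header)}
    problems, seen_ids = [], set()
    for n, row in enumerate(data, start=1):
        doc_id = row[col['doc_id']].strip()
        if not doc_id:
            problems.append(f'row {n}: empty doc_id')
        elif doc_id in seen_ids:
            problems.append(f'row {n}: duplicate doc_id {doc_id!r}')
        seen_ids.add(doc_id)
        if not row[col['content']].strip():
            problems.append(f'row {n}: empty content')
        if 'source_path' in col:
            url = row[col['source_path']].strip()
            parsed = urlparse(url)
            # An empty source_path is fine (naming falls back); a malformed one is flagged
            if url and not (parsed.scheme in ('http', 'https') and parsed.netloc):
                problems.append(f'row {n}: source_path is not an http(s) URL')
    return problems
```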
Filename Strategy
- URL Naming: Best for web-scraped content with meaningful URLs
- Column Naming: Use for curated content with predefined names
- Prefix Usage: Add project or date prefixes for organization
- Length Consideration: Keep the total path length under system limits (for example, 260 characters on Windows unless long-path support is enabled)
Performance Optimization
- Batch Size: Process reasonable numbers of files at once
- Folder Structure: Create organized subfolder hierarchies
- Overwrite Settings: Use appropriate overwrite policies
- Progress Monitoring: Check status regularly for large operations
Error Handling
- Path Validation: Verify the output folder exists and is writable
- Content Validation: Check for empty or malformed content
- Filename Conflicts: Handle duplicate filenames appropriately
- Recovery: Use “Clear Status” to reset after errors
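Path validation can be sketched with the standard library: create the folder if needed, then confirm you can actually write to it. This is a hypothetical pre-flight helper, not something the operator exposes. It writes a throwaway temporary file rather than relying on `os.access`, which can give misleading answers on some network shares.

```python
import os
import tempfile


def check_output_folder(folder):
    """Return None if folder is usable, otherwise a short error message (hypothetical helper)."""
    try:
        # Creates missing parent folders; raises if the path exists as a regular file
        os.makedirs(folder, exist_ok=True)
    except OSError as exc:
        return f'cannot create folder: {exc}'
    try:
        # Creating and deleting a temporary file is the most direct writability test
        with tempfile.NamedTemporaryFile(dir=folder):
            pass
    except OSError as exc:
        return f'folder is not writable: {exc}'
    return None
```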
Troubleshooting
Common Issues
- Permission Errors
  - Verify output folder write permissions
  - Check available disk space
  - Ensure the folder path is accessible
- Filename Conflicts
  - Enable overwrite if updates are needed
  - Use unique prefixes to avoid conflicts
  - Check for duplicate doc_ids in the input
- Invalid Paths
  - Use absolute paths or proper relative paths
  - Verify the folder exists or can be created
  - Check for invalid characters in folder names
- Performance Issues
  - Process files in smaller batches
  - Use faster storage devices
  - Monitor system resources during operation
Status Messages
- “Idle”: Ready for operation
- “Starting…”: Initializing save process
- “Saving files…”: Actively saving files
- “Completed: Saved X/Y files”: Operation finished successfully
- “Error: [message]”: Operation failed with specific error
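If a script needs to react to these messages, for example to decide when a batch step has finished, the status text can be classified with a small parser. This assumes the strings appear exactly as listed above; `parse_status` and the returned state names are hypothetical.

```python
import re

COMPLETED = re.compile(r'^Completed: Saved (\d+)/(\d+) files$')


def parse_status(status):
    """Classify a Status string as (state, saved, total); counts are set only when completed."""
    status = status.strip()
    match = COMPLETED.match(status)
    if match:
        return ('completed', int(match.group(1)), int(match.group(2)))
    if status.startswith('Error:'):
        return ('error', None, None)
    if status == 'Idle':
        return ('idle', None, None)
    if status.startswith(('Starting', 'Saving')):
        return ('running', None, None)  # "Starting…" and "Saving files…"
    return ('unknown', None, None)
```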
Together, these features provide organized, traceable Markdown export for RAG workflows and other content processing pipelines.