Save Sources

The Save Sources LOP is a RAG utility operator that saves each row of an input table to its own Markdown file. It can derive filenames from URLs, fall back to a filename column or to doc_id, and either skip or overwrite existing files. This makes it well suited to exporting scraped content, processed documents, or other tabular data into an organized folder.

  • Input Table: A table DAT with required columns doc_id and content
  • Output Folder: A valid directory path for saving files
  • Optional Columns: source_path for URL-based filenames, custom filename column
  • Input Table DAT: Table containing data to save as files
    • Required Columns: doc_id, content
    • Optional Columns: source_path, custom filename column
  • Markdown Files: Individual .md files saved to the specified output folder
  • Progress Tracking: Real-time status and progress information
  • File Statistics: Count of successfully saved files
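
Before saving, it can help to confirm that the input table has the required columns. The following is a minimal plain-Python sketch, not part of the operator: the helper name `missing_columns` is hypothetical, and it assumes the first row of the table holds the column names. Inside TouchDesigner the header could be read with something like `[c.val for c in op('input_table').row(0)]`.

```python
REQUIRED_COLUMNS = ("doc_id", "content")

def missing_columns(header, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the header row."""
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]

# A header without the content column fails the check.
header = ["doc_id", "source_path", "filename"]
print(missing_columns(header))  # ['content']
```

An empty list means the table meets the stated requirements; optional columns such as source_path are not checked here.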
Output Folder (Outputfolder) op('save_sources').par.Outputfolder folder

The directory where Markdown files will be saved

Default:
None
Filename Prefix (Optional) (Filenameprefix) op('save_sources').par.Filenameprefix str

Optional prefix to add to the beginning of each saved filename

Default:
None
Filename Column (Optional) (Filenamecolumn) op('save_sources').par.Filenamecolumn str

Column to use for filenames when URL-based naming is disabled or unavailable. If the column is empty or not found, 'doc_id' is used

Default:
filename
Overwrite Existing Files (Overwrite) op('save_sources').par.Overwrite toggle

If enabled, existing Markdown files with the same name will be overwritten

Default:
false
Save Markdown Files (Savemarkdown) op('save_sources').par.Savemarkdown pulse

Starts the process of saving content from the input DAT to Markdown files

Default:
None
Clear Status (Clearstatus) op('save_sources').par.Clearstatus pulse

Resets the status, progress, and files saved counters

Default:
None
Current Status (Status) op('save_sources').par.Status str

Current operation status and progress information

Default:
None
Progress (%) (Progress) op('save_sources').par.Progress float

Percentage completion of the current save operation

Default:
None
Files Saved (Filessaved) op('save_sources').par.Filessaved int

Number of files successfully saved in the current operation

Default:
None
Use URL for Filename (Useurlasfilename) op('save_sources').par.Useurlasfilename toggle

If enabled, attempts to create a safe filename from the 'source_path' column URL

Default:
false
Bypass (Bypass) op('save_sources').par.Bypass toggle

Bypass the operator

Default:
false
Show Built-in Parameters (Showbuiltin) op('save_sources').par.Showbuiltin toggle

Show built-in TouchDesigner parameters

Default:
false
Version (Version) op('save_sources').par.Version str

Current version of the operator

Default:
None
Last Updated (Lastupdated) op('save_sources').par.Lastupdated str

Date of last update

Default:
None
Creator (Creator) op('save_sources').par.Creator str

Operator creator

Default:
None
Website (Website) op('save_sources').par.Website str

Related website or documentation

Default:
None
ChatTD Operator (Chattd) op('save_sources').par.Chattd op

Reference to the ChatTD operator for configuration

Default:
None

The operator generates filenames through a three-tier fallback:

When “Use URL for Filename” is enabled and a source_path column exists:

  • Parses URLs to extract meaningful path components
  • Sanitizes special characters and path separators
  • Handles query parameters for generic pages
  • Removes common file extensions (.html, .php, etc.)
  • Truncates to reasonable length (100 characters)

If URL generation fails or is disabled:

  • Uses the specified “Filename Column”
  • Applies basic sanitization for filesystem safety
  • Defaults to “filename” column if not specified

If both above methods fail:

  • Uses the doc_id column value
  • Ensures every row gets a unique filename
  • Provides reliable fallback for any table structure
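
The fallback order described above (URL, then filename column, then doc_id) can be sketched as a small function. This is an illustration under assumptions, not the operator's actual code: `choose_base_name` and its `url_to_name` argument are hypothetical names, and a row is modeled as a plain dict of column name to cell text.

```python
def choose_base_name(row, use_url=False, filename_column="filename", url_to_name=None):
    """Pick a base filename (no extension) in the order URL -> column -> doc_id.

    url_to_name is a hypothetical callable that converts a URL to a name and
    returns an empty string when it cannot produce one.
    """
    # Tier 1: URL-based name, only when enabled and source_path is non-empty.
    if use_url and url_to_name is not None:
        url = row.get("source_path", "").strip()
        if url:
            name = url_to_name(url)
            if name:
                return name
    # Tier 2: the configured filename column, if present and non-empty.
    name = row.get(filename_column, "").strip() if filename_column else ""
    if name:
        return name
    # Tier 3: doc_id is a required column, so it is always available.
    return row["doc_id"]

row = {"doc_id": "document_003", "content": "### Third Document...",
       "source_path": "", "filename": "manual_name"}
print(choose_base_name(row, use_url=True, url_to_name=lambda url: ""))  # manual_name
print(choose_base_name({"doc_id": "document_001", "content": "# My Document..."}))  # document_001
```

Passing the URL converter in as an argument keeps the tier logic separate from whatever URL parsing rules are in use.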
  1. Prepare Input Table: Ensure your table DAT has doc_id and content columns
  2. Set Output Folder: Choose or create a directory for the saved files
  3. Configure Filename Strategy: Choose URL-based, column-based, or doc_id naming
  4. Set Overwrite Policy: Decide whether to overwrite existing files
  5. Save Files: Click “Save Markdown Files” to begin the process
doc_id | content | source_path | filename
document_001 | # My Document... | https://example.com/doc1 | custom_name
document_002 | ## Another Document... | https://site.com/page2 | another_file
document_003 | ### Third Document... | | manual_name
  • https://example.com/articles/machine-learning → articles_machine-learning.md
  • https://site.com/docs/tutorial.html → docs_tutorial.md
  • https://blog.com/index.php?id=123 → index_php_id_123.md
  • Prefix: project_ → project_articles_machine-learning.md
  • Prefix: 2024_ → 2024_docs_tutorial.md
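
One way to reproduce the URL and prefix examples above is sketched below. The operator's real rules are not documented here, so this is only a guess that happens to match those examples: the function name `url_to_filename`, the extension list, and the choice to keep the extension when a query string is present are all assumptions.

```python
import re
from urllib.parse import urlparse

MAX_NAME_LENGTH = 100  # truncation length stated for URL-based names
STRIP_EXTENSIONS = (".html", ".htm", ".php")  # assumed list of "common" extensions

def url_to_filename(url, prefix=""):
    """Build a filesystem-safe .md filename from a URL, or "" if nothing usable remains."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    if parsed.query:
        # Generic pages: keep the path and fold the query string into the name.
        name = f"{path}_{parsed.query}"
    else:
        # Drop a common web extension from the end of the path.
        for ext in STRIP_EXTENSIONS:
            if path.lower().endswith(ext):
                path = path[: -len(ext)]
                break
        name = path
    # Replace runs of anything that is not a word character or "-" with "_".
    name = re.sub(r"[^\w-]+", "_", name).strip("_")
    if not name:
        return ""  # lets a caller fall back to another naming strategy
    return f"{prefix}{name[:MAX_NAME_LENGTH]}.md"

print(url_to_filename("https://example.com/articles/machine-learning"))  # articles_machine-learning.md
print(url_to_filename("https://site.com/docs/tutorial.html"))            # docs_tutorial.md
print(url_to_filename("https://blog.com/index.php?id=123"))              # index_php_id_123.md
print(url_to_filename("https://site.com/docs/tutorial.html", "2024_"))   # 2024_docs_tutorial.md
```

Because `\w` matches Unicode letters in Python 3, international characters in the path are kept rather than stripped.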
# Chain with source operators
source_crawl = op('source_crawl4ai')
save_sources = op('save_sources')
# Configure save sources to use crawled data
save_sources.par.Outputfolder = 'project/scraped_content'
save_sources.par.Useurlasfilename = True
save_sources.par.Filenameprefix = 'scraped_'
save_sources.par.Overwrite = False
# Save the crawled content
save_sources.par.Savemarkdown.pulse()
# Save sources before indexing
save_sources = op('save_sources')
rag_index = op('rag_index')
# Configure output folder
save_sources.par.Outputfolder = 'knowledge_base/documents'
save_sources.par.Useurlasfilename = True
# Save files first
save_sources.par.Savemarkdown.pulse()
# Then index the saved files
# (Configure rag_index to read from the same folder)
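# Process multiple source tables
import time

save_sources = op('save_sources')
source_tables = ['web_scrape_results', 'document_imports', 'api_responses']
for table_name in source_tables:
    # Configure for each source
    save_sources.par.Outputfolder = f'output/{table_name}'
    save_sources.par.Filenameprefix = f'{table_name}_'
    # Connect the appropriate input table
    save_sources.op('input_table').copy(op(table_name))
    # Reset the status so a previous 'Completed' is not mistaken for this run
    save_sources.par.Clearstatus.pulse()
    # Save files
    save_sources.par.Savemarkdown.pulse()
    # Wait for completion or an error (check status).
    # Note: sleeping here blocks TouchDesigner's main thread.
    deadline = time.time() + 60  # arbitrary 60 s timeout (assumption)
    while time.time() < deadline:
        status = save_sources.par.Status.eval()
        if 'Completed' in status or 'Error' in status:
            break
        time.sleep(0.1)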
  • Enabled: Replaces existing files with same names
  • Disabled: Skips files that already exist (default)
  • Use Case: Incremental updates without data loss
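
The skip-or-replace behavior can be illustrated with a short sketch. It is a stand-alone approximation of the policy described above, not the operator's implementation; the function name `write_markdown` and its return convention are hypothetical.

```python
from pathlib import Path

def write_markdown(folder, filename, content, overwrite=False):
    """Write content to folder/filename; return True if a file was written."""
    target = Path(folder) / filename
    if target.exists() and not overwrite:
        return False  # skipped: the file already exists and overwrite is off
    # Create the output folder if needed, then write UTF-8 text.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return True
```

With `overwrite=False`, re-running an export only adds files that are new, which is what makes incremental updates safe.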
  • Real-time Status: Current operation phase and details
  • Progress Percentage: Completion percentage (0-100%)
  • Files Saved Counter: Number of successfully saved files
  • Error Reporting: Detailed error messages for troubleshooting
  • Path Separators: Converts / and \ to _
  • Special Characters: Removes <>:"/\|?*
  • Unicode Support: Handles international characters safely
  • Length Limits: Truncates overly long filenames
  • Extension Management: Adds .md extension automatically
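
The sanitization rules listed above can be approximated as follows. This is a sketch under assumptions: `sanitize_filename` is a hypothetical name, separators are converted before the remaining special characters are removed, and the 100-character limit is borrowed from the URL-based naming description.

```python
INVALID_CHARS = '<>:"|?*'  # special characters left after separators are converted
MAX_NAME_LENGTH = 100      # assumed limit; stated in the source only for URL-based names

def sanitize_filename(name, max_length=MAX_NAME_LENGTH):
    """Return a filesystem-safe Markdown filename for an arbitrary string."""
    # Path separators become underscores so the name stays a single file.
    name = name.replace("/", "_").replace("\\", "_")
    # Characters that are invalid on common filesystems are dropped.
    name = "".join(ch for ch in name if ch not in INVALID_CHARS)
    name = name.strip()[:max_length]
    # Append the Markdown extension unless it is already there.
    if not name.lower().endswith(".md"):
        name += ".md"
    return name

print(sanitize_filename('notes/2024: "draft"?'))  # notes_2024 draft.md
print(sanitize_filename("résumé"))                # résumé.md
```

Non-ASCII characters pass through untouched, since only the listed characters are removed.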
  • Export scraped web content to organized file structure
  • Use URL-based filenames for intuitive organization
  • Maintain source traceability through filenames
  • Save processed documents from various sources
  • Apply consistent naming conventions
  • Prepare files for further analysis or indexing
  • Export curated content collections
  • Organize by topic, source, or date
  • Create searchable file archives
  • Export content from databases or APIs
  • Convert to Markdown for version control
  • Maintain metadata through filename conventions
  • Required Columns: Always include doc_id and content
  • Clean Data: Remove or escape problematic characters in content
  • Consistent IDs: Use meaningful, unique document IDs
  • URL Validation: Ensure source_path contains valid URLs if using URL naming
  • URL Naming: Best for web-scraped content with meaningful URLs
  • Column Naming: Use for curated content with predefined names
  • Prefix Usage: Add project or date prefixes for organization
  • Length Consideration: Keep total path length under system limits
  • Batch Size: Process reasonable numbers of files at once
  • Folder Structure: Create organized subfolder hierarchies
  • Overwrite Settings: Use appropriate overwrite policies
  • Progress Monitoring: Check status regularly for large operations
  • Path Validation: Verify output folder exists and is writable
  • Content Validation: Check for empty or malformed content
  • Filename Conflicts: Handle duplicate filenames appropriately
  • Recovery: Use “Clear Status” to reset after errors
  1. Permission Errors

    • Verify output folder write permissions
    • Check file system space availability
    • Ensure folder path is accessible
  2. Filename Conflicts

    • Enable overwrite if updates are needed
    • Use unique prefixes to avoid conflicts
    • Check for duplicate doc_ids in input
  3. Invalid Paths

    • Use absolute paths or proper relative paths
    • Verify folder exists or can be created
    • Check for invalid characters in folder names
  4. Performance Issues

    • Process files in smaller batches
    • Use faster storage devices
    • Monitor system resources during operation
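
Several of the checks above (folder exists, is a directory, is writable) can be run before starting a save. The following is a minimal pre-flight sketch using only the Python standard library; `check_output_folder` is a hypothetical helper, not a method of the operator.

```python
import os
from pathlib import Path

def check_output_folder(folder):
    """Return a list of problems that would prevent saving into folder."""
    path = Path(folder)
    problems = []
    if not path.exists():
        problems.append(f"folder does not exist: {path}")
    elif not path.is_dir():
        problems.append(f"not a directory: {path}")
    elif not os.access(path, os.W_OK):
        problems.append(f"folder is not writable: {path}")
    return problems
```

An empty list means none of these particular problems were found; it does not rule out other causes such as a full disk.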
  • “Idle”: Ready for operation
  • “Starting…”: Initializing save process
  • “Saving files…”: Actively saving files
  • “Completed: Saved X/Y files”: Operation finished successfully
  • “Error: [message]”: Operation failed with specific error
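
When scripting around the operator, these status strings can be turned into structured values. The sketch below assumes the Status parameter contains exactly the formats listed above, which may differ between versions; `parse_status` is a hypothetical helper.

```python
import re

# Assumed format, taken from the status list above: "Completed: Saved X/Y files"
COMPLETED_PATTERN = re.compile(r"^Completed: Saved (\d+)/(\d+) files$")

def parse_status(status):
    """Classify a status string and extract counts or the error message."""
    status = status.strip()
    match = COMPLETED_PATTERN.match(status)
    if match:
        saved, total = int(match.group(1)), int(match.group(2))
        return {"state": "completed", "saved": saved, "total": total}
    if status.startswith("Error:"):
        return {"state": "error", "message": status[len("Error:"):].strip()}
    if status == "Idle":
        return {"state": "idle"}
    # Anything else ("Starting…", "Saving files…") is treated as in progress.
    return {"state": "running", "detail": status}

print(parse_status("Completed: Saved 3/5 files"))
# {'state': 'completed', 'saved': 3, 'total': 5}
```

In TouchDesigner the input would come from `op('save_sources').par.Status.eval()`; comparing `saved` with `total` shows whether any rows were skipped.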

Together, these features give RAG workflows and content-processing pipelines a reliable, organized way to export table content to Markdown files.