Source Github Operator
Overview
The Source Github LOP (formerly GitHubParser) ingests and parses content from public or private GitHub repositories and transforms it into a structured DAT table suitable for use with the Rag Index LOP. It can parse documentation files (.md, .rst), issues, pull requests, code files in selected languages, and wikis, so you can build a comprehensive knowledge base from a GitHub repository for RAG applications.
Note: Requires the `requests` Python library. Authentication is recommended to avoid GitHub's strict API rate limits for unauthenticated requests.
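To confirm that `requests` is importable from TouchDesigner's Python before pulsing 'Parse Repository', a quick check like the following can be run in the Textport. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the function name `requests_available` is hypothetical.

```python
import importlib.util


def requests_available() -> bool:
    """Return True if the `requests` package can be imported in this Python environment."""
    # find_spec returns None when no module with this name is installed.
    return importlib.util.find_spec("requests") is not None


print("requests installed:", requests_available())
```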
Parameters
Parameters are organized into pages.
Repository URL (Repourl)
op('source_github').par.Repourl
Str - Default: None
Branch/Tag (Branch)
op('source_github').par.Branch
Str - Default: main
Parse Repository (Parse)
op('source_github').par.Parse
Pulse - Default: None
Stop Processing (Stop)
op('source_github').par.Stop
Pulse - Default: None
Current Status (Status)
op('source_github').par.Status
Str - Default: Ready
Progress (Progress)
op('source_github').par.Progress
Float - Default: 0
API Rate Limit (Ratelimit)
op('source_github').par.Ratelimit
Str - Default: "" (Empty String)
Clear All (Clear)
op('source_github').par.Clear
Pulse - Default: None
Caution: Viewing large index tables can be slow.
Select Doc (Selectdoc)
op('source_github').par.Selectdoc
Int - Default: 0
Display File (Displayfile)
op('source_github').par.Displayfile
Str - Default: "" (Empty String)
Include Documentation (Includedocs)
op('source_github').par.Includedocs
Toggle - Default: On
Doc File Patterns (Docpatterns)
op('source_github').par.Docpatterns
Str - Default: *.md *.rst *.txt docs/* wiki/*
Include Wiki (Includewiki)
op('source_github').par.Includewiki
Toggle - Default: On
Include Issues/PRs (Includeissues)
op('source_github').par.Includeissues
Toggle - Default: On
Max Issues/PRs (Issuelimit)
op('source_github').par.Issuelimit
Int - Default: 10; Range: 1 to 1000; Slider Range: 10 to 100
Include Comments (Includecomments)
op('source_github').par.Includecomments
Toggle - Default: On
Include Code Files (Includecode)
op('source_github').par.Includecode
Toggle - Default: On
Code Languages (Codelanguages)
op('source_github').par.Codelanguages
Str - Default: python javascript typescript
Include Code Context (Includecontext)
op('source_github').par.Includecontext
Toggle - Default: On
Max File Size (KB) (Maxfilesize)
op('source_github').par.Maxfilesize
Int - Default: 500; Range: 1 to 10000; Slider Range: 100 to 1000
Ignore Patterns (Ignorepaths)
op('source_github').par.Ignorepaths
Str - Default: node_modules/* .git/* tests/*
Max Directory Depth (Maxdepth)
op('source_github').par.Maxdepth
Int - Default: 10; Range: 1 to 50; Slider Range: 5 to 20
Use Authentication (Useauth)
op('source_github').par.Useauth
Toggle - Default: Off
GitHub Token (Token)
op('source_github').par.Token
Str - Default: None
ChatTD (Chattd)
op('source_github').par.Chattd
OP - Default: /dot_lops/ChatTD
Show Built In Pars (Showbuiltin)
op('source_github').par.Showbuiltin
Toggle - Default: Off
Bypass (Bypass)
op('source_github').par.Bypass
Toggle - Default: Off
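Several integer parameters above document a Range. If you set them from a script, a small helper can keep values inside those documented bounds. This is a sketch: `DOCUMENTED_RANGES` and `clamp_setting` are hypothetical names, and whether the operator itself clamps out-of-range values is not stated in this page.

```python
# Documented (min, max) ranges for the integer parameters of the Source Github LOP.
DOCUMENTED_RANGES = {
    "Issuelimit": (1, 1000),
    "Maxfilesize": (1, 10000),  # in KB
    "Maxdepth": (1, 50),
}


def clamp_setting(name, value):
    """Clamp value to the documented range for `name`; other parameters pass through unchanged."""
    if name not in DOCUMENTED_RANGES:
        return value
    low, high = DOCUMENTED_RANGES[name]
    return max(low, min(high, int(value)))
```

Inside TouchDesigner this could be used as `op('source_github').par.Issuelimit = clamp_setting('Issuelimit', 2000)`, which would set the limit to 1000.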
Callbacks
Available Callbacks:
onParseStart
onParseComplete
onFileProcessed
onIssueProcessed
onRateLimitUpdate
onError
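A callbacks DAT for these events might look like the sketch below. Only the callback names come from the list above; everything else is an assumption: that each callback receives one dict-like `info` argument, the key names read from it (`repo`, `path`, `number`, `remaining`, `count`, `message`), and the `LOW_RATE_LIMIT` threshold. Printing `info` from inside a callback is a quick way to see what the operator actually passes.

```python
# Hypothetical callbacks DAT for the Source Github LOP.
# Assumption: each callback receives a single dict-like `info` argument;
# the keys read below are placeholders, not documented fields.

LOW_RATE_LIMIT = 10  # hypothetical threshold for a low-quota warning
events = []  # simple in-memory log of what happened during a parse


def onParseStart(info):
    events.append(("start", info.get("repo")))


def onFileProcessed(info):
    events.append(("file", info.get("path")))


def onIssueProcessed(info):
    events.append(("issue", info.get("number")))


def onRateLimitUpdate(info):
    # Only record an event when the remaining quota drops below the threshold.
    remaining = info.get("remaining")
    if remaining is not None and remaining < LOW_RATE_LIMIT:
        events.append(("rate_limit_low", remaining))


def onParseComplete(info):
    events.append(("complete", info.get("count")))


def onError(info):
    events.append(("error", info.get("message")))
```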
Usage Examples
Basic Repository Parsing (Public)
1. Set 'Repository URL' (e.g., `github.com/derivative/TouchDesigner-Samples`).
2. Set 'Branch/Tag' (usually `main` or `master`).
3. Configure Rules (e.g., disable 'Include Code Files' if you do not need code).
4. Pulse 'Parse Repository'.
5. Monitor 'Status', 'Progress', and 'API Rate Limit'.
6. The output appears in the `index_table` DAT.
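The public-parse steps can also be scripted from a Text DAT. The sketch below builds the parameter values as a plain dict, writes them through TouchDesigner's `Par.val`, and then calls `Par.pulse()` on 'Parse Repository'. The helper names are hypothetical; only the parameter names come from the list above.

```python
def public_parse_settings(repo_url, branch="main", include_code=True):
    """Parameter values for an unauthenticated parse of a public repository."""
    return {
        "Repourl": repo_url,
        "Branch": branch,
        "Useauth": False,
        "Includecode": include_code,
    }


def apply_and_parse(node, settings):
    """Write each value to node.par.<Name>, then pulse Parse Repository (TouchDesigner only)."""
    for name, value in settings.items():
        getattr(node.par, name).val = value
    node.par.Parse.pulse()


# Inside TouchDesigner, for example:
# apply_and_parse(op('source_github'),
#                 public_parse_settings('github.com/derivative/TouchDesigner-Samples',
#                                       include_code=False))
```

Keeping the settings in a plain dict makes it easy to reuse one configuration across several Source Github operators.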
Parsing with Authentication (Private or Higher Rate Limit)
1. Enable 'Use Authentication' on the Auth page.
2. Paste your GitHub Personal Access Token (PAT) into 'GitHub Token'.
3. Set 'Repository URL' and other parameters as needed.
4. Pulse 'Parse Repository'.
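For step 2, one way to avoid pasting the token by hand is to read it from an environment variable when the project starts. This sketch assumes a variable named `GITHUB_TOKEN` (a hypothetical choice) is set in the environment TouchDesigner was launched from. The value still ends up in the 'GitHub Token' parameter, so it may be saved with the project file.

```python
import os


def auth_settings(env=os.environ, var_name="GITHUB_TOKEN"):
    """Parameter values that enable authentication with a token read from the environment."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} to a GitHub Personal Access Token first")
    return {"Useauth": True, "Token": token}


# Inside TouchDesigner, for example:
# node = op('source_github')
# for name, value in auth_settings().items():
#     getattr(node.par, name).val = value
# node.par.Parse.pulse()
```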
Limiting Scope
1. Adjust 'Max Issues/PRs', 'Max File Size (KB)', and 'Max Directory Depth'.
2. Use 'Doc File Patterns' and 'Ignore Patterns' to include or exclude specific paths.
3. Specify the desired 'Code Languages' if you include code.
4. Pulse 'Parse Repository'.
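The scope rules above can be pictured as a per-file filter. The sketch below is an assumption about how such rules might combine, not the operator's actual code: it treats the space-separated pattern strings as shell-style globs matched with Python's `fnmatch.fnmatchcase`, reads 'Max File Size (KB)' as multiples of 1024 bytes, counts directory depth as the number of `/` separators, and uses a hypothetical `LANGUAGE_EXTENSIONS` table to map 'Code Languages' names to file extensions.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical mapping from 'Code Languages' names to file extensions.
LANGUAGE_EXTENSIONS = {
    "python": (".py",),
    "javascript": (".js", ".jsx", ".mjs"),
    "typescript": (".ts", ".tsx"),
}


def matches_any(path: str, patterns: str) -> bool:
    """True if path matches any space-separated glob in `patterns`."""
    return any(fnmatchcase(path, pattern) for pattern in patterns.split())


def classify(path: str, size_bytes: int,
             docpatterns: str = "*.md *.rst *.txt docs/* wiki/*",
             ignorepaths: str = "node_modules/* .git/* tests/*",
             codelanguages: str = "python javascript typescript",
             maxfilesize_kb: int = 500, maxdepth: int = 10) -> Optional[str]:
    """Return 'doc', 'code', or None (skipped), using the parameter defaults above."""
    if matches_any(path, ignorepaths):
        return None
    if size_bytes > maxfilesize_kb * 1024:
        return None
    if path.count("/") > maxdepth:  # depth = number of directories above the file
        return None
    if matches_any(path, docpatterns):
        return "doc"
    extensions = tuple(ext for lang in codelanguages.split()
                       for ext in LANGUAGE_EXTENSIONS.get(lang, ()))
    return "code" if path.endswith(extensions) else None
```

With these defaults, `classify('src/tests/test_app.py', 1000)` returns `'code'`: `fnmatchcase` matches a pattern against the whole path, so `tests/*` only excludes paths that start with `tests/`. Also note that in `fnmatch` a `*` matches `/` too, so `*.md` covers Markdown files in any directory.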
Technical Notes
- Parsing relies on the GitHub REST API. Unauthenticated requests have a very low rate limit (60 requests per hour); use a Personal Access Token (PAT) to raise it.
- The process runs asynchronously via ChatTD.
- Large repositories can take considerable time to parse.
- Ensure your PAT has the necessary scopes to access the repository content (files, issues, wiki).
- The output `index_table` is formatted for direct use with the Rag Index LOP.
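To check how much of the limit remains before a large parse, you can query GitHub's `GET /rate_limit` endpoint, which GitHub documents as not counting against the REST API rate limit. The helper names below are hypothetical; the endpoint, the `Authorization: Bearer <token>` header, and the `resources.core` fields (`limit`, `remaining`, `reset`) come from GitHub's REST API documentation.

```python
RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def build_headers(token=None):
    """Request headers for the GitHub REST API; a PAT is sent as a Bearer token when given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def summarize_core_limit(payload):
    """Format the core REST API quota from a /rate_limit JSON payload."""
    core = payload["resources"]["core"]
    # `reset` is a Unix timestamp (seconds) for when the quota refills.
    return f"{core['remaining']}/{core['limit']} requests left, resets at {core['reset']}"


def fetch_rate_limit(token=None, timeout=10.0):
    """Query GitHub for the current quota. Needs network access and the requests package."""
    import requests  # imported here so the helpers above work without requests installed

    response = requests.get(RATE_LIMIT_URL, headers=build_headers(token), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

For example, `print(summarize_core_limit(fetch_rate_limit(token)))`. Without a token the reported limit is 60; with a PAT it is typically 5,000 requests per hour.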