Source Github Operator
The Source Github LOP (formerly GitHubParser) ingests content from public or private GitHub repositories and parses it into a structured DAT table for use with the Rag Index LOP. It can parse documentation files (.md, .rst), issues, pull requests, code files in selected languages, and wikis, so you can build a knowledge base from a single repository for RAG applications.
Note: Requires the requests Python library. Authentication is recommended to avoid strict GitHub API rate limits.
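The operator's internal request code is not shown on this page. As a general illustration of what authentication changes at the HTTP level, here is a minimal sketch of the headers a `requests`-based client typically sends to the GitHub REST API. The helper name `github_headers` is hypothetical.

```python
def github_headers(token=None):
    """Build request headers for the GitHub REST API (hypothetical helper).

    Without a token, requests are anonymous and share the low
    unauthenticated rate limit. With a token, GitHub attributes the
    requests to the token's owner and applies the higher limit.
    """
    headers = {
        # Media type recommended for the GitHub REST API
        "Accept": "application/vnd.github+json",
        # Pin the REST API version explicitly
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        # Personal access tokens are accepted as Bearer credentials;
        # strip stray whitespace left over from copy/paste
        headers["Authorization"] = f"Bearer {token.strip()}"
    return headers
```

Pass the result as the `headers=` argument of `requests.get(...)` when calling `https://api.github.com` endpoints.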
Parameters
Parameters are organized into pages.
Repository URL (Repourl)
op('source_github').par.Repourl Str - Default: None
Branch/Tag (Branch)
op('source_github').par.Branch Str - Default: None
Parse Repository (Parse)
op('source_github').par.Parse Pulse - Default: None
Stop Processing (Stop)
op('source_github').par.Stop Pulse - Default: None
Current Status (Status)
op('source_github').par.Status Str - Default: None
Progress (Progress)
op('source_github').par.Progress Float - Default: None
API Rate Limit (Ratelimit)
op('source_github').par.Ratelimit Str - Default: None
Clear All (Clear)
op('source_github').par.Clear Pulse - Default: None
Caution: Exposing the viewer of large index tables can be performance-heavy (Header)
Select Doc (Selectdoc)
op('source_github').par.Selectdoc Int - Default: 1 - Range: 1 to N/A - Slider Range: 1 to N/A
Display File (Displayfile)
op('source_github').par.Displayfile Str - Default: None
Include Documentation (Includedocs)
op('source_github').par.Includedocs Toggle - Default: Off
Doc File Patterns (Docpatterns)
op('source_github').par.Docpatterns Str - Default: None
Include Wiki (Includewiki)
op('source_github').par.Includewiki Toggle - Default: Off
Include Issues/PRs (Includeissues)
op('source_github').par.Includeissues Toggle - Default: Off
Max Issues/PRs (Issuelimit)
op('source_github').par.Issuelimit Int - Default: 0 - Range: 0 to 1000
Include Comments (Includecomments)
op('source_github').par.Includecomments Toggle - Default: Off
Include Code Files (Includecode)
op('source_github').par.Includecode Toggle - Default: Off
Code Languages (Codelanguages)
op('source_github').par.Codelanguages Str - Default: None
Include Code Context (Includecontext)
op('source_github').par.Includecontext Toggle - Default: Off
Max File Size (KB) (Maxfilesize)
op('source_github').par.Maxfilesize Int - Default: 0 - Range: 0 to 10000
Ignore Patterns (Ignorepaths)
op('source_github').par.Ignorepaths Str - Default: None
Max Directory Depth (Maxdepth)
op('source_github').par.Maxdepth Int - Default: 0 - Range: 0 to 50
Use Authentication (Useauth)
op('source_github').par.Useauth Toggle - Default: Off
GitHub Token (Token)
op('source_github').par.Token Str - Default: None
ChatTD (Chattd)
op('source_github').par.Chattd OP - Default: None
Show Built In Pars (Showbuiltin)
op('source_github').par.Showbuiltin Toggle - Default: Off
Bypass (Bypass)
op('source_github').par.Bypass Toggle - Default: Off
Version (Version)
op('source_github').par.Version Str - Default: None
Lastupdated (Lastupdated)
op('source_github').par.Lastupdated Str - Default: None
Creator (Creator)
op('source_github').par.Creator Str - Default: None
Website (Website)
op('source_github').par.Website Str - Default: None
Callbacks
Available callbacks:
- onParseStart
- onParseComplete
- onFileProcessed
- onIssueProcessed
- onRateLimitUpdate
- onError
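This page lists the callback names but not their signatures. A minimal sketch of a callbacks DAT, assuming each callback receives a single `info` dictionary. This is an assumption; check the operator's own callbacks template for the actual arguments and keys. The `LOG` list and the `remaining` key are hypothetical.

```python
# Hypothetical callbacks DAT contents. Assumes each callback is called
# with one dict argument; the real argument names and keys may differ.
LOG = []


def _record(event, info):
    # Store a shallow copy so later changes by the caller are not reflected
    LOG.append((event, dict(info or {})))


def onParseStart(info):
    _record("onParseStart", info)


def onParseComplete(info):
    _record("onParseComplete", info)


def onFileProcessed(info):
    _record("onFileProcessed", info)


def onIssueProcessed(info):
    _record("onIssueProcessed", info)


def onRateLimitUpdate(info):
    _record("onRateLimitUpdate", info)
    # Hypothetical key: warn when few API requests remain
    remaining = info.get("remaining") if info else None
    if remaining is not None and remaining < 10:
        print(f"GitHub API: only {remaining} requests left")


def onError(info):
    _record("onError", info)
```

Recording events in a list like this is mainly useful while debugging; in a real project you would typically update a UI element or trigger the next step (for example, re-indexing) from onParseComplete.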
Usage Examples
Basic Repository Parsing (Public)
- Set ‘Repository URL’ (e.g., github.com/derivative/TouchDesigner-Samples).
- Set ‘Branch/Tag’ (usually main or master).
- Configure the Rules page (e.g., disable ‘Include Code Files’ if you don't need code).
- Pulse ‘Parse Repository’.
- Monitor ‘Status’, ‘Progress’, and ‘API Rate Limit’.
- The output appears in the index_table DAT.
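The example URL above omits the scheme. If you generate repository URLs from other sources (links copied from a browser, clone URLs), normalizing them first avoids surprises. A minimal sketch, assuming the operator needs only an owner/repository pair on github.com; `split_repo_url` is a hypothetical helper, and the operator itself may accept other forms.

```python
from urllib.parse import urlparse


def split_repo_url(url):
    """Return (owner, repo) from a GitHub repository URL (hypothetical helper).

    Accepts forms such as 'github.com/owner/repo',
    'https://github.com/owner/repo/', 'https://github.com/owner/repo.git',
    and browser links that continue past the repo (e.g. '/tree/main').
    """
    text = url.strip()
    if "://" not in text:
        # urlparse only fills netloc when a scheme is present
        text = "https://" + text
    parsed = urlparse(text)
    host = parsed.netloc.lower()
    if host not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected owner/repo in {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

For instance, `split_repo_url("https://github.com/owner/repo.git")` yields `("owner", "repo")`, which you can rejoin as `github.com/owner/repo` before assigning it to the ‘Repository URL’ parameter.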
Parsing with Authentication (Private or Higher Rate Limit)
- Enable ‘Use Authentication’ on the Auth page.
- Paste your GitHub Personal Access Token (PAT) into ‘GitHub Token’.
- Set ‘Repository URL’ and other parameters as needed.
- Pulse ‘Parse Repository’.
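A token typed into a parameter may be saved with the project file. A common alternative, sketched below, is to read the token from an environment variable and assign ‘Use Authentication’ and ‘GitHub Token’ from a script. The variable name `GITHUB_TOKEN` and the helpers `token_from_env` and `configure_auth` are hypothetical choices, not part of the operator.

```python
import os


def token_from_env(env=None, var="GITHUB_TOKEN"):
    """Return a non-empty token from the environment, or None."""
    env = os.environ if env is None else env
    token = (env.get(var) or "").strip()
    return token or None


def configure_auth(target, env=None, var="GITHUB_TOKEN"):
    """Enable authentication on target if a token is available (hypothetical helper).

    Sets par.Useauth to match whether a token was found, writes the token to
    par.Token when present, and returns True if a token was applied.
    """
    token = token_from_env(env, var)
    target.par.Useauth = token is not None
    if token is not None:
        target.par.Token = token
    return token is not None


# Inside TouchDesigner, e.g. from an Execute DAT's onStart():
# configure_auth(op('source_github'))
```

If no token is found, authentication is switched off and the operator falls back to anonymous requests.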
Limiting Scope
- Adjust ‘Max Issues/PRs’, ‘Max File Size (KB)’, ‘Max Directory Depth’.
- Use ‘Doc File Patterns’ and ‘Ignore Patterns’ to specifically include/exclude paths.
- Specify desired ‘Code Languages’ if including code.
- Pulse ‘Parse Repository’.
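The exact matching rules behind ‘Doc File Patterns’, ‘Ignore Patterns’, ‘Max Directory Depth’, and ‘Max File Size (KB)’ are not documented on this page. The sketch below shows one plausible interpretation, useful for predicting which files a configuration would keep: shell-style glob patterns matched against the repository-relative path, and 0 meaning "no limit". Both points are assumptions, and `keep_file` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def keep_file(path, size_bytes, include=("*.md", "*.rst"), ignore=(),
              max_depth=0, max_kb=0):
    """Decide whether a repository file passes the scope rules.

    Assumptions (not confirmed for the operator): patterns are shell-style
    globs matched case-sensitively against the full relative path (so '*'
    also crosses '/'), depth counts the directories above the file, and 0
    disables a limit.
    """
    if any(fnmatchcase(path, pat) for pat in ignore):
        return False
    if include and not any(fnmatchcase(path, pat) for pat in include):
        return False
    depth = path.count("/")  # 'docs/guide.md' -> 1
    if max_depth and depth > max_depth:
        return False
    if max_kb and size_bytes > max_kb * 1024:
        return False
    return True
```

Ignore patterns are checked first so that an excluded folder such as `node_modules/*` wins even when its files match an include pattern.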
Technical Notes
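- Parsing relies on the GitHub REST API. Unauthenticated requests are limited to 60 per hour (per originating IP address), while requests authenticated with a personal access token (PAT) get 5,000 per hour. Use a PAT for anything beyond small repositories.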
- The process runs asynchronously via ChatTD.
- Large repositories can take considerable time to parse.
- Ensure your PAT has the necessary scopes to access the repository content (files, issues, wiki).
- The output index_table is formatted for direct use with the Rag Index LOP.
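GitHub reports your remaining quota in the `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` response headers (the last one is a Unix timestamp). The ‘API Rate Limit’ field presumably summarizes similar information. The sketch below is a hypothetical helper that turns those headers into a readable status string; it is not the operator's own code.

```python
import time


def rate_limit_status(headers, now=None):
    """Summarize GitHub rate-limit headers as 'remaining/limit (resets in Ns)'.

    Returns None if the headers are missing or malformed.
    """
    # HTTP header names are case-insensitive; normalize for a plain dict
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        limit = int(lowered["x-ratelimit-limit"])
        remaining = int(lowered["x-ratelimit-remaining"])
        reset = int(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return None
    now = time.time() if now is None else now
    wait = max(0, int(reset - now))  # seconds until the quota window resets
    return f"{remaining}/{limit} (resets in {wait}s)"
```

With `requests`, pass `response.headers` directly. To check your quota without parsing a repository, `GET https://api.github.com/rate_limit` returns the same information, and calling it does not count against the REST API limit.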