Source Github Operator
The Source Github LOP (formerly GitHubParser) is designed to ingest and parse content from public or private GitHub repositories, transforming it into a structured DAT table suitable for use with the Rag Index LOP. It supports parsing documentation files (.md, .rst), issues, pull requests, code files (specific languages), and wikis. This allows you to create a comprehensive knowledge base from a GitHub repository for RAG applications.
Note: Requires the requests Python library. Authentication is recommended to avoid strict GitHub API rate limits.
Parameters
Parameters are organized into pages.
Repository URL (Repourl)
op('source_github').par.Repourl
Str - Default:
None
Branch/Tag (Branch)
op('source_github').par.Branch
Str - Default:
None
Parse Repository (Parse)
op('source_github').par.Parse
Pulse - Default:
None
Stop Processing (Stop)
op('source_github').par.Stop
Pulse - Default:
None
Current Status (Status)
op('source_github').par.Status
Str - Default:
None
Progress (Progress)
op('source_github').par.Progress
Float - Default:
None
API Rate Limit (Ratelimit)
op('source_github').par.Ratelimit
Str - Default:
None
Clear All (Clear)
op('source_github').par.Clear
Pulse - Default:
None
Caution: Exposing the viewer of a large index table is resource-heavy.
Select Doc (Selectdoc)
op('source_github').par.Selectdoc
Int - Default:
1
- Range:
- 1 to N/A
- Slider Range:
- 1 to N/A
Display File (Displayfile)
op('source_github').par.Displayfile
Str - Default:
None
Include Documentation (Includedocs)
op('source_github').par.Includedocs
Toggle - Default:
Off
Doc File Patterns (Docpatterns)
op('source_github').par.Docpatterns
Str - Default:
None
Include Wiki (Includewiki)
op('source_github').par.Includewiki
Toggle - Default:
Off
Include Issues/PRs (Includeissues)
op('source_github').par.Includeissues
Toggle - Default:
Off
Max Issues/PRs (Issuelimit)
op('source_github').par.Issuelimit
Int - Default:
0
- Range:
- 0 to 1000
Include Comments (Includecomments)
op('source_github').par.Includecomments
Toggle - Default:
Off
Include Code Files (Includecode)
op('source_github').par.Includecode
Toggle - Default:
Off
Code Languages (Codelanguages)
op('source_github').par.Codelanguages
Str - Default:
None
Include Code Context (Includecontext)
op('source_github').par.Includecontext
Toggle - Default:
Off
Max File Size (KB) (Maxfilesize)
op('source_github').par.Maxfilesize
Int - Default:
0
- Range:
- 0 to 10000
Ignore Patterns (Ignorepaths)
op('source_github').par.Ignorepaths
Str - Default:
None
Max Directory Depth (Maxdepth)
op('source_github').par.Maxdepth
Int - Default:
0
- Range:
- 0 to 50
Use Authentication (Useauth)
op('source_github').par.Useauth
Toggle - Default:
Off
GitHub Token (Token)
op('source_github').par.Token
Str - Default:
None
ChatTD (Chattd)
op('source_github').par.Chattd
OP - Default:
None
Show Built In Pars (Showbuiltin)
op('source_github').par.Showbuiltin
Toggle - Default:
Off
Bypass (Bypass)
op('source_github').par.Bypass
Toggle - Default:
Off
Version (Version)
op('source_github').par.Version
Str - Default:
None
Lastupdated (Lastupdated)
op('source_github').par.Lastupdated
Str - Default:
None
Creator (Creator)
op('source_github').par.Creator
Str - Default:
None
Website (Website)
op('source_github').par.Website
Str - Default:
None
Callbacks
Available Callbacks:
onParseStart
onParseComplete
onFileProcessed
onIssueProcessed
onRateLimitUpdate
onError
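A callbacks DAT for these hooks might look like the minimal sketch below. The callback names come from the list above; the single `info` dictionary argument and its `message` key are assumptions made for illustration, so check the callback template that ships with the operator for the actual signatures.

```python
# Hypothetical callback stubs. The function names match the list above;
# the `info` argument and its keys are ASSUMED, not documented here.

events = []  # simple in-memory log so the callbacks have something to do


def _record(name, info):
    """Append a (callback name, payload copy) pair to the log."""
    events.append((name, dict(info or {})))


def onParseStart(info=None):
    _record('onParseStart', info)


def onFileProcessed(info=None):
    _record('onFileProcessed', info)


def onParseComplete(info=None):
    _record('onParseComplete', info)


def onError(info=None):
    _record('onError', info)
    # Print so that a failure is visible in the Textport.
    print('Source Github error:', (info or {}).get('message', 'unknown'))
```

Only four of the six callbacks are stubbed; onIssueProcessed and onRateLimitUpdate would follow the same pattern under the same assumption.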
Usage Examples
Basic Repository Parsing (Public)
- Set ‘Repository URL’ (e.g., github.com/derivative/TouchDesigner-Samples).
- Set ‘Branch/Tag’ (usually main or master).
- Configure Rules (e.g., disable ‘Include Code Files’ if not needed).
- Pulse ‘Parse Repository’.
- Monitor ‘Status’, ‘Progress’, and ‘API Rate Limit’.
- Output appears in the index_table DAT.
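The same steps can be driven from Python. The sketch below is one possible approach, not the operator's official scripting interface: it uses the parameter names from the Parameters section, while the helper names `start_basic_parse` and `read_state` are hypothetical. The functions take the operator as an argument so they can be exercised outside TouchDesigner with a stand-in object.

```python
def start_basic_parse(lop, repo_url, branch='main', include_code=False):
    """Set the basic parameters on a Source Github LOP and pulse Parse.

    `lop` is expected to be the operator, e.g. op('source_github').
    """
    lop.par.Repourl = repo_url
    lop.par.Branch = branch
    lop.par.Includecode = include_code  # disable code parsing if not needed
    lop.par.Parse.pulse()               # same as pressing 'Parse Repository'


def read_state(lop):
    """Return the monitoring parameters as a plain dictionary."""
    return {
        'status': lop.par.Status.eval(),
        'progress': lop.par.Progress.eval(),
        'rate_limit': lop.par.Ratelimit.eval(),
    }
```

Inside TouchDesigner this would be called as `start_basic_parse(op('source_github'), 'github.com/derivative/TouchDesigner-Samples')`, followed by occasional calls to `read_state(op('source_github'))` while the parse runs.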
Parsing with Authentication (Private or Higher Rate Limit)
- Enable ‘Use Authentication’ on the Auth page.
- Paste your GitHub Personal Access Token into ‘GitHub Token’.
- Set ‘Repository URL’ and other parameters as needed.
- Pulse ‘Parse Repository’.
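To avoid pasting a token into a project file by hand, the token can be read from an environment variable and applied from a script. This is a minimal sketch: the variable name `GITHUB_TOKEN` and the helper `apply_auth` are assumptions for illustration, while `Useauth` and `Token` are the parameter names listed above.

```python
import os


def apply_auth(lop, env_var='GITHUB_TOKEN'):
    """Enable authentication using a token read from the environment.

    Returns True if a token was found and applied, False otherwise.
    `env_var` is a hypothetical variable name; use whatever you export.
    """
    token = os.environ.get(env_var, '').strip()
    if not token:
        # No token available: fall back to unauthenticated requests.
        lop.par.Useauth = False
        return False
    lop.par.Useauth = True
    lop.par.Token = token
    return True
```

In TouchDesigner this would be called as `apply_auth(op('source_github'))` before pulsing ‘Parse Repository’.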
Limiting Scope
- Adjust ‘Max Issues/PRs’, ‘Max File Size (KB)’, and ‘Max Directory Depth’.
- Use ‘Doc File Patterns’ and ‘Ignore Patterns’ to specifically include/exclude paths.
- Specify desired ‘Code Languages’ if including code.
- Pulse ‘Parse Repository’.
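When the limits are set from a script, it can help to keep requested values inside the ranges documented in the Parameters section (0 to 1000 issues/PRs, 0 to 10000 KB, depth 0 to 50). The sketch below assumes nothing about how the operator itself validates input; `clamp_limits` and `apply_limits` are hypothetical helper names.

```python
# Ranges copied from the Parameters section of this page.
LIMIT_RANGES = {
    'Issuelimit': (0, 1000),    # Max Issues/PRs
    'Maxfilesize': (0, 10000),  # Max File Size (KB)
    'Maxdepth': (0, 50),        # Max Directory Depth
}


def clamp_limits(requested):
    """Return a copy of `requested` with each value forced into its range."""
    clamped = {}
    for name, value in requested.items():
        if name not in LIMIT_RANGES:
            raise KeyError(f'unknown limit parameter: {name}')
        low, high = LIMIT_RANGES[name]
        clamped[name] = max(low, min(high, int(value)))
    return clamped


def apply_limits(lop, requested):
    """Clamp the requested limits and write them to the operator."""
    values = clamp_limits(requested)
    for name, value in values.items():
        setattr(lop.par, name, value)
    return values
```

For example, `apply_limits(op('source_github'), {'Issuelimit': 200, 'Maxfilesize': 512, 'Maxdepth': 5})` would narrow the parse before pulsing ‘Parse Repository’. The pattern parameters are left out on purpose because their exact syntax is not described on this page.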
Technical Notes
- Parsing relies on the GitHub REST API. Unauthenticated requests have a very low rate limit (around 60 requests per hour), so using a Personal Access Token (PAT) is recommended.
- The process runs asynchronously via ChatTD.
- Large repositories can take considerable time to parse.
- Ensure your PAT has the necessary scopes to access the repository content (files, issues, wiki).
- The output index_table is formatted for direct use with the Rag Index LOP.
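To check the remaining API quota independently of the operator, GitHub's `/rate_limit` endpoint can be queried directly. The sketch below is not how the operator reads its ‘API Rate Limit’ value (that is not documented here); it uses only the standard library rather than requests so it runs without extra dependencies, and the `User-Agent` string and helper names are hypothetical.

```python
import json
import urllib.request

RATE_LIMIT_URL = 'https://api.github.com/rate_limit'


def build_headers(token=None):
    """Build request headers; add a Bearer token when one is supplied."""
    headers = {
        'Accept': 'application/vnd.github+json',
        'User-Agent': 'source-github-rate-check',  # hypothetical value
    }
    if token:
        headers['Authorization'] = f'Bearer {token}'
    return headers


def summarize_core_limit(payload):
    """Extract the core REST quota from a /rate_limit response body."""
    core = payload['resources']['core']
    return {
        'limit': core['limit'],
        'remaining': core['remaining'],
        'reset': core['reset'],  # Unix timestamp when the quota resets
    }


def fetch_rate_limit(token=None, timeout=10):
    """Query GitHub and return the summarized core quota (needs network)."""
    request = urllib.request.Request(RATE_LIMIT_URL, headers=build_headers(token))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize_core_limit(json.load(response))
```

Calling `fetch_rate_limit()` without a token reports the unauthenticated quota; passing a PAT reports the quota for that account.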