Source GitHub
The Source GitHub LOP parses public and private GitHub repositories via the GitHub REST API, extracting documentation files, issues, pull requests, code files, and wiki pages into a standardized index table. The output is formatted for direct use with the RAG Index LOP, making it straightforward to build knowledge bases from any GitHub repository.
Agent Tool Integration
This operator exposes three tools that let Agent and Gemini Live LOPs analyze GitHub repositories, extract issues and pull requests, and retrieve documentation files.
Use the Tool Debugger operator to inspect exact tool definitions, schemas, and parameters.
When connected to an Agent LOP, three tools become available:
- `analyze_github_repository`: Performs a comprehensive analysis of a repository, including documentation, issues, wiki, and optionally code files. Accepts a `repo_url` (required) and an `include_code` boolean (optional, defaults to off).
- `get_github_issues`: Extracts up to 50 issues and pull requests, sorted by most recently updated. Accepts a `repo_url` (required) and a `state` filter (`all`, `open`, or `closed`).
- `extract_github_docs`: Extracts documentation files (markdown, rst, txt) and wiki pages. Accepts a `repo_url` (required).
All three tools operate independently from the operator’s internal tables — agent tool calls return data directly without modifying the parsed index.
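To make the argument shapes concrete, the sketch below checks a proposed call against the parameters listed above. It is only an illustration: `TOOL_PARAMS` and `check_tool_call` are hypothetical names, and treating `state` as optional is an assumption. The Tool Debugger remains the place to confirm the exact schemas.

```python
# Hypothetical summary of the three tools, transcribed from the list above.
# Treating "state" as optional is an assumption; confirm in the Tool Debugger.
TOOL_PARAMS = {
    "analyze_github_repository": {"required": {"repo_url"}, "optional": {"include_code"}},
    "get_github_issues": {"required": {"repo_url"}, "optional": {"state"}},
    "extract_github_docs": {"required": {"repo_url"}, "optional": set()},
}
ISSUE_STATES = {"all", "open", "closed"}


def check_tool_call(tool, args):
    """Return a list of problems with a proposed tool call (empty if none)."""
    spec = TOOL_PARAMS.get(tool)
    if spec is None:
        return [f"unknown tool: {tool}"]
    given = set(args)
    allowed = spec["required"] | spec["optional"]
    problems = [f"missing argument: {name}" for name in sorted(spec["required"] - given)]
    problems += [f"unexpected argument: {name}" for name in sorted(given - allowed)]
    # Value checks only apply to arguments the tool actually accepts.
    if "state" in given & allowed and args["state"] not in ISSUE_STATES:
        problems.append(f"invalid state: {args['state']}")
    if "include_code" in given & allowed and not isinstance(args["include_code"], bool):
        problems.append("include_code must be a boolean")
    return problems
```

An empty list means the call matches the documented shape, for example `check_tool_call("get_github_issues", {"repo_url": "https://github.com/owner/repo", "state": "open"})`.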
Requirements
- Python package: `requests` (installed automatically via the shared LOPs Python environment)
- GitHub authentication: Strongly recommended. Unauthenticated requests are limited to approximately 60 API calls per hour, which is insufficient for most repositories. Generate a Personal Access Token (PAT) on GitHub with appropriate scopes and enter it on the Auth page.
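One way to confirm that a token is accepted before entering it in the operator is to query GitHub's `GET /rate_limit` endpoint directly. The sketch below uses only the Python standard library. The helper names are illustrative, and this is not the operator's internal code.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_headers(token=None):
    """Headers for a GitHub REST call; the token is optional."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "rate-limit-check",  # GitHub expects a User-Agent header
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def summarize_rate_limit(payload):
    """Pull the core REST quota out of a /rate_limit response body."""
    core = payload["resources"]["core"]
    return {"limit": core["limit"], "remaining": core["remaining"], "reset": core["reset"]}


def fetch_rate_limit(token=None):
    """Perform the network call (only runs when called explicitly)."""
    request = urllib.request.Request(f"{API_ROOT}/rate_limit", headers=build_headers(token))
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_rate_limit(json.load(response))
```

Calling `fetch_rate_limit()` without a token should report a `limit` of 60, while a valid PAT typically reports 5,000. The `reset` value is a Unix timestamp.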
Input/Output
Inputs
None. Repository configuration is set entirely through the parameter panel.
Outputs
- `index_table`: One row per parsed document, containing `doc_id`, `filename`, `source_path`, `content`, `metadata` (JSON), and `timestamp`. This table is compatible with the RAG Index LOP.
- `repo_table`: Tracks which files and issues have already been processed, preventing duplicate entries on re-parse.
Wire the output into a RAG Index LOP to build a searchable knowledge base from the parsed content.
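For orientation, a single `index_table` row might look like the following when expressed as a Python dictionary. Only the column names come from this page. The `doc_id` scheme, the metadata keys, and the timestamp format are invented for illustration, since the operator's actual choices are not specified here.

```python
import hashlib
import json
from datetime import datetime, timezone

# Column order as documented for index_table.
INDEX_COLUMNS = ["doc_id", "filename", "source_path", "content", "metadata", "timestamp"]


def make_index_row(source_path, content, metadata):
    """Build one row with the documented columns (values are illustrative)."""
    return {
        # Assumed id scheme: a short hash of the path, stable across runs.
        "doc_id": hashlib.sha1(source_path.encode("utf-8")).hexdigest()[:12],
        "filename": source_path.rsplit("/", 1)[-1],
        "source_path": source_path,
        "content": content,
        # The metadata column holds JSON, so the dict is serialized to a string.
        "metadata": json.dumps(metadata, sort_keys=True),
        # Assumed format: ISO 8601 in UTC.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


row = make_index_row("docs/setup.md", "# Setup", {"type": "documentation"})
print(list(row))
```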
Usage Examples
Parsing a Public Repository
- On the Control page, enter the repository URL in `Repository URL` (e.g., `https://github.com/derivative/TouchDesigner-Samples`).
- Set `Branch/Tag` to the branch you want to parse (e.g., `main`).
- On the Rules page, enable the content types you need: `Include Documentation`, `Include Issues/PRs`, `Include Code Files`, and/or `Include Wiki`.
- Back on the Control page, pulse `Parse Repository`.
- Monitor `Current Status` and `Progress` as the operator works through the repository.
- When complete, use the `Display` menu to switch between `Index Table` (list of all parsed documents) and `Content` (view individual document content with the `Select Doc` slider).
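The same steps can also be scripted from Python inside TouchDesigner. The sketch below uses the parameter names listed in the Parameters section and assumes the operator is named `source_github`. The `configure_and_parse` helper is a hypothetical wrapper, not part of the operator.

```python
def configure_and_parse(gh, repo_url, branch="main", docs=True, issues=False,
                        code=False, wiki=False):
    """Set the Control and Rules parameters, then pulse Parse Repository.

    `gh` is the Source GitHub operator, e.g. op('source_github') in TouchDesigner.
    """
    gh.par.Repourl = repo_url
    gh.par.Branch = branch
    gh.par.Includedocs = docs
    gh.par.Includeissues = issues
    gh.par.Includecode = code
    gh.par.Includewiki = wiki
    gh.par.Parse.pulse()


# Inside TouchDesigner (op() is only defined there):
# configure_and_parse(op('source_github'),
#                     'https://github.com/derivative/TouchDesigner-Samples')
```

Defaulting to documentation only keeps the first run cheap in API calls. The other content types can be switched on once the repository parses cleanly.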
Authenticating for Private Repos or Higher Rate Limits
- On the Auth page, toggle `Use Authentication` to On.
- Paste your GitHub Personal Access Token into `GitHub Token`.
- Return to the Control page and parse as normal. The `API Rate Limit` field shows your remaining quota.
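If you prefer not to paste the token by hand, the Auth parameters can be set from a script. This sketch reads the token from an environment variable. The variable name `GITHUB_TOKEN` and the `enable_auth` helper are assumptions for illustration, while `Useauth` and `Token` are the parameter names listed in the Parameters section.

```python
import os


def enable_auth(gh, env_var="GITHUB_TOKEN"):
    """Turn on Use Authentication with a token read from the environment.

    `gh` is the Source GitHub operator, e.g. op('source_github').
    Returns True if a token was found and applied, False otherwise.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # No token available: leave authentication off rather than send an empty one.
        gh.par.Useauth = False
        return False
    gh.par.Token = token
    gh.par.Useauth = True
    return True


# Inside TouchDesigner:
# enable_auth(op('source_github'))
```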
Filtering Content Scope
On the Rules page you can narrow what gets parsed:
- Documentation: Set `Doc File Patterns` to target specific file types or paths (e.g., `*.md docs/*`). Set `Ignore Patterns` to skip directories like `node_modules/*` or `tests/*`. Adjust `Max Directory Depth` to limit how deep the parser traverses.
- Issues/PRs: Set `Max Issues` to cap the number of issues fetched. Use the `Issue State` menu to filter by `All`, `Open`, or `Closed`. Toggle `Include Comments` to pull in issue discussion threads.
- Code: Enter target languages in `Code Languages` (e.g., `python javascript`). Set `Max File Size KB` to skip large files that would bloat the index.
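To get a feel for how include and ignore patterns interact, the following sketch filters a list of paths with Python's `fnmatch` module. The operator's exact matching rules are not documented on this page, so shell-style globbing (where `*` also matches `/`) is an assumption, and `select_paths` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select_paths(paths, doc_patterns, ignore_patterns=""):
    """Keep paths matching any include pattern and no ignore pattern.

    Patterns are space-separated globs, mirroring the examples above.
    """
    include = doc_patterns.split()
    ignore = ignore_patterns.split()
    selected = []
    for path in paths:
        # Ignore patterns win over include patterns.
        if any(fnmatchcase(path, pattern) for pattern in ignore):
            continue
        if any(fnmatchcase(path, pattern) for pattern in include):
            selected.append(path)
    return selected


paths = ["README.md", "docs/guide/intro.rst", "node_modules/pkg/README.md", "src/main.py"]
print(select_paths(paths, "*.md docs/*", "node_modules/* tests/*"))
# prints ['README.md', 'docs/guide/intro.rst']
```

Checking ignore patterns first means a file under `node_modules/` is dropped even though it matches `*.md`.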
Building a RAG Knowledge Base
- Parse a repository as described above.
- Create a RAG Index LOP and wire the Source GitHub output into its input.
- The index table columns map directly to the RAG Index LOP’s expected format — no transformation needed.
Best Practices
- Always authenticate. The unauthenticated rate limit of 60 requests per hour will be exhausted quickly on any non-trivial repository. A free GitHub PAT raises this to 5,000 per hour.
- Start with documentation only. Enable `Include Documentation` first and leave code and issues off until you confirm the repository parses correctly. Code parsing on large repos can consume significant API calls.
- Use ignore patterns. Exclude common noise directories like `node_modules/*`, `.git/*`, `vendor/*`, and `tests/*` to keep the index focused.
- Monitor the rate limit. The `API Rate Limit` field on the Control page shows remaining calls. If the operator hits the limit mid-parse, it will pause and report a status message with the reset time.
- Pulse Stop if needed. Long-running parses can be halted with `Stop Processing`. Already-parsed content is retained in the index table.
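Monitoring and stopping can be done from a script as well. The sketch below reads the `Status`, `Progress`, and `Ratelimit` parameters and pulses `Stop`, using the parameter names from the Parameters section. The helper names are hypothetical, and the format of the returned strings is whatever the operator writes into those fields.

```python
def read_status(gh):
    """Return the operator's status fields as plain Python values.

    `gh` is the Source GitHub operator, e.g. op('source_github').
    """
    return {
        "status": gh.par.Status.eval(),
        "progress": gh.par.Progress.eval(),
        "rate_limit": gh.par.Ratelimit.eval(),
    }


def stop_parse(gh):
    """Pulse Stop Processing to halt a long-running parse."""
    gh.par.Stop.pulse()


# Inside TouchDesigner:
# print(read_status(op('source_github')))
# stop_parse(op('source_github'))
```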
Troubleshooting
- “Invalid GitHub repository URL”: The operator expects a URL containing `github.com/owner/repo`. Ensure the URL is complete and correctly formatted.
- Rate limit exceeded immediately: You are likely unauthenticated. Enable authentication on the Auth page with a valid PAT.
- No content parsed: Check that at least one content type toggle is enabled on the Rules page. Also verify that the `Branch/Tag` value matches an actual branch in the repository.
- Large repository stalls: Very large repos with thousands of files may take considerable time. Use `Max Directory Depth`, `Max File Size KB`, and `Ignore Patterns` to reduce scope. Monitor `Progress` and `Current Status` for activity.
- Wiki not found: Not all repositories have wikis enabled. The operator logs a message and continues if the wiki is inaccessible.
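For the invalid URL case, a quick pre-check can catch malformed input before parsing. The regular expression below is an assumption about what counts as a `github.com/owner/repo` URL and may be stricter or looser than the operator's own validation. `parse_repo_url` is a hypothetical helper.

```python
import re

# Assumed pattern: optional scheme and "www.", then github.com/owner/repo.
REPO_URL = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s#?]+)"
)


def parse_repo_url(url):
    """Return (owner, repo), or None if the URL lacks github.com/owner/repo."""
    match = REPO_URL.match(url.strip())
    if match is None:
        return None
    repo = match.group("repo")
    # Clone URLs often end in ".git"; drop the suffix.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return match.group("owner"), repo


print(parse_repo_url("https://github.com/derivative/TouchDesigner-Samples"))
# prints ('derivative', 'TouchDesigner-Samples')
```

A URL that stops at the owner, such as `https://github.com/owner`, returns `None`.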
Parameters
Control
- `op('source_github').par.Repourl` (Str). Default: "" (Empty String)
- `op('source_github').par.Branch` (Str). Default: "" (Empty String)
- `op('source_github').par.Status` (Str). Default: "" (Empty String)
- `op('source_github').par.Progress` (Float). Default: 0.0. Range: 0 to 1. Slider Range: 0 to 100
- `op('source_github').par.Parse` (Pulse). Default: False
- `op('source_github').par.Stop` (Pulse). Default: False
- `op('source_github').par.Ratelimit` (Str). Default: "" (Empty String)
- `op('source_github').par.Clear` (Pulse). Default: False
- `op('source_github').par.Displayfile` (Str). Default: "" (Empty String)
- `op('source_github').par.Selectdoc` (Int). Default: 1. Range: 0 to 1. Slider Range: 1 to 0
- `op('source_github').par.Includewiki` (Toggle). Default: False
- `op('source_github').par.Includecode` (Toggle). Default: False
- `op('source_github').par.Codelanguages` (Str). Default: "" (Empty String)
- `op('source_github').par.Includecontext` (Toggle). Default: False
- `op('source_github').par.Maxfilesize` (Int). Default: 0. Range: 0 to 1. Slider Range: 0 to 1
- `op('source_github').par.Includedocs` (Toggle). Default: False
- `op('source_github').par.Docpatterns` (Str). Default: "" (Empty String)
- `op('source_github').par.Ignorepaths` (Str). Default: "" (Empty String)
- `op('source_github').par.Maxdepth` (Int). Default: 0. Range: 0 to 1. Slider Range: 0 to 1
- `op('source_github').par.Includeissues` (Toggle). Default: False
- `op('source_github').par.Issuelimit` (Int). Default: 0. Range: 0 to 1. Slider Range: 0 to 1
- `op('source_github').par.Includecomments` (Toggle). Default: False
- `op('source_github').par.Useauth` (Toggle). Default: False
- `op('source_github').par.Token` (Str). Default: "" (Empty String)
Changelog
v1.0.0 (2024-11-06)
Initial release