
Source GitHub

v1.0.0

The Source GitHub LOP parses public and private GitHub repositories via the GitHub REST API, extracting documentation files, issues, pull requests, code files, and wiki pages into a standardized index table. The output is formatted for direct use with the RAG Index LOP, making it straightforward to build knowledge bases from any GitHub repository.

🔧 GetTool Enabled 3 tools

This operator exposes 3 tools that allow Agent and Gemini Live LOPs to analyze GitHub repositories, extract issues and pull requests, and retrieve documentation files.

The three tools are:

  • analyze_github_repository — Performs a comprehensive analysis of a repository including documentation, issues, wiki, and optionally code files. Accepts a repo_url (required) and include_code boolean (optional, defaults to off).
  • get_github_issues — Extracts up to 50 issues and pull requests sorted by most recently updated. Accepts a repo_url (required) and state filter (all, open, or closed).
  • extract_github_docs — Extracts documentation files (markdown, rst, txt) and wiki pages. Accepts a repo_url (required).

All three tools operate independently from the operator’s internal tables — agent tool calls return data directly without modifying the parsed index.
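
As a rough illustration of the argument shapes listed above, the sketch below checks a tool call against the documented argument names. The `TOOL_SPECS` table and the `validate_tool_call` helper are hypothetical and are not part of the operator. Treating `state` as optional is an assumption, and how an Agent LOP actually serialises a call is not described here.

```python
# Hypothetical helper: it only mirrors the argument names listed above.
TOOL_SPECS = {
    "analyze_github_repository": {"required": {"repo_url"}, "optional": {"include_code"}},
    "get_github_issues": {"required": {"repo_url"}, "optional": {"state"}},  # optional: assumed
    "extract_github_docs": {"required": {"repo_url"}, "optional": set()},
}

VALID_ISSUE_STATES = {"all", "open", "closed"}


def validate_tool_call(name, args):
    """Return a list of problems with a tool call; an empty list means it looks valid."""
    if name not in TOOL_SPECS:
        return [f"unknown tool: {name}"]
    spec = TOOL_SPECS[name]
    given = set(args)
    problems = []
    for key in sorted(spec["required"] - given):
        problems.append(f"missing required argument: {key}")
    for key in sorted(given - spec["required"] - spec["optional"]):
        problems.append(f"unexpected argument: {key}")
    if name == "get_github_issues" and "state" in args:
        if args["state"] not in VALID_ISSUE_STATES:
            problems.append(f"invalid state: {args['state']}")
    if "include_code" in args and not isinstance(args["include_code"], bool):
        problems.append("include_code must be a boolean")
    return problems
```

A call such as `validate_tool_call("get_github_issues", {"repo_url": "https://github.com/owner/repo", "state": "open"})` returns an empty list.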

  • Python package: requests (installed automatically via the shared LOPs Python environment)
  • GitHub authentication: Strongly recommended. Unauthenticated requests are limited to approximately 60 API calls per hour, which is insufficient for most repositories. Generate a Personal Access Token (PAT) on GitHub with appropriate scopes and enter it on the Auth page.
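
To check what quota a token actually gets before starting a parse, GitHub's REST API offers a `GET /rate_limit` endpoint. The sketch below uses only the Python standard library. The helper names are illustrative, and this is not the operator's own request code, which is not shown in this document.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_headers(token=None):
    """Build request headers; Authorization is only added when a token is given."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "rate-limit-check",  # GitHub rejects requests without a User-Agent
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def summarize_rate_limit(payload):
    """Pull the core REST quota out of a /rate_limit response body."""
    core = payload["resources"]["core"]
    return {"limit": core["limit"], "remaining": core["remaining"], "reset": core["reset"]}


def fetch_rate_limit(token=None):
    """Query GitHub for the current quota (performs a network request)."""
    request = urllib.request.Request(f"{API_ROOT}/rate_limit", headers=build_headers(token))
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_rate_limit(json.load(response))
```

Calling `fetch_rate_limit()` without a token should report the unauthenticated limit, and passing a PAT should report the authenticated one. The `reset` value is a Unix timestamp.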

Inputs: None. Repository configuration is set entirely through the parameter panel.

  • index_table — One row per parsed document containing doc_id, filename, source_path, content, metadata (JSON), and timestamp. This table is compatible with the RAG Index LOP.
  • repo_table — Tracks which files and issues have already been processed, preventing duplicate entries on re-parse.
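
For downstream scripting, the index_table rows can be turned into dictionaries with the metadata column decoded from JSON. This is a minimal sketch that assumes the first row holds the column names listed above. The `rows_to_documents` helper is hypothetical, and the keys inside metadata are not documented here.

```python
import json

# Column names as listed for index_table above.
COLUMNS = ["doc_id", "filename", "source_path", "content", "metadata", "timestamp"]


def rows_to_documents(rows):
    """Convert table rows (header row first) into dicts, decoding the metadata JSON."""
    header, *body = rows
    missing = [name for name in COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    documents = []
    for row in body:
        doc = dict(zip(header, row))
        # An empty metadata cell becomes an empty dict rather than a JSON error.
        doc["metadata"] = json.loads(doc["metadata"]) if doc["metadata"] else {}
        documents.append(doc)
    return documents
```

Inside TouchDesigner, the `rows` argument could be gathered from a table DAT with `[[cell.val for cell in row] for row in dat.rows()]`, where `dat` is whichever DAT carries the index table in your network.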

Wire the output into a RAG Index LOP to build a searchable knowledge base from the parsed content.

  1. On the Control page, enter the repository URL in Repository URL (e.g., https://github.com/derivative/TouchDesigner-Samples).
  2. Set Branch/Tag to the branch you want to parse (e.g., main).
  3. On the Rules page, enable the content types you need: Include Documentation, Include Issues/PRs, Include Code Files, and/or Include Wiki.
  4. Back on the Control page, pulse Parse Repository.
  5. Monitor Current Status and Progress as the operator works through the repository.
  6. When complete, use the Display menu to switch between Index Table (list of all parsed documents) and Content (view individual document content with the Select Doc slider).
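
The same steps can be scripted from Python inside TouchDesigner. The sketch below uses the parameter names from the parameter reference on this page. The `start_docs_parse` helper and the `source_github` operator path are assumptions for illustration.

```python
def start_docs_parse(comp, repo_url, branch="main"):
    """Configure a Source GitHub operator for a documentation-only parse and start it."""
    comp.par.Repourl = repo_url
    comp.par.Branch = branch
    # Documentation only: the other content types stay off for a first run.
    comp.par.Includedocs = True
    comp.par.Includeissues = False
    comp.par.Includecode = False
    comp.par.Includewiki = False
    # Equivalent to pressing Parse Repository on the Control page.
    comp.par.Parse.pulse()
```

In a Textport or script DAT this might be called as `start_docs_parse(op('source_github'), 'https://github.com/derivative/TouchDesigner-Samples')`, after which `op('source_github').par.Status` and `op('source_github').par.Progress` can be read to follow the parse.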

Authenticating for Private Repos or Higher Rate Limits

  1. On the Auth page, toggle Use Authentication to On.
  2. Paste your GitHub Personal Access Token into GitHub Token.
  3. Return to the Control page and parse as normal. The API Rate Limit field shows your remaining quota.
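
To avoid pasting a token by hand, the Auth parameters can also be filled from an environment variable. This is a sketch only: the `GITHUB_TOKEN` variable name and the `apply_token_from_env` helper are assumptions, while `Useauth` and `Token` are the parameter names from the parameter reference on this page.

```python
import os


def apply_token_from_env(comp, env_var="GITHUB_TOKEN"):
    """Enable authentication if a token is found in the environment; return True on success."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # No token available: leave the operator unauthenticated.
        comp.par.Useauth = False
        return False
    comp.par.Token = token
    comp.par.Useauth = True
    return True
```

Inside TouchDesigner this might be called as `apply_token_from_env(op('source_github'))` before pulsing Parse Repository.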

On the Rules page you can narrow what gets parsed:

  • Documentation: Set Doc File Patterns to target specific file types or paths (e.g., *.md docs/*). Set Ignore Patterns to skip directories like node_modules/* or tests/*. Adjust Max Directory Depth to limit how deep the parser traverses.
  • Issues/PRs: Set Max Issues to cap the number of issues fetched. Use the Issue State menu to filter by All, Open, or Closed. Toggle Include Comments to pull in issue discussion threads.
  • Code: Enter target languages in Code Languages (e.g., python javascript). Set Max File Size KB to skip large files that would bloat the index.
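
To get a feel for how include patterns, ignore patterns, and a depth limit interact, the sketch below applies them to a repository path with Python's `fnmatch` module. It assumes space-separated glob patterns as in the examples above. The operator's real matching rules are not documented here and may differ, and `should_parse` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def should_parse(path, doc_patterns, ignore_patterns, max_depth=None):
    """Decide whether a repository path passes the include, ignore, and depth filters."""
    includes = doc_patterns.split()
    ignores = ignore_patterns.split()
    # Depth is counted as the number of directories above the file.
    depth = path.count("/")
    if max_depth is not None and depth > max_depth:
        return False
    # Ignore patterns win over include patterns.
    if any(fnmatchcase(path, pattern) for pattern in ignores):
        return False
    return any(fnmatchcase(path, pattern) for pattern in includes)
```

With the patterns `*.md docs/*` and ignores `node_modules/* tests/*`, `README.md` passes while `node_modules/pkg/README.md` is skipped. In `fnmatch`, `*` also matches `/`, so `*.md` matches markdown files at any depth.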

To feed the parsed content into a RAG Index LOP:

  1. Parse a repository as described above.
  2. Create a RAG Index LOP and wire the Source GitHub output into its input.
  3. The index table columns map directly to the RAG Index LOP’s expected format — no transformation needed.

A few practices keep parses fast and within quota:

  • Always authenticate. The unauthenticated rate limit of 60 requests per hour will be exhausted quickly on any non-trivial repository. A free GitHub PAT raises this to 5,000 per hour.
  • Start with documentation only. Enable Include Documentation first and leave code and issues off until you confirm the repository parses correctly. Code parsing on large repos can consume significant API calls.
  • Use ignore patterns. Exclude common noise directories like node_modules/*, .git/*, vendor/*, and tests/* to keep the index focused.
  • Monitor the rate limit. The API Rate Limit field on the Control page shows remaining calls. If the operator hits the limit mid-parse, it will pause and report a status message with the reset time.
  • Pulse Stop if needed. Long-running parses can be halted with Stop Processing. Already-parsed content is retained in the index table.

Common problems and their fixes:

  • “Invalid GitHub repository URL” — The operator expects a URL containing github.com/owner/repo. Ensure the URL is complete and correctly formatted.
  • Rate limit exceeded immediately — You are likely unauthenticated. Enable authentication on the Auth page with a valid PAT.
  • No content parsed — Check that at least one content type toggle is enabled on the Rules page. Also verify that the Branch/Tag value matches an actual branch in the repository.
  • Large repository stalls — Very large repos with thousands of files may take considerable time. Use Max Directory Depth, Max File Size KB, and Ignore Patterns to reduce scope. Monitor Progress and Current Status for activity.
  • Wiki not found — Not all repositories have wikis enabled. The operator logs a message and continues if the wiki is inaccessible.
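
To pre-check a URL against the `github.com/owner/repo` shape mentioned in the first item above, a small parser like the following can help. It is an illustrative sketch: the regular expression and the `parse_repo_url` helper are assumptions, not the operator's own validation.

```python
import re

# Matches the owner and repo segments that follow "github.com/".
_REPO_URL = re.compile(r"github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s#?]+)")


def parse_repo_url(url):
    """Return (owner, repo) from a GitHub URL, or raise ValueError if none is found."""
    match = _REPO_URL.search(url.strip())
    if match is None:
        raise ValueError(f"Invalid GitHub repository URL: {url!r}")
    repo = match.group("repo")
    # Clone URLs often end in ".git"; strip it so both forms give the same name.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return match.group("owner"), repo
```

For example, `parse_repo_url("https://github.com/derivative/TouchDesigner-Samples")` yields the owner `derivative` and the repository `TouchDesigner-Samples`, while a URL that stops at the owner raises `ValueError`.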
Repository URL (Repourl) op('source_github').par.Repourl Str
Default:
"" (Empty String)
Branch/Tag (Branch) op('source_github').par.Branch Str
Default:
"" (Empty String)
Current Status (Status) op('source_github').par.Status Str
Default:
"" (Empty String)
Progress (Progress) op('source_github').par.Progress Float
Default:
0.0
Range:
0 to 1
Slider Range:
0 to 100
Caution: Exposing the viewer of large index tables will be heavy Header
Parse Repository (Parse) op('source_github').par.Parse Pulse
Default:
False
Stop Processing (Stop) op('source_github').par.Stop Pulse
Default:
False
API Rate Limit (Ratelimit) op('source_github').par.Ratelimit Str
Default:
"" (Empty String)
Clear All (Clear) op('source_github').par.Clear Pulse
Default:
False
Display (Display) op('source_github').par.Display Menu
Default:
index
Options:
index, content
Display File (Displayfile) op('source_github').par.Displayfile Str
Default:
"" (Empty String)
Select Doc (Selectdoc) op('source_github').par.Selectdoc Int
Default:
1
Range:
0 to 1
Slider Range:
1 to 0
Header
Include Wiki (Includewiki) op('source_github').par.Includewiki Toggle
Default:
False
Include Code Files (Includecode) op('source_github').par.Includecode Toggle
Default:
False
Code Languages (Codelanguages) op('source_github').par.Codelanguages Str
Default:
"" (Empty String)
Include Context (Includecontext) op('source_github').par.Includecontext Toggle
Default:
False
Max File Size KB (Maxfilesize) op('source_github').par.Maxfilesize Int
Default:
0
Range:
0 to 1
Slider Range:
0 to 1
Include Documentation (Includedocs) op('source_github').par.Includedocs Toggle
Default:
False
Doc File Patterns (Docpatterns) op('source_github').par.Docpatterns Str
Default:
"" (Empty String)
Ignore Patterns (Ignorepaths) op('source_github').par.Ignorepaths Str
Default:
"" (Empty String)
Max Directory Depth (Maxdepth) op('source_github').par.Maxdepth Int
Default:
0
Range:
0 to 1
Slider Range:
0 to 1
Include Issues/PRs (Includeissues) op('source_github').par.Includeissues Toggle
Default:
False
Max Issues (Issuelimit) op('source_github').par.Issuelimit Int
Default:
0
Range:
0 to 1
Slider Range:
0 to 1
Issue State (Issuestate) op('source_github').par.Issuestate Menu
Default:
all
Options:
all, open, closed
Include Comments (Includecomments) op('source_github').par.Includecomments Toggle
Default:
False
Use Authentication (Useauth) op('source_github').par.Useauth Toggle
Default:
False
GitHub Token (Token) op('source_github').par.Token Str
Default:
"" (Empty String)
v1.0.0 (2024-11-06)

Initial release