jairodriguez / RepoToText

Turn an entire GitHub repo into a single organized .txt file to use with LLMs (GPT-4, Claude Opus, etc.)

Home Page: https://repo-to-text-mu.vercel.app

RepoToText

RepoToText is a web app that scrapes a GitHub repository and converts its files into a single organized .txt file. You enter the URL of a GitHub repository and, optionally, a documentation URL. The app retrieves the contents of the repository, including all files and directories, fetches the documentation from the provided URL, and combines everything into one text file, with the documentation placed at the top. The .txt file is saved in the /data folder, labeled with user, repo, and timestamp info. You can then upload this file to an LLM (GPT-4, Claude Opus, etc.) and use the chatbot to interact with the entire GitHub repo.
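As a rough illustration of the "user + repo + timestamp" labeling, the sketch below splits a repository URL into its user and repo parts and builds an output name from them. The function names and the exact file name pattern are assumptions made for this example; the app's real naming scheme may differ.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_repo_url(url: str) -> tuple[str, str]:
    """Split a URL like https://github.com/user/repo into (user, repo)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url!r}")
    user, repo = parts[0], parts[1]
    # Clone URLs often end in ".git"; drop that suffix if present.
    if repo.endswith(".git"):
        repo = repo[:-4]
    return user, repo


def output_filename(url: str, now: datetime) -> str:
    """Build a file name from user, repo, and timestamp.

    Hypothetical pattern: user_repo_YYYYMMDD-HHMMSS.txt
    """
    user, repo = parse_repo_url(url)
    return f"{user}_{repo}_{now:%Y%m%d-%H%M%S}.txt"
```

For example, calling output_filename with this repository's URL and the current time would give a name that begins with jairodriguez_RepoToText_.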

Running the Application with Docker

To run the application using Docker, follow these steps:

  1. Clone the repository.
  2. Set up the environment variable GITHUB_API_KEY in the .env file.
  3. Build the Docker images with docker compose build.
  4. Start the containers with docker compose up.
  5. Open the application at http://localhost:3000 in a web browser and enter the GitHub repository URL and, if available, the documentation URL.
  6. Choose "All files" or select specific file types.
  7. Click the "Submit" button to initiate the scraping process. The converted text will be displayed in the output area, and it will also be saved in the /data folder.
  8. You can also click the "Copy Text" button to copy the generated text to the clipboard.

Prompt Example

This is a .txt file that represents an entire GitHub repository. The repository's individual files are separated by the sequence '''--- , followed by the file path, ending with ---. Each file's content begins immediately after its file path and extends until the next sequence of '''---

Add your idea here (Example): Please create a React front end that will work with the back end
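The format described in the prompt can also be read back programmatically. The sketch below assumes each file appears as the separator '''---, then the file path, then a closing ---, then the content up to the next separator. The function name is hypothetical, and the exact whitespace around the markers is an assumption.

```python
SEPARATOR = "'''---"


def split_repo_text(text: str) -> dict[str, str]:
    """Map each file path to its content, based on the separator layout.

    Anything before the first separator (for example, prepended
    documentation) is ignored.
    """
    files: dict[str, str] = {}
    for chunk in text.split(SEPARATOR)[1:]:
        # The path sits between the separator and the next "---".
        header, _, body = chunk.partition("---")
        path = header.strip()
        if path:
            files[path] = body.strip("\n")
    return files
```

This simple split breaks if a file's own content contains the separator sequence, so treat it as a starting point only.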

Environment Configuration

Add your GitHub API key to the .env file:

GITHUB_API_KEY='YOUR GITHUB API KEY HERE'
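A quick way to check that the key is set before starting the app is to parse the .env contents and reject the placeholder value. This is a standard-library sketch with hypothetical function names, not the loader the project itself uses.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def github_api_key(env_text: str) -> str:
    """Return GITHUB_API_KEY, or raise if it is missing or a placeholder."""
    key = parse_env_text(env_text).get("GITHUB_API_KEY", "")
    if not key or key == "YOUR GITHUB API KEY HERE":
        raise ValueError("GITHUB_API_KEY is missing or still the placeholder")
    return key
```

To use it, read the .env file as text and pass the contents to github_api_key.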

FolderToText

FolderToText.py is a script that turns a local folder, or local files, into a .txt in the same way RepoToText.py does. Choose your files with "Browse" (you can keep adding files by clicking "Browse" again). Once all of your files are selected, type the file extensions you want to include, separated by commas. Example: .py , .js , .md , .ts. You can also turn this filter off, in which case every file you uploaded is added to the .txt. Finally, enter the output file name and the output path. The file is written with your chosen file name and a timestamp.
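The extension filter described above can be sketched as follows. The function names are hypothetical, and FolderToText.py may handle case and whitespace differently. An empty filter string stands in here for the filter being turned off.

```python
from pathlib import Path


def parse_extensions(raw: str) -> set[str]:
    """Turn a string like '.py , .js , .md' into {'.py', '.js', '.md'}."""
    return {ext.strip().lower() for ext in raw.split(",") if ext.strip()}


def select_files(paths: list[str], raw_extensions: str = "") -> list[str]:
    """Keep only paths whose extension is in the comma-separated list.

    With an empty list, every path is kept.
    """
    wanted = parse_extensions(raw_extensions)
    if not wanted:
        return list(paths)
    return [p for p in paths if Path(p).suffix.lower() in wanted]
```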

Info

  • Creates a .txt with ('''---) separating each file from the repo.
  • Each file from the repo has a header after ('''---) with the file path as the title.
  • The .txt file is saved in the /data folder.
  • You can add a URL to a documentation page, and that page's content will be added at the top of the .txt file (great to use for tech that came out after Sep 2021).
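Put together, the points above suggest an output layout along these lines. This is a minimal sketch: the function name is hypothetical, and the exact newlines around the separator are an assumption, not taken from the app's code.

```python
SEPARATOR = "'''---"


def build_repo_text(files: dict[str, str], documentation: str = "") -> str:
    """Join documentation (if any) and files into one text blob.

    Each file is written as: separator, file path, '---', then content.
    """
    parts: list[str] = []
    if documentation:
        # Documentation goes first so it ends up at the top of the .txt.
        parts.append(documentation.rstrip("\n"))
    for path, content in files.items():
        body = content.rstrip("\n")
        parts.append(f"{SEPARATOR}\n{path}\n---\n{body}")
    return "\n".join(parts) + "\n"
```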

Tech Used

  • Frontend: React.js
  • Backend: Python Flask
  • Containerization: Docker
  • GitHub API: PyGithub library
  • Additional Python libraries: beautifulsoup4, requests, flask_cors, retry

TODO

  • Add Docker to project
  • FIX: Broken file types: .ipynb
  • FIX: FolderToText - fix so a user can pick one folder (currently only working when user selects individual files)
  • Add in the ability to work with private repositories
  • Create a small desktop app via PyQT or an executable file
  • Add ability to store change history and update .txt to reflect working changes
  • Add checker function to make sure .txt is current repo version
  • Adjust UI for flow, including change textarea output width, adding file management and history UI
  • Explore prompt ideas including breaking the prompts into discrete steps that nudge the model along

About

License: MIT License


Languages

Python 53.4%, JavaScript 28.3%, CSS 10.4%, HTML 7.9%