voynow / git2doc

Python package capable of scraping GitHub data at blazing-fast speeds.

git2doc 📚

A powerful Python library for converting git repositories into documents. git2doc allows you to extract and analyze code from GitHub repositories, making it easier to understand and work with large codebases.

Why git2doc? 🚀

Working with large repositories can be overwhelming, especially when trying to understand the structure and content of the code. git2doc simplifies this process by converting repositories into documents, allowing you to easily search, analyze, and understand the codebase.

Installation 💻

pip install git2doc

Usage 🛠️

Fetching Repositories

from git2doc import get_repos_orchestrator

repos = get_repos_orchestrator(
    n_repos=10,
    last_n_days=30,
    language="Python"
)
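
The parameters above read like a search over recent repositories in a single language. As a rough illustration only, and not a description of how git2doc works internally, the sketch below builds a comparable query URL for GitHub's public repository search endpoint. The helper name `build_search_url`, the reading of `last_n_days` as a creation-date cutoff, and the sort by stars are all assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Optional
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(n_repos: int, last_n_days: int, language: str,
                     today: Optional[date] = None) -> str:
    """Hypothetical helper: build a GitHub repository search URL.

    Assumption: last_n_days is treated as a creation-date cutoff.
    git2doc may interpret this parameter differently.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=last_n_days)
    params = {
        # Search qualifiers: language filter plus a creation-date lower bound.
        "q": f"language:{language} created:>{cutoff.isoformat()}",
        "sort": "stars",
        "order": "desc",
        # The search API returns at most 100 results per page.
        "per_page": min(n_repos, 100),
    }
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


print(build_search_url(n_repos=10, last_n_days=30, language="Python"))
```

The function only constructs the URL and sends no request, so it can be run without a network connection or an API token.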

Loading Repository Data

from git2doc import pull_code_from_repo

repo_data = pull_code_from_repo(
    repo="https://github.com/voynow/git2doc",
    branch="main"
)
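
Once a repository has been loaded, a common next step is searching its contents. The sketch below assumes the loaded data can be arranged as a mapping from file path to file text; the README does not state the actual return type of `pull_code_from_repo`, so the `sample_data` dictionary and the `search_documents` helper are hypothetical stand-ins.

```python
from typing import Dict, List, Tuple


def search_documents(documents: Dict[str, str], term: str) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for every line containing term."""
    hits = []
    for path, text in sorted(documents.items()):
        for line_number, line in enumerate(text.splitlines(), start=1):
            if term in line:
                hits.append((path, line_number, line.strip()))
    return hits


# Stand-in data (assumption): replace with real repository contents
# arranged in the same path -> text shape.
sample_data = {
    "pkg/io.py": "import json\n\ndef load(path):\n    return json.load(open(path))\n",
    "pkg/cli.py": "from pkg.io import load\n\nprint(load('config.json'))\n",
}

for path, line_number, line in search_documents(sample_data, "load("):
    print(f"{path}:{line_number}: {line}")
```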

Writing Data to Parquet Files

from git2doc import pipeline_fetch_and_load

pipeline_fetch_and_load(
    n_repos=1000,
    last_n_days=365,
    language="Python",
    write_batch_size=100,
    delete=True,
)
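
The `write_batch_size` argument suggests that repositories are grouped into fixed-size batches before being written. The sketch below shows one way such batching could work, under that assumption; `iter_batches` and the `repo_names` list are hypothetical and are not part of git2doc.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical repository names standing in for real fetch results.
repo_names = [f"repo_{i}" for i in range(250)]

for index, batch in enumerate(iter_batches(repo_names, batch_size=100)):
    # A real pipeline would write each batch to its own Parquet file here.
    print(f"batch {index}: {len(batch)} repositories")
```

Writing in batches keeps memory use bounded by the batch size rather than by the total number of repositories.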

Contributing 🤝

Contributions are welcome! Please feel free to submit a pull request or open an issue on GitHub.

License 📄

This project is licensed under the MIT License. See the LICENSE file for more details.
