zarybnicky / codestats

Codestats

With an existing DuckDB:

  1. Extract the database to data/git.duckdb

    mkdir data/
    mv ~/Downloads/git.duckdb.gz data/
    gunzip data/git.duckdb.gz
  2. Install dependencies and start Streamlit

    poetry install
    poetry run streamlit run Recent.py

    Alternatively, if you don't want to install Python, there are a Dockerfile and a docker-compose.yml. The containerized version seems to run somewhat slower, perhaps due to the default cgroups limits; DuckDB is rather resource-hungry.

    docker compose build
    docker compose up

Without an existing DuckDB:

  1. Update your ~/.ssh/config. I use the following configuration with connection multiplexing to speed up cloning:

    Host redmine-git
      User git
      HostName redmine.mgmtprod
      Port 2223
      IdentityFile ~/.ssh/id_rsa
      ControlPath ~/.ssh/connections/%r@%h.ctl
      ControlMaster auto
      ControlPersist 10m
      IdentitiesOnly yes
  2. Create a .env file in this directory:

    export GITLAB_HOST=gitlab.mgmtprod
    export GITLAB_USER=<username>
    export GITLAB_TOKEN=<personal access token>
    export GITLAB_ROOT="${HOME}/repos/gitlab"
    
    export GITOLITE_HOST="redmine-git"
    export GITOLITE_ROOT="${HOME}/repos/gitolite"

    If necessary, create the GitLab personal access token first.
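    The scripts pick these variables up from the environment, so the .env has to be sourced before running them. As a minimal sketch (the helper and its error handling here are assumptions for illustration, not the project's actual code), a discovery script might validate its settings like this:

    ```python
    import os

    # Hypothetical helper (not part of the repo): the discovery scripts are
    # assumed to read these variables from the environment after the .env
    # file has been sourced; the names match the .env above.
    REQUIRED = ["GITLAB_HOST", "GITLAB_USER", "GITLAB_TOKEN", "GITLAB_ROOT"]

    def load_settings():
        """Return the required settings, failing fast if any are missing."""
        missing = [name for name in REQUIRED if name not in os.environ]
        if missing:
            raise SystemExit("missing environment variables: " + ", ".join(missing))
        return {name: os.environ[name] for name in REQUIRED}
    ```

    Failing fast like this beats a confusing authentication error halfway through a long clone run.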

  3. The indexing process happens in four steps:

    • repository discovery (poetry run python discover_gitlab.py and discover_gitolite.py)
      • this produces data/repos-*.csv
    • cloning (or fetching) the repositories (fetch_known_repos.py)
      • this produces bare repositories in GITLAB_ROOT and GITOLITE_ROOT
      • I have, in the past, used git worktree to work with bare repos locally too
        • git -C ~/repos/gitlab/odoo/odoo.git worktree add ~/work/odoo main
        • git -C ~/work/odoo commit
        • rm -rf ~/work/odoo
        • git -C ~/repos/gitlab/odoo/odoo.git worktree prune
    • indexing the repositories by parsing the output of git ls-tree and git log --numstat
      • produces data/git_*.csv
    • and lastly loading the CSVs into a DuckDB database
      • produces data/git.duckdb
      • to be compressed into data/git.duckdb.gz using gzip -k data/git.duckdb
      • originally, this project ran on PostgreSQL
      • but DuckDB is useful for a workshop format, and for sharing the DB index in general
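    The numstat-parsing step above can be sketched in a few lines. This is an illustrative reconstruction, not the project's actual indexer; the record fields are made up for the example:

    ```python
    def parse_numstat(log_text):
        """Parse `git log --pretty=format:%H --numstat` output into records.

        Each commit prints its hash on one line, followed by one
        tab-separated "added<TAB>deleted<TAB>path" line per changed file.
        Binary files report "-" for both counts.
        """
        rows, commit = [], None
        for line in log_text.splitlines():
            if not line.strip():
                continue
            parts = line.split("\t")
            if len(parts) == 3:
                added, deleted, path = parts
                rows.append({
                    "commit": commit,
                    "added": None if added == "-" else int(added),
                    "deleted": None if deleted == "-" else int(deleted),
                    "path": path,
                })
            else:
                commit = line.strip()
        return rows
    ```

    Rows like these could then be written out as CSV for the loading step to ingest into DuckDB.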
