AleksanderGondek / josh

Just One Single History

Home Page: https://josh-project.github.io/josh/


Combine the advantages of a monorepo with those of multirepo setups by leveraging a blazingly fast, incremental, and reversible implementation of git history filtering.

josh-proxy can be integrated with any git host:

$ docker run -p 8000:8000 -e JOSH_REMOTE=https://github.com -v josh-vol:/data/git joshproject/josh-proxy:latest

See Container options below for the full list of environment variables.

Use cases

Partial cloning

Reduce scope and size of clones by treating subdirectories of the monorepo as individual repositories.

$ git clone http://josh/monorepo.git:/path/to/library.git

The partial repo behaves like a normal git repository but contains only the files found in the subdirectory and only the commits affecting those files. The partial repo supports both fetch and push operations.
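The clone URL above appears to follow a simple pattern: the monorepo URL, then a `:/`-prefixed subdirectory filter, then a trailing `.git`. A minimal Python sketch of that pattern, assuming it generalizes from the single example shown; the helper name `partial_clone_url` is hypothetical and not part of Josh:

```python
def partial_clone_url(proxy_base: str, repo: str, subdir: str) -> str:
    """Build a josh-proxy URL exposing one subdirectory of `repo` as a repository.

    Hypothetical helper: the `<repo>.git:/<subdir>.git` shape is inferred
    from the `git clone` example above.
    """
    subdir = subdir.strip("/")
    if not subdir:
        raise ValueError("subdir must name a directory inside the monorepo")
    return f"{proxy_base.rstrip('/')}/{repo}.git:/{subdir}.git"


url = partial_clone_url("http://josh", "monorepo", "path/to/library")
print(url)  # http://josh/monorepo.git:/path/to/library.git
```

The resulting string can be passed straight to `git clone`.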

This not only improves client performance, since there are fewer files in the tree, but also enables collaboration on parts of the monorepo with other parties using git's normal distributed development features. For example, it makes it easy to mirror just selected parts of your repo to public GitHub repositories or to specific customers.

Project composition / Workspaces

Simplify code sharing and dependency management. Beyond just subdirectories, Josh supports filtering, re-mapping and composition of arbitrary virtual repositories from the content found in the monorepo.

The mapping itself is also stored in the repository and therefore versioned alongside the code.

Figure: the central monorepo (folders and files in central.git) next to the project workspaces (folders and files in project1.git and project2.git) and the workspace.josh file that defines each of them.

workspace.josh for project1:

dependencies = :/modules:[
    ::tools/
    ::library1/
]

workspace.josh for project2:

libs/library1 = :/modules/library1
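To make the `libs/library1 = :/modules/library1` mapping more concrete, here is a deliberately simplified Python model of a workspace: files under the workspace directory become the repository root, and each `dest = :/src` line re-roots a monorepo subdirectory under `dest`. This is a toy illustration, not Josh's filter engine. It only understands that single-line form (not nested filters such as the `:[...]` block), and the directory `workspaces/project2` and all file names are assumed for the example.

```python
def parse_mappings(text: str) -> list[tuple[str, str]]:
    """Parse simple `dest = :/src` lines into (dest, src) pairs (toy parser)."""
    mappings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        dest, sep, src = line.partition("=")
        src = src.strip()
        if not sep or not src.startswith(":/"):
            raise ValueError(f"unsupported line in this toy model: {raw!r}")
        mappings.append((dest.strip().strip("/"), src[2:].strip("/")))
    return mappings


def workspace_files(paths: list[str], workspace_dir: str,
                    mappings: list[tuple[str, str]]) -> list[str]:
    """Return the file list a workspace would expose under this toy model."""
    ws_prefix = workspace_dir.strip("/") + "/"
    result = set()
    for path in paths:
        # Files inside the workspace directory become the repository root.
        if path.startswith(ws_prefix):
            result.add(path[len(ws_prefix):])
        # Each mapping places a monorepo subdirectory under `dest`.
        for dest, src in mappings:
            if path.startswith(src + "/"):
                result.add(f"{dest}/{path[len(src) + 1:]}")
    return sorted(result)


monorepo = [
    "modules/library1/lib.c",
    "modules/tools/gen.py",
    "workspaces/project2/main.c",
    "workspaces/project2/workspace.josh",
]
mappings = parse_mappings("libs/library1 = :/modules/library1")
print(workspace_files(monorepo, "workspaces/project2", mappings))
# ['libs/library1/lib.c', 'main.c', 'workspace.josh']
```

Note that `modules/tools/gen.py` is absent from the result: anything not mapped stays invisible to the workspace.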

Workspaces act as normal git repos:

$ git clone http://josh/central.git:workspace=workspaces/project1.git

Simplified CI/CD

With everything stored in one repo, CI/CD systems only need to look into one source for each particular deliverable. However, in traditional monorepo environments, dependency management is handled by the build system. Build systems are usually tailored to specific languages and need their input already checked out on the filesystem. So the question:

"What deliverables are affected by a given commit and need to be rebuilt?"

cannot be answered without cloning the entire repository and understanding how the languages used handle dependencies.

In particular, when using C family languages, hidden dependencies on header files are easy to miss. For this reason, sandboxing, i.e. limiting which files are visible to the compiler, is practically a requirement for reproducible builds.

With Josh, each deliverable gets its own virtual git repository with dependencies declared in the workspace.josh file. This means answering the above question becomes as simple as comparing commit ids. Furthermore, due to the tree filtering, each build is guaranteed to be perfectly sandboxed and only sees those parts of the monorepo that have actually been mapped.

This also means the deliverables to be rebuilt can be determined without cloning any repositories, as is typically necessary with conventional build tools.
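A minimal Python sketch of that comparison, assuming the filtered commit id of each deliverable's workspace repository has been recorded for the previous build and fetched for the current one. One way to fetch it without cloning is `git ls-remote` against each workspace URL (an assumption about your CI setup; `git ls-remote` prints one `<commit id><TAB><ref>` line per ref). The deliverable names and ids below are made up.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref names to commit ids from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            commit, ref = line.split("\t", 1)
            refs[ref.strip()] = commit.strip()
    return refs


def deliverables_to_rebuild(previous: dict[str, str],
                            current: dict[str, str]) -> list[str]:
    """Return deliverables whose filtered commit id differs from the last build.

    Deliverables absent from `previous` (new workspaces) are always included.
    """
    return sorted(name for name, commit in current.items()
                  if previous.get(name) != commit)


# Hypothetical ids recorded for each workspace's refs/heads/master.
previous = {"project1": "1111aaa", "project2": "2222bbb"}
current = {
    "project1": parse_ls_remote("1111aaa\trefs/heads/master\n")["refs/heads/master"],
    "project2": parse_ls_remote("3333ccc\trefs/heads/master\n")["refs/heads/master"],
}
print(deliverables_to_rebuild(previous, current))  # ['project2']
```

Because each workspace's history only changes when its mapped files change, an unchanged id is enough to skip a rebuild.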

GraphQL API

It is often desirable to access content stored in git without requiring a clone of the repository. This is useful for CI/CD systems or web frontends such as dashboards.

Josh exposes a GraphQL API for that purpose. For example, it can be used to find all workspaces currently present in the tree:

query {
  rev(at:"refs/heads/master", filter:"::**/workspace.josh") {
    files { path }
  }
}
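Any HTTP client can send such a query. The sketch below uses only the Python standard library and the generic GraphQL-over-HTTP convention: a JSON body with a `query` field, answered by JSON with `data` and optionally `errors`. The endpoint URL is a placeholder; check the Josh documentation for the actual GraphQL path exposed by josh-proxy. The helper names are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

WORKSPACE_QUERY = """
query {
  rev(at:"refs/heads/master", filter:"::**/workspace.josh") {
    files { path }
  }
}
"""


def build_graphql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a standard GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def workspace_paths(response_body) -> list[str]:
    """Extract file paths from a response shaped like the query above."""
    payload = json.loads(response_body)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return [f["path"] for f in payload["data"]["rev"]["files"]]


req = build_graphql_request("http://josh.example/graphql", WORKSPACE_QUERY)  # placeholder endpoint
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(workspace_paths(resp.read()))
```

The response mirrors the query's shape, which is why `workspace_paths` can walk `data.rev.files` directly.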

Caching proxy

Even without using the more advanced features like partial cloning or workspaces, josh-proxy can act as a cache to reduce traffic between locations or keep your CI from performing many requests to the main git host.
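Using josh-proxy as a plain cache mostly amounts to pointing clones at the proxy instead of the upstream host. A minimal Python sketch, assuming that with `JOSH_REMOTE=https://github.com` the proxy serves each upstream repository under the same path (for example `http://localhost:8000/josh-project/josh.git`); the helper name is hypothetical and the path mapping is an assumption to verify against your deployment.

```python
def proxied_url(upstream_url: str, josh_remote: str, proxy_base: str) -> str:
    """Rewrite an upstream clone URL so the fetch goes through josh-proxy.

    Assumes the proxy mirrors the upstream path layout under `proxy_base`.
    """
    remote = josh_remote.rstrip("/") + "/"
    if not upstream_url.startswith(remote):
        raise ValueError(f"{upstream_url!r} is not under JOSH_REMOTE {josh_remote!r}")
    return proxy_base.rstrip("/") + "/" + upstream_url[len(remote):]


print(proxied_url("https://github.com/josh-project/josh.git",
                  "https://github.com", "http://localhost:8000"))
# http://localhost:8000/josh-project/josh.git
```

In CI, git's own `url.<base>.insteadOf` setting can apply the same prefix rewrite to every clone without any code.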

FAQ

See the FAQ in the project documentation.

Configuration

Container options

Variable Meaning
JOSH_REMOTE HTTP remote, including protocol. Example: https://github.com
JOSH_REMOTE_SSH SSH remote, including protocol. Example: ssh://git@github.com
JOSH_HTTP_PORT HTTP port to listen on. Default: 8000
JOSH_SSH_PORT SSH port to listen on. Default: 8022
JOSH_SSH_MAX_STARTUPS Maximum number of concurrent SSH authentication attempts. Default: 16
JOSH_SSH_TIMEOUT Timeout, in seconds, for a single request when serving repos over SSH. This time should cover fetching from the upstream repo, filtering, and serving the repo to the client. Default: 300
JOSH_EXTRA_OPTS Extra options passed directly to the josh-proxy process

Container volumes

Volume Purpose
/data/git Git cache volume. If this volume is not mounted, the cache will be lost every time the container is shut down.
/data/keys SSH server keys. If this volume is not mounted, a new key will be generated on each container startup.
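The options and volumes above can also be assembled from a deployment script. A minimal Python sketch with a hypothetical helper `josh_proxy_command`; it only builds the `docker run` argument list, using the variable names, default ports, and mount points from the tables above, while the volume names `josh-vol` and `josh-keys` are examples.

```python
import shlex


def josh_proxy_command(env: dict[str, str], volumes: dict[str, str],
                       ports: tuple[int, ...] = (8000,),
                       image: str = "joshproject/josh-proxy:latest") -> list[str]:
    """Build a `docker run` argv for josh-proxy from env vars and named volumes."""
    argv = ["docker", "run"]
    for port in ports:
        # Publish each container port on the same host port.
        argv += ["-p", f"{port}:{port}"]
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    for name, mount_point in volumes.items():
        argv += ["-v", f"{name}:{mount_point}"]
    argv.append(image)
    return argv


# Reproduces the quick-start command from the top of this README.
print(shlex.join(josh_proxy_command({"JOSH_REMOTE": "https://github.com"},
                                    {"josh-vol": "/data/git"})))
# docker run -p 8000:8000 -e JOSH_REMOTE=https://github.com -v josh-vol:/data/git joshproject/josh-proxy:latest

# HTTP and SSH on their default ports, persisting the git cache and SSH server keys.
print(shlex.join(josh_proxy_command(
    {"JOSH_REMOTE": "https://github.com", "JOSH_REMOTE_SSH": "ssh://git@github.com"},
    {"josh-vol": "/data/git", "josh-keys": "/data/keys"},
    ports=(8000, 8022),
)))
```

Mounting `/data/keys` keeps the SSH host key stable across restarts, so clients do not see a changed host key after each container startup.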

Configuring SSH access

Josh supports SSH access (pull only, no push, for now). To use SSH, add the following lines to your ~/.ssh/config:

Host your-josh-instance.com
    ForwardAgent yes
    PreferredAuthentications publickey

Alternatively, you can pass those options via GIT_SSH_COMMAND:

GIT_SSH_COMMAND="ssh -o PreferredAuthentications=publickey -o ForwardAgent=yes" git clone ssh://git@your-josh-instance.com/...

In other words, you need to ensure SSH agent forwarding is enabled.
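When git is driven from a script, the same `GIT_SSH_COMMAND` can be set per process. A minimal Python sketch with a hypothetical helper `josh_ssh_env`; it assumes the local ssh-agent is located through `SSH_AUTH_SOCK`, the usual default (setups that point ssh at an agent via `IdentityAgent` in ~/.ssh/config would need a different check).

```python
import os
from typing import Mapping, Optional

# Same options as the GIT_SSH_COMMAND example above.
JOSH_SSH_COMMAND = "ssh -o PreferredAuthentications=publickey -o ForwardAgent=yes"


def josh_ssh_env(base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return an environment for git that enables SSH agent forwarding."""
    env = dict(os.environ if base_env is None else base_env)
    # Agent forwarding needs a running ssh-agent, assumed to be found via SSH_AUTH_SOCK.
    if not env.get("SSH_AUTH_SOCK"):
        raise RuntimeError("no ssh-agent found (SSH_AUTH_SOCK is unset)")
    env["GIT_SSH_COMMAND"] = JOSH_SSH_COMMAND
    return env


# Example (repository path elided as in the command above):
# import subprocess
# subprocess.run(["git", "clone", "ssh://git@your-josh-instance.com/..."],
#                env=josh_ssh_env(), check=True)
```

Failing early when no agent is available gives a clearer error than a rejected authentication on the proxy side.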

About

License: MIT License


Languages

Rust 83.7%, TypeScript 9.9%, SCSS 1.8%, Shell 1.4%, Dockerfile 1.3%, Go 0.7%, Python 0.6%, HTML 0.3%, Nix 0.2%, Makefile 0.1%, CSS 0.0%