HumanisingAutonomy / versioned_collection

MongoDB collection versioning and tracking

Enable collection versioning in MongoDB with VersionedCollection. A VersionedCollection can be used like a normal pymongo Collection, but it also supports git-like functionality, such as registering versions, branching and synchronising two collections of the same type by pulling and pushing data between them.

Installation

MongoDB

First, install MongoDB 5.0 or later and enable replica sets, running a replica set with a single member.
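A single-member replica set is usually created by starting `mongod` with a replica set name (for example `mongod --replSet rs0`) and then running `replSetInitiate` once, which is what `rs.initiate()` does in mongosh. The sketch below builds that configuration document and sends it through a pymongo client. The set name `rs0` and the host `localhost:27017` are assumptions, and the client is assumed to be connected with `directConnection=True`, since the replica set does not exist yet when the command runs.

```python
"""Sketch: initiate a single-member replica set (set name 'rs0' is an assumption)."""


def single_member_config(replset_name="rs0", host="localhost:27017"):
    """Return a replSetInitiate configuration with exactly one member."""
    return {
        "_id": replset_name,  # must match the --replSet name given to mongod
        "members": [{"_id": 0, "host": host}],
    }


def initiate_replica_set(client, replset_name="rs0", host="localhost:27017"):
    """Run replSetInitiate through an existing pymongo MongoClient.

    The client is assumed to be created with a direct connection, e.g.
    pymongo.MongoClient("mongodb://localhost:27017/?directConnection=true").
    """
    config = single_member_config(replset_name, host)
    # Database.command(name, value) sends {"replSetInitiate": config} to the admin db
    return client.admin.command("replSetInitiate", config)
```

This only needs to run once per deployment; afterwards the server reports itself as the primary of `rs0`.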

Versioned Collection

pip install versioned_collection

Dependencies

To resolve merge conflicts after pulling data from a remote collection, this library uses the Meld merge tool. On Debian or Ubuntu, install Meld with:

sudo apt update
sudo apt install -y meld

or manually download and install the package from the Meld website.
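Before pulling from a remote collection, it can help to check that Meld is reachable. The sketch below assumes the library launches the `meld` executable found on the PATH; that is an assumption about how the merge tool is invoked, not something stated above.

```python
import shutil


def meld_available():
    """Return True if a `meld` executable is found on the PATH."""
    return shutil.which("meld") is not None


if __name__ == "__main__":
    if meld_available():
        print("Meld found")
    else:
        print("Meld not found; install it before pulling from a remote collection")
```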

Basic example

To enable versioning on a collection, wrap it in a VersionedCollection, either directly as in the example below or through a class that inherits from VersionedCollection. All interactions with the collection should then go through this object: do not modify the collection directly with mongosh or other database management or querying tools, and do not access it through the plain pymongo driver.

import pymongo
from versioned_collection import VersionedCollection


client = pymongo.MongoClient("mongodb://localhost:27017")
db = client['database_name']
# Wrap the 'bands' collection so that its changes can be versioned
bands_collection = VersionedCollection(db, name='bands')

# Insert a document, then initialise the collection for versioning
bands_collection.insert_one({'name': 'Led Zeppelin'})
bands_collection.init('Initial collection version')

# Make another change and register it as a new version (like a git commit)
bands_collection.insert_one({'name': 'Black Sabbath'})
bands_collection.register(message='Second version')

Note

Access control: if access control is enabled, provide the username and password of a user with the `readWrite` role on the database that contains the target collection. Pass them both to the MongoClient and to the VersionedCollection:
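import pymongo
from versioned_collection import VersionedCollection

# get_params() stands for however you load the connection settings
# (e.g. from environment variables or a secrets store).
host, port, user, password = get_params()

client = pymongo.MongoClient(
    host=host,
    port=port,
    username=user,
    password=password,
)

db = client['database_name']
# The same credentials are also passed to the VersionedCollection itself
bands_collection = VersionedCollection(
    db, 'bands',
    username=user,
    password=password,
)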
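The example above leaves `get_params()` undefined. One possible version reads the settings from environment variables; the variable names `MONGO_HOST`, `MONGO_PORT`, `MONGO_USER` and `MONGO_PASSWORD` are hypothetical and are not read by versioned_collection itself.

```python
import os


def get_params():
    """Hypothetical helper: read connection settings from environment variables.

    The variable names are illustrative assumptions, not part of
    versioned_collection.
    """
    host = os.environ.get("MONGO_HOST", "localhost")
    port = int(os.environ.get("MONGO_PORT", "27017"))
    user = os.environ["MONGO_USER"]          # required: KeyError if unset
    password = os.environ["MONGO_PASSWORD"]  # required: KeyError if unset
    return host, port, user, password
```

Failing loudly on a missing user or password avoids silently connecting without credentials to a server that has access control enabled.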

Common operations

| versioned_collection | git | Remarks |
|---|---|---|
| register | commit | Registering a version of a collection is equivalent to committing the changes. |
| checkout | checkout | |
| create_branch | branch | Creates a new branch. As in git, where branches are pointers to commits, branches in versioned_collection are pointers to registered versions. |
| stash | stash | Stashes the changes. |
| stash apply | stash apply | Applies the stashed changes. Unlike git, this overwrites the current state of the collection with the stashed changes instead of merging them. |
| stash discard | stash drop | Clears the stashed changes. |
| delete_version_subtree | reset --hard <hash> | Removes a version and all the registered versions that follow it. |
| discard_changes | git reset --hard && git clean -fxd | Removes all the unregistered changes. |
| diff | diff | Computes the diff between the current version and another version. |
| log | log | Shows the version log, similar to git's commit log. |
| pull | pull | Pulls the changes from a remote collection into the local collection. |
| push | push | Pushes the changes from the local collection to a remote collection. |
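Two behaviours in the table differ from what one might guess: a branch is only a pointer to a registered version, and applying a stash replaces the current state rather than merging into it. The toy in-memory model below illustrates both. It is a conceptual sketch only; the class and its method names (`stash_apply` in particular) are hypothetical and are not the versioned_collection implementation or API.

```python
"""Toy in-memory model of branches-as-pointers and overwrite-style stash apply.

This is NOT how versioned_collection is implemented; it only illustrates
the semantics described in the table.
"""
import copy


class ToyVersionTree:
    def __init__(self, documents, message):
        # version id -> (parent id, snapshot of the documents, message)
        self.versions = {0: (None, copy.deepcopy(documents), message)}
        self.branches = {"main": 0}  # branch name -> version id (just a pointer)
        self.current_branch = "main"
        self.working = copy.deepcopy(documents)  # unregistered state
        self.stashed = None

    def register(self, message):
        parent = self.branches[self.current_branch]
        new_id = len(self.versions)
        self.versions[new_id] = (parent, copy.deepcopy(self.working), message)
        self.branches[self.current_branch] = new_id  # move the pointer forward
        return new_id

    def create_branch(self, name):
        # A new branch is only a new name for the current version
        self.branches[name] = self.branches[self.current_branch]

    def checkout(self, branch):
        self.current_branch = branch
        self.working = copy.deepcopy(self.versions[self.branches[branch]][1])

    def stash(self):
        self.stashed = copy.deepcopy(self.working)
        self.checkout(self.current_branch)  # back to the last registered state

    def stash_apply(self):
        # Overwrite, do not merge: changes made since stashing are replaced
        self.working = copy.deepcopy(self.stashed)
```

After `stash()`, any edits made before `stash_apply()` are lost, which mirrors the "does not perform a merge" remark above.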

Using the CLI

Installing this library also installs a CLI called vc. Refresh the shell (or open a new one) so that the command is available, then run vc -h to list the commands:

usage: vc [-h] command ...

optional arguments:
  -h, --help       show this help message and exit

These are common VersionedCollection commands:
  commands
    config                Update the configuration and credentials
    use                   Set the database and the collection to use
    status                Show the status of the version tree
    init                  Initialise a collection for versioning
    create_branch         Create a new branch pointing at the current version
    register              Register a new version of the collection
    checkout              Check out a tracked version of the collection
    log                   Show version logs
    branches              Show the existing branches of the collection
    diff                  Compute the diff between the current version and another version
    discard_changes       Discard the unregistered changes of the collection
    stash                 Stash the changes of the collection. See subcommand for help
    delete_version        Delete a version and all the successor versions of it
    push                  Update remote collection by uploading a branch to it
    pull                  Fetch from and integrate a branch from a remote collection
    resolve_conflicts     Resolve the merge conflicts
    listen                Start monitoring the changes made to the collection.

First, run vc config to configure the connection details of the MongoDB server. See vc config -h for the list of parameters:

usage: vc config [-h] [--local | --remote] [--username USERNAME] [--password [PASSWORD]] [--host HOST] [--port PORT] commands ...

optional arguments:
  -h, --help              show this help message and exit
  --local                 whether to set the configuration for the local database
  --remote                whether to set the configuration for the remote database
  --username USERNAME     user with access to the database
  --password [PASSWORD]   password to access the database. if unfilled, a prompt will appear.
  --host HOST             host address of the mongodb server
  --port PORT             port of the mongodb server

The available subcommands:
  commands
    show                  Print the contents of the current configuration

Use the --local and --remote flags to update the connection information for the local or the remote database. If neither flag is passed, the configuration for the local database is updated. Note that the remote collection can be on the same host, in a different database.
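The option layout printed by vc config -h can be reproduced with argparse, which makes two details explicit: --local and --remote are mutually exclusive with local as the default, and --password may be given without a value to trigger a prompt. This is an illustrative reconstruction of the documented interface (the show subcommand is omitted), not the actual vc source; the `PROMPT` sentinel and `parse_config` helper are hypothetical.

```python
import argparse
import getpass

PROMPT = object()  # sentinel: --password was given without a value


def build_config_parser():
    parser = argparse.ArgumentParser(prog="vc config")
    target = parser.add_mutually_exclusive_group()  # [--local | --remote]
    target.add_argument("--local", action="store_true")
    target.add_argument("--remote", action="store_true")
    parser.add_argument("--username")
    # nargs="?" with const: "--password" alone yields PROMPT instead of a value
    parser.add_argument("--password", nargs="?", const=PROMPT)
    parser.add_argument("--host")
    parser.add_argument("--port", type=int)
    return parser


def parse_config(argv, prompt=getpass.getpass):
    args = build_config_parser().parse_args(argv)
    target = "remote" if args.remote else "local"  # local is the default
    if args.password is PROMPT:
        args.password = prompt("Password: ")
    return target, args
```

Passing both --local and --remote makes argparse exit with an error, matching the `[--local | --remote]` notation in the usage line.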

Before performing versioning operations on a versioned collection, select it with vc use:

usage: vc use [-h] -d DATABASE -c COLLECTION

optional arguments:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Database containing the versioned collection
  -c COLLECTION, --collection COLLECTION
                        Name of the versioned collection
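When scripting, the selection step can be run from Python with `subprocess`, using only the -d and -c flags shown above. The helper names are hypothetical, and `select_collection` assumes the vc command is on the PATH.

```python
import subprocess


def vc_use_command(database, collection):
    """Build the argv for `vc use` from the flags in its help text."""
    return ["vc", "use", "-d", database, "-c", collection]


def select_collection(database, collection):
    """Run `vc use`; assumes the `vc` CLI is installed and on the PATH."""
    return subprocess.run(vc_use_command(database, collection), check=True)
```

For example, `select_collection("database_name", "bands")` selects the collection used in the basic example.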

Building the documentation locally

The documentation is built with Sphinx. With nox installed, run:

nox -s docs

This builds the documentation in HTML format. To build another format, run:

nox -s "docs(docs_format='<format>')"
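The two commands above can be wrapped in a small helper, for example in a release script. The sketch only mirrors the session ids shown above; the helper names are hypothetical, and `build_docs` assumes nox is installed and is run from the repository root. No shell quoting is needed because the arguments are passed as a list.

```python
import subprocess


def docs_session_args(docs_format=None):
    """Return the nox argv matching the documented docs commands."""
    if docs_format is None:
        session = "docs"
    else:
        session = f"docs(docs_format='{docs_format}')"
    return ["nox", "-s", session]


def build_docs(docs_format=None):
    # Assumes nox is installed and the current directory is the repository root
    subprocess.run(docs_session_args(docs_format), check=True)
```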

Additional resources

For more comprehensive examples check the following resources:

License: Apache License 2.0

