svalkiers / clusTCR

CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.

ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity

A two-step clustering approach that combines the speed of the Faiss Clustering Library with the accuracy of the Markov Clustering Algorithm (MCL).
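The two-step idea can be illustrated with a small, dependency-free sketch. It is not clusTCR's implementation. Step 1 here simply groups sequences by length, as a hypothetical stand-in for the Faiss-based pre-clustering. Step 2 runs a plain Markov Clustering loop on a graph whose edges join sequences within Hamming distance 1, which is an assumed edge criterion. The function names (`coarse_partition`, `mcl`, `cluster_cdr3`) and the expansion/inflation defaults are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def coarse_partition(seqs):
    """Step 1 (hypothetical stand-in): group sequences by length.

    clusTCR uses Faiss for this pre-clustering step; grouping by length
    only keeps this sketch free of external dependencies.
    """
    groups = defaultdict(list)
    for s in seqs:
        groups[len(s)].append(s)
    return list(groups.values())


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _normalise_columns(m):
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Plain Markov Clustering on a dense 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops, then make every column a probability distribution.
    m = _normalise_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        prev = m
        # Expansion: take the matrix to the `expansion`-th power.
        e = m
        for _power in range(expansion - 1):
            e = _matmul(e, m)
        # Inflation: element-wise power, then renormalise the columns.
        m = _normalise_columns([[v ** inflation for v in row] for row in e])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Rows that keep non-zero entries are attractors; their columns form a cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > tol)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


def cluster_cdr3(seqs, max_dist=1):
    """Step 1 partitions the data; step 2 runs MCL inside each partition."""
    clusters = []
    for part in coarse_partition(seqs):
        n = len(part)
        if n < 2:
            continue  # a lone sequence cannot form a cluster
        adj = [[0.0] * n for _ in range(n)]
        for i, j in combinations(range(n), 2):
            if hamming(part[i], part[j]) <= max_dist:
                adj[i][j] = adj[j][i] = 1.0
        for members in mcl(adj):
            if len(members) > 1:  # do not report singletons
                clusters.append(sorted(part[k] for k in members))
    return clusters
```

On the toy input `["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSLGQTYEQYF", "CASRPGTDTQYF", "CASRPGTETQYF", "CAWSVNEQFF"]`, this sketch returns one cluster of the three 13-residue sequences and one of the two 12-residue sequences, and drops the singleton. The point of the first step is visible in the structure: the pairwise comparisons and the matrix products of MCL only ever run inside small groups, never across the whole data set.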

On a standard machine*, clusTCR can cluster 1 million CDR3 sequences in under 5 minutes.
*Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz, using 8 CPUs

Compared to other state-of-the-art clustering algorithms (GLIPH2, iSMART and tcrdist), clusTCR achieves comparable clustering quality while being considerably faster and more scalable.

Documentation & Install

All of our documentation, installation info and examples can be found in the documentation linked above. To get you started, here's how to install clusTCR:

$ conda install clustcr -c svalkiers -c bioconda -c pytorch -c conda-forge

There's also a GPU version available, with support for the use_gpu parameter in the Clustering interface.

$ conda install clustcr-gpu -c svalkiers -c bioconda -c pytorch -c conda-forge

Note that this version supports specific GPUs only; see our docs for more information.
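Since `use_gpu` is only useful when a GPU-enabled build and a supported GPU are present, a script may want to decide the flag at runtime. The helper below is a minimal sketch and is hypothetical, not part of clusTCR. It assumes that Faiss is the GPU-backed component (as the two-step description suggests) and relies on `faiss.get_num_gpus()`. It falls back to `False` when Faiss, or that function, is not available.

```python
def gpu_available() -> bool:
    """Return True if an installed Faiss build reports at least one GPU.

    Hypothetical helper (not a clusTCR API): it only inspects Faiss and
    returns False when Faiss or faiss.get_num_gpus is missing.
    """
    try:
        import faiss
    except ImportError:
        return False
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    if get_num_gpus is None:
        return False
    return get_num_gpus() > 0
```

With the GPU package installed, the result could then be passed as `Clustering(use_gpu=gpu_available())`. This assumes `Clustering` is imported from the `clustcr` package; check the docs for the exact import path and for which GPUs are supported.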

To update, use a similar command:

$ conda update clustcr -c svalkiers -c bioconda -c pytorch -c conda-forge

Development Guide

Environment

To start developing, clone the repository and create the necessary environment:

$ conda env create -f conda/env.yml

The requirements are slightly different for the GPU-supported version:

$ conda env create -f conda/env_gpu.yml

Building Packages

New conda packages are built with conda build.
Note that the required channels (pytorch, bioconda and conda-forge) must either be added to your conda configuration first or passed on the command line, as in the install commands above.

$ conda build conda/clustcr/

For the GPU package:

$ conda build conda/clustcr-gpu/

Cite

Please cite as:

Sebastiaan Valkiers, Max Van Houcke, Kris Laukens, Pieter Meysman, ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity, Bioinformatics, 2021, btab446, https://doi.org/10.1093/bioinformatics/btab446

BibTeX:

@article{valkiers2021clustcr,
    author = {Valkiers, Sebastiaan and Van Houcke, Max and Laukens, Kris and Meysman, Pieter},
    title = "{ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity}",
    journal = {Bioinformatics},
    year = {2021},
    month = {06},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btab446},
    url = {https://doi.org/10.1093/bioinformatics/btab446},
    note = {btab446},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btab446/38660282/btab446.pdf},
}

License: Other

