NationalGenomicsInfrastructure / umi-injector

umi-injector integrates UMI sequences from a separate FastQ file into the read headers of a single or paired FastQ.

UMI Injector

umi-injector integrates UMI sequences from a separate FastQ file into the read headers of a single or paired FastQ.

Run umi-injector

Essentially, umi-injector is just a bash script that wraps the tools awk, pigz and file. Unless you run the containerized version, those need to be installed on your system.
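Before the first run, it can help to confirm that the three wrapped tools are available on your PATH. The following is a minimal sketch; the function name `missing_tools` is hypothetical and not part of umi-injector.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of umi-injector: print every tool from the
# argument list that cannot be found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: missing_tools awk pigz file
```

If the function prints nothing, all listed tools were found.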

You must provide at least three arguments: --in1, --out1 and --umi (or their equivalent shorthands -1, -3 and -u):

umi-injector.sh --in1="./test_data/read1.fastq.gz" --umi="./test_data/umi.fastq.gz" --out1="./test_data/read1_umi.fastq.gz" 
# equivalent to
umi-injector.sh -1 ./test_data/read1.fastq.gz -u ./test_data/umi.fastq.gz -3 ./test_data/read1_umi.fastq.gz

Whatever DNA sequence is provided in the --umi file will be integrated into the headers of the reads in --in1. Inputs and outputs can each be compressed or uncompressed, in any combination.
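To illustrate the idea, here is a minimal awk sketch of such a header injection for uncompressed files. It is not the code used by umi-injector.sh: the function name `inject_umi` is hypothetical, and the sketch assumes that the UMI is appended to the first whitespace-delimited field of the header.

```shell
#!/usr/bin/env bash
# Illustrative only, not the implementation of umi-injector.sh.
# inject_umi READS UMIS [SEP]
# Assumes plain-text FastQ with four lines per record and that the UMI is
# appended to the first whitespace-delimited field of each read header.
inject_umi() {
    local reads="$1" umis="$2" sep="${3:-:}"
    awk -v umis="$umis" -v sep="$sep" '
        NR % 4 == 1 {
            # Read the UMI record at the same position; only its sequence is used.
            if ((getline uhead < umis) <= 0 || (getline useq < umis) <= 0) {
                print "UMI file ended before the read file" > "/dev/stderr"
                exit 1
            }
            getline uplus < umis
            getline uqual < umis
            # Insert separator and UMI before the first space of the header, if any.
            n = index($0, " ")
            if (n > 0) print substr($0, 1, n - 1) sep useq substr($0, n)
            else print $0 sep useq
            next
        }
        { print }
    ' "$reads"
}

# Usage: inject_umi reads.fastq umi.fastq "_" > reads_umi.fastq
```

The records are paired purely by position, which is why the ordering of the two files matters.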

The script runs a few basic validators on the provided arguments, but it does not verify that the read IDs in the files match. Ensure that all files are sorted in the same order, otherwise the wrong UMIs will be integrated into the reads!
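Since the script does not perform this check itself, you may want to compare the read IDs beforehand. The sketch below is one possible way to do so for uncompressed files; the function name `check_read_ids` is hypothetical, and it assumes that the read ID is the first whitespace-delimited field of each header. Compressed files would need to be decompressed first, and older ID styles with /1 and /2 suffixes would need extra handling.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of umi-injector.
# check_read_ids READS UMIS
# Returns 0 if both plain-text FastQ files list the same read IDs in the
# same order, and 1 at the first difference.
check_read_ids() {
    awk -v umis="$2" '
        NR % 4 == 1 {
            # Fetch the header of the UMI record at the same position.
            if ((getline uhead < umis) <= 0) {
                printf "record %d: UMI file has fewer records\n", (NR + 3) / 4 > "/dev/stderr"
                bad = 1; exit
            }
            # Skip the remaining three lines of the UMI record.
            getline skip < umis; getline skip < umis; getline skip < umis
            # Compare only the ID, i.e. the first whitespace-delimited field.
            split($0, r, " "); split(uhead, u, " ")
            if (r[1] != u[1]) {
                printf "record %d: %s != %s\n", (NR + 3) / 4, r[1], u[1] > "/dev/stderr"
                bad = 1; exit
            }
        }
        END {
            # Leftover UMI records mean the files differ in length.
            if (!bad && (getline uhead < umis) > 0) {
                print "UMI file has more records than the read file" > "/dev/stderr"
                bad = 1
            }
            exit bad
        }
    ' "$1"
}

# Usage: check_read_ids read1.fastq umi.fastq && echo "read IDs match"
```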

You can specify --sep= to choose a different UMI separator than the default colon. For umi-tools dedup, you will, for example, need to separate UMIs by an underscore.

--threads sets the number of threads used by each pigz process. This number applies separately to every compressed file that is read or written, so the total thread usage grows with the number of compressed files.

--logfile allows you to output a small log file in JSON format. The log will comprise the number of records processed and a sample header to assess the introduced changes. At the moment, --verbose just prints the runtime to the console.

umi-injector.sh --in1=./test_data/read1.fastq.gz --in2=./test_data/read2.fastq.gz --umi=./test_data/umi.fastq.gz --out1=./test_data/read1_umi.fastq.gz --out2=./test_data/read2_umi.fastq.gz --sep="_" --logfile="./logfile.json" --threads="2" -v

To print all available options and their defaults to the console, run umi-injector.sh with --help or -h.

[Screenshot: help display of umi-injector]

Building the containerized versions

Building locally

To build the image locally, Docker (e.g. Docker Desktop) needs to be installed on your system. You can build the CPU version of the image directly off this Git repository. Specify a tag name with the -t parameter.

docker build https://github.com/NationalGenomicsInfrastructure/umi-injector.git -t umi-injector

Alternatively, you can clone this repository first and build from the local checkout:

git clone https://github.com/NationalGenomicsInfrastructure/umi-injector.git && cd umi-injector
docker build . --file Dockerfile -t umi-injector

Building on GitHub Actions

This repository also contains a GitHub Actions workflow to build the container image.

To run the workflow successfully, you need to fork the repository and create your own repository secrets. Navigate to Settings and then to Secrets, where you need to create the two secrets DOCKERHUB_USERNAME and DOCKERHUB_TOKEN. Both will be needed by the workflow to upload the finished container image to Docker Hub.

The workflow can be dispatched manually in the Actions tab. Choose the desired settings in the dialogue and launch the workflow run.

Running containerized umi-injector

To run the containerized version of umi-injector, invoke the container as follows:

docker run --rm -itv $(pwd):$(pwd) -w $(pwd) umi-injector

Replace umi-injector with whatever tag you passed to the -t parameter when building the container image.

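To simplify the invocation, you can also declare an alias and make it permanent by adding it to your ~/.bashrc or ~/.zshrc. Use single quotes so that $(pwd) is evaluated each time the alias runs rather than once when it is defined.

alias umi-injector='docker run --rm -itv "$(pwd)":"$(pwd)" -w "$(pwd)" umi-injector'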

License

The code is released under the MIT License and so are the contents of this repository. See LICENSE for further details.
