This folder is associated with SARS-CoV-2 variant calling and annotation. It can be downloaded as a semi-standalone package to perform analyses on a distributed cluster that uses the `module` software module manager and has the required modules installed. It was tested on the ILRI cluster and its functioning cannot be guaranteed elsewhere, especially without adjusting or setting variables accordingly. It is still under active development.
The pipeline script was created based on gkarthik's iVar work. It covers every step from variant calling through to annotation.
SLURM scheduler script: for running the variant calling script (`sbatch -w <computeNode> slurm.sbatch`) from the `pwd`. You will have to modify the email field appropriately.
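As a rough illustration of where the email field sits, a submission script of this kind might look like the sketch below. It uses standard SLURM directives, but the job name, log file pattern and address are placeholders chosen for illustration, and the actual contents of `slurm.sbatch` may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a submission script; all directive values are placeholders.
#SBATCH -J ncov19vc                  # job name (assumed)
#SBATCH -o slurm.%j.out              # log file; %j expands to the job ID
#SBATCH --mail-type=END,FAIL         # send mail when the job ends or fails
#SBATCH --mail-user=you@example.org  # the email field to edit

# SLURM starts the job in the directory sbatch was called from,
# so the variant calling script is looked up there.
script="./ncov19vc.sh"
if [[ -f "$script" ]]; then
    bash "$script"
else
    echo "ncov19vc.sh not found in $(pwd)" >&2
fi
```

Submitting it with `sbatch -w <computeNode> slurm.sbatch` pins the job to the named compute node.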
- Adapter `.fa` files for adapter-trimming.
- ivar_variants_to_vcf.py: two copies, one of which (`*_s_*`) is modified to filter for only the Spike gene variants.
- ncov19vc.sh: variant calling script based on the `slurm job scheduler`. Be sure to modify read file suffixes and input/output directories accordingly. Outputs of the script are written to the `pwd` while the script is running, then moved to their respective output directories at the end of every sample processing cycle. Some outputs of interest are copied to separate directories for consolidation, especially for downstream analyses and reporting on a local client.
- snpEff dir: `snpEff` installation configured with SARS-CoV-2 references.
- Will contain sample analysis outputs organised in per-sample directories, including associated `slurm.out` files.
- ARTIC `V3` primer `.bed`, `.fa` and `.tsv` files.
- SARS-CoV-2 reference genome NC_045512.2 `.fa` sequence and `.gff` feature files.
- Will contain read file pairs (`.fastq`/`.fastq.gz`).
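The per-sample cycle described for ncov19vc.sh (find read file pairs, process each sample in the working directory, then move its outputs into a per-sample output directory) can be sketched roughly as follows. This is not the actual script: the `reads` and `output` directory names, the `_R1`/`_R2` suffixes and the helper names `list_samples` and `collect_outputs` are all assumptions made for illustration, and the processing steps themselves are left as a comment.

```shell
#!/usr/bin/env bash
# Sketch of a per-sample cycle: find read pairs, process, move outputs.
# Directory names and read suffixes are assumptions, not the pipeline's settings.
set -euo pipefail

reads_dir="${READS_DIR:-reads}"
out_dir="${OUT_DIR:-output}"
r1_suffix="${R1_SUFFIX:-_R1.fastq.gz}"
r2_suffix="${R2_SUFFIX:-_R2.fastq.gz}"

# Print the names of samples that have both mates present in a directory.
list_samples() {
    local dir="$1" r1 sample
    for r1 in "$dir"/*"$r1_suffix"; do
        [[ -e "$r1" ]] || continue            # glob matched nothing
        sample="$(basename "$r1" "$r1_suffix")"
        if [[ -e "$dir/$sample$r2_suffix" ]]; then
            printf '%s\n' "$sample"
        else
            echo "skipping $sample: missing $r2_suffix mate" >&2
        fi
    done
}

# Move every file in the working directory named <sample>.* into
# that sample's output directory.
collect_outputs() {
    local sample="$1" f
    mkdir -p "$out_dir/$sample"
    for f in "$sample".*; do
        [[ -e "$f" ]] || continue
        mv -- "$f" "$out_dir/$sample/"
    done
}

if [[ -d "$reads_dir" ]]; then
    while IFS= read -r sample; do
        # ... trimming, alignment, variant calling and annotation
        # for "$sample" would run here ...
        collect_outputs "$sample"
    done < <(list_samples "$reads_dir")
fi
```

Keeping the suffixes and directories in variables at the top mirrors the advice to adjust read file suffixes and input/output directories before running.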