Vikas Sharma's repositories
BV-BRC-Genome-Downloader
A command-line utility for efficiently downloading genome data from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), supporting multiple file types and providing failure logs for troubleshooting.
fasta-extractor
fasta-extractor.pl extracts ORF sequences from a genomic FASTA file using the coordinates given in an ID list. It takes two input files: genomic_fasta-file and id_list-file.
NCBIClosestStrainFetcher
A Python utility to extract closest reference strain data from the NCBI database using assembly identifiers from a TSV input file (gtdbtk.bac120.summary.tsv). The script outputs detailed strain information in a structured TSV format.
Pathway-Feature-Identification
Pathway_Feature_Identification.py is a script that analyzes microbial genomic data, identifies antimicrobial resistance-associated pathways using KEGG data, and applies logistic regression for feature selection.
AMRFinderPlus-Matrix
AMRFinderPlus-Matrix contains a script that processes AMRFinderPlus output files and generates a binary matrix showing the presence or absence of antibiotic resistance genes, stress response genes, and virulence genes in each sample.
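To illustrate what a presence/absence matrix looks like, here is a minimal Python sketch. It is not the repository's script: the function name `presence_absence_matrix` and the input shape (a mapping from sample name to the gene symbols found in that sample) are assumptions made for illustration, and parsing of the AMRFinderPlus output files is deliberately left out.

```python
def presence_absence_matrix(genes_by_sample):
    """Build a binary matrix from {sample: iterable of gene symbols}.

    Hypothetical helper: returns the sorted gene list (the columns) and
    a dict mapping each sample to a row of 0/1 values.
    """
    found_sets = {s: set(found) for s, found in genes_by_sample.items()}
    # Columns are the union of all genes seen in any sample.
    genes = sorted(set().union(*found_sets.values())) if found_sets else []
    matrix = {
        sample: [1 if gene in found else 0 for gene in genes]
        for sample, found in found_sets.items()
    }
    return genes, matrix


# Illustrative input only; gene symbols are examples, not real results.
hits = {
    "sampleA": ["blaTEM-1", "tet(A)"],
    "sampleB": ["tet(A)", "sul1"],
}
genes, matrix = presence_absence_matrix(hits)
```

Each row has one entry per gene in `genes`, so the result can be written out as a TSV with the gene symbols as the header row.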
Average-KL-Divergence-Calculator
average-KL-divergence-calculator.py is a Python script that calculates the average KL divergence for each FASTA file in a directory and produces separate output files and a combined output file with the results.
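The description does not say which distributions are compared, so the following Python sketch shows only one plausible reading: each sequence's nucleotide composition is compared against the pooled composition of the whole file, and the resulting KL divergences are averaged. The function names, the ACGT alphabet, and the use of log base 2 are all assumptions, not details taken from the repository.

```python
import math
from collections import Counter


def composition(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in seq (others ignored)."""
    counts = Counter(c for c in seq.upper() if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return {c: counts[c] / total for c in alphabet}


def kl_divergence(p, q):
    """KL(P || Q) in bits; terms with p == 0 contribute nothing."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


def average_kl(sequences):
    """Mean KL divergence of each sequence against the pooled background.

    The background includes every sequence, so q > 0 wherever p > 0.
    """
    background = composition("".join(sequences))
    values = [kl_divergence(composition(s), background) for s in sequences]
    return sum(values) / len(values)
```

For example, `average_kl(["AAAA", "CCCC"])` gives 1 bit, since each sequence is concentrated on one symbol that makes up half of the background.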
bed-annotator
This script annotates a BED file with gene information using the Ensembl REST API. It is specifically designed to work with human genome build GRCh37 (hg19).
codon-alignment-suite
codon-alignment-suite aligns nucleotide sequences based on protein alignments, translates nucleotide sequences to protein, and generates phylogenetic trees using Biopython.
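Aligning nucleotides "based on a protein alignment" usually means threading codons through the gapped protein sequence. The sketch below shows that idea in plain Python; it does not use Biopython and is not the repository's code. The function name `thread_codons` is hypothetical, and it assumes the nucleotide sequence is in frame, has no stop codon, and encodes exactly the residues in the aligned protein.

```python
def thread_codons(aligned_protein, nucleotide):
    """Map an ungapped coding sequence onto a gapped protein alignment row.

    Each residue consumes one codon; each '-' becomes '---'.
    """
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(nucleotide) != 3 * residues:
        raise ValueError("nucleotide length does not match residue count")

    codons = (nucleotide[i:i + 3] for i in range(0, len(nucleotide), 3))
    return "".join("---" if aa == "-" else next(codons)
                   for aa in aligned_protein)
```

For example, `thread_codons("M-K", "ATGAAA")` returns `"ATG---AAA"`.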
eggnogCOGextractor
eggnogCOGextractor.py is a Python script that extracts COG (Clusters of Orthologous Groups) information from EggNOG annotations, giving a view of the functional categories of the annotated genes.
FASTAValidator
FASTAValidator: A Python script for validating FASTA files by checking their format and sequence content.
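As a rough illustration of what format and content checks can involve, here is a minimal Python sketch. The specific rules (every record needs a non-empty header and at least one sequence line, and sequence characters must come from a given alphabet) and the name `validate_fasta` are assumptions; the repository's script may check different things.

```python
def validate_fasta(text, alphabet="ACGTN"):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    allowed = set(alphabet)
    header = None          # header of the record currently being read
    has_sequence = False   # whether that record has any sequence lines

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None and not has_sequence:
                errors.append(f"record '{header}' has no sequence")
            header = line[1:].strip()
            has_sequence = False
            if not header:
                errors.append(f"line {number}: empty header")
        elif header is None:
            errors.append(f"line {number}: sequence data before any header")
        else:
            bad = set(line.upper()) - allowed
            if bad:
                chars = "".join(sorted(bad))
                errors.append(f"line {number}: invalid characters '{chars}'")
            has_sequence = True

    if header is None:
        errors.append("no records found")
    elif not has_sequence:
        errors.append(f"record '{header}' has no sequence")
    return errors
```

Passing a protein alphabet instead of the default would let the same function check amino acid FASTA files.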
KEGG_Modules_Fetcher
A Python script for efficiently retrieving and organizing module-related data from the KEGG database, including entries, symbols, pathway IDs, and names.
PanGenomeAnalysisTool
PanGenomeAnalysisTool: A Python script for pan-genome analysis that examines gene presence and absence across multiple genomes and generates plots and statistical summaries.
score-analysis-visualizations
score-analysis-visualizations: A script for analyzing and visualizing scores in a TSV file, generating bar plots, box plots, and summary statistics.
stouffers-method-statistical-analysis
stouffers_method.R uses Stouffer's method to combine p-values for a group of entities read from tab-separated input data, and writes the results to a new tab-separated file containing entity names, combined p-values, and ranks.
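For readers unfamiliar with Stouffer's method: each p-value is converted to a z-score, the z-scores are summed and divided by the square root of their count, and the result is converted back to a p-value. The sketch below shows the unweighted, one-sided form in Python using only the standard library. It is not a port of the R script, which may use weights or handle ties differently, and the names `stouffer` and `combine_and_rank` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer(p_values):
    """Combine one-sided p-values with the unweighted Stouffer method.

    Each p must lie strictly between 0 and 1, since the inverse normal
    CDF is undefined at the endpoints.
    """
    norm = NormalDist()
    z = sum(norm.inv_cdf(1 - p) for p in p_values) / sqrt(len(p_values))
    return 1 - norm.cdf(z)


def combine_and_rank(p_by_entity):
    """Return (entity, combined p, rank) tuples, smallest p ranked first."""
    combined = {name: stouffer(ps) for name, ps in p_by_entity.items()}
    ordered = sorted(combined, key=combined.get)
    return [(name, combined[name], rank)
            for rank, name in enumerate(ordered, start=1)]
```

Two p-values of 0.05 combine to roughly 0.01, which shows how consistent moderate evidence accumulates under this method.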
top_1000_indexes_from_fastq
This Python script can be used to extract, count, and output the top 1000 paired indexes from undetermined sequences in paired-end FASTQ files.
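A minimal sketch of the counting step, assuming Illumina-style FASTQ headers in which the index pair is the last colon-separated field (for example `... 1:N:0:ACGTACGT+TGCATGCA`). That header layout and the function name `top_index_pairs` are assumptions; the repository's script may read the indexes differently, for instance from both files of the pair.

```python
from collections import Counter
from itertools import islice


def top_index_pairs(fastq_lines, n=1000):
    """Count index pairs from FASTQ header lines and return the n most common.

    fastq_lines is any iterable of lines; every fourth line, starting
    with the first, is treated as a record header.
    """
    headers = islice(fastq_lines, 0, None, 4)
    counts = Counter(h.strip().rsplit(":", 1)[-1] for h in headers)
    return counts.most_common(n)
```

Because the function accepts any iterable of lines, it could be fed from an open file handle, including one returned by `gzip.open(path, "rt")` for compressed input.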