vsmicrogenomics

Vikas Sharma's repositories

BV-BRC-Genome-Downloader

A command-line utility for efficiently downloading genome data from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), supporting multiple file types and providing failure logs for troubleshooting.

Language: Shell

fasta-extractor

fasta-extractor.pl extracts ORFs from a genomic FASTA file using the coordinates given in an ID list. It takes two input files, genomic_fasta-file and id_list-file, and outputs the corresponding sequences.

Language: Perl
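Coordinate-based extraction of this kind can be illustrated with a minimal Python sketch. It is not the Perl script itself: the function names, the 1-based inclusive coordinates, and the strand argument are assumptions made for illustration, and the real id_list-file layout may differ.

```python
# Hypothetical sketch: slice regions out of FASTA records by coordinates.
# Assumes 1-based, inclusive start/end and an optional strand ("+" or "-").

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_fasta(text):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record ID.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def extract_region(records, seq_id, start, end, strand="+"):
    """Return the subsequence start..end; reverse-complement it on '-'."""
    seq = records[seq_id][start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


genome = parse_fasta(">contig1 demo\nATGAAACCC\nGGGTTTTAG\n")
orf_forward = extract_region(genome, "contig1", 1, 9)
orf_reverse = extract_region(genome, "contig1", 1, 9, "-")
print(orf_forward, orf_reverse)
```

Reading the whole genome into a dictionary keeps the sketch short; for large genomes an indexed reader would be a more practical choice.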

NCBIClosestStrainFetcher

A Python utility to extract closest reference strain data from the NCBI database using assembly identifiers from a TSV input file (gtdbtk.bac120.summary.tsv). The script outputs detailed strain information in a structured TSV format.

Language: Python

Pathway-Feature-Identification

"Pathway_Feature_Identification.py" is a script for analyzing microbial genomic data, identifying antimicrobial resistance-associated pathways using KEGG data, and applying logistic regression for feature selection.

Language: Python

AMRFinderPlus-Matrix

AMRFinderPlus-Matrix contains a script that processes AMRFinderPlus output files and generates a binary matrix showing the presence or absence of antibiotic resistance genes, stress response genes, and virulence genes in each sample.

Language: Python
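The idea of a binary presence/absence matrix can be sketched as follows. This is an illustration only, not the repository's script: it assumes the gene symbols have already been parsed from each sample's AMRFinderPlus output into a set, and the sample names and function name are hypothetical.

```python
# Hypothetical sketch: build a samples x genes presence/absence matrix.
# Input is assumed to be {sample_name: set_of_gene_symbols}, already parsed.


def presence_absence_matrix(genes_by_sample):
    """Return (samples, genes, rows) where rows[i][j] is 1 if sample i has gene j."""
    genes = sorted(set().union(*genes_by_sample.values()))
    samples = sorted(genes_by_sample)
    rows = [
        [1 if gene in genes_by_sample[sample] else 0 for gene in genes]
        for sample in samples
    ]
    return samples, genes, rows


samples, genes, rows = presence_absence_matrix(
    {
        "sampleA": {"blaTEM-1", "tet(A)"},
        "sampleB": {"tet(A)", "sul1"},
    }
)

# Print as a tab-separated table with a header row.
print("\t".join(["sample"] + genes))
for sample, row in zip(samples, rows):
    print("\t".join([sample] + [str(value) for value in row]))
```

Sorting both axes makes the output deterministic, which helps when comparing matrices between runs.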

Average-KL-Divergence-Calculator

average-KL-divergence-calculator.py is a Python script that calculates the average KL divergence for each FASTA file in a directory and produces separate output files and a combined output file with the results.

Language: Python
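The description does not say which distributions are compared, so the following is only one plausible reading, sketched under stated assumptions: each sequence's nucleotide frequencies are compared against the pooled frequencies of all sequences in the same file, and the per-sequence KL divergences (in bits) are averaged. All function names are hypothetical.

```python
import math
from collections import Counter


def frequencies(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in seq (other symbols ignored)."""
    counts = Counter(ch for ch in seq.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return {base: counts[base] / total for base in alphabet}


def kl_divergence(p, q):
    """KL(P || Q) in bits; terms with p == 0 contribute nothing."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


def average_kl(sequences):
    """Mean KL divergence of each sequence against the pooled background."""
    # Pooling guarantees q > 0 wherever any single sequence has p > 0.
    background = frequencies("".join(sequences))
    total = sum(kl_divergence(frequencies(s), background) for s in sequences)
    return total / len(sequences)


print(average_kl(["ACGT", "ACGT"]))
print(average_kl(["AAAA", "CCCC"]))
```

Identical sequences give a divergence of zero, while sequences with disjoint composition diverge strongly from the pooled background.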

bed-annotator

This script annotates a BED file with gene information using the Ensembl REST API. It is specifically designed to work with human genome build GRCh37 (hg19).

Language: Python

codon-alignment-suite

A tool for aligning nucleotide sequences based on protein alignments, translating nucleotide sequences to protein, and generating phylogenetic trees using Biopython.

Language: Python
License: MIT
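Aligning nucleotides "based on protein alignments" usually means threading codons onto an aligned protein sequence, so that each gap in the protein becomes a three-nucleotide gap. The sketch below shows that general technique in plain Python; it is not taken from the repository, and the function name and error handling are assumptions.

```python
# Hypothetical sketch: thread an unaligned coding sequence onto its
# aligned protein sequence, one codon per residue, "---" per gap.


def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Return the codon alignment implied by an aligned protein sequence."""
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if residues != len(codons) or len(nucleotides) % 3 != 0:
        raise ValueError("nucleotide length does not match the protein sequence")
    aligned = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == gap:
            aligned.append(gap * 3)
        else:
            aligned.append(codons[next_codon])
            next_codon += 1
    return "".join(aligned)


codon_alignment = thread_codons("M-K", "ATGAAA")
print(codon_alignment)
```

The sketch does not check that each codon actually translates to the paired residue; a fuller version would verify that against a codon table.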

eggnogCOGextractor

eggnogCOGextractor.py is a Python script designed to extract COG (Clusters of Orthologous Groups) information from EggNOG data. This script processes EggNOG annotations to identify and extract relevant COG data, providing insights into functional categories of genes.

Language: Python

FASTAValidator

FASTAValidator: A Python script for validating FASTA files by checking their format and sequence content.

Language: Python
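Format and content checks of this kind might look like the sketch below. The specific rules (non-empty headers, no sequence data before the first header, no empty records, IUPAC nucleotide characters only) are assumptions chosen for illustration, not the checks the repository's script is known to perform.

```python
# Hypothetical sketch: collect format and content problems in FASTA text.
# The allowed alphabet is assumed to be IUPAC nucleotide codes plus gaps.

ALLOWED = set("ACGTURYKMSWBDHVN-")


def validate_fasta(text, allowed=ALLOWED):
    """Return a list of error messages; an empty list means the text passed."""
    errors = []
    header = None   # header of the record currently being read
    seq_len = 0     # number of sequence characters seen for that record
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None and seq_len == 0:
                errors.append(f"record '{header}' has no sequence")
            header = line[1:].strip()
            if not header:
                errors.append(f"line {lineno}: empty header")
            seq_len = 0
        else:
            if header is None:
                errors.append(f"line {lineno}: sequence data before first header")
            bad = set(line.upper()) - allowed
            if bad:
                errors.append(f"line {lineno}: invalid characters {sorted(bad)}")
            seq_len += len(line)
    if header is None:
        errors.append("no FASTA records found")
    elif seq_len == 0:
        errors.append(f"record '{header}' has no sequence")
    return errors


print(validate_fasta(">seq1\nACGTN\n"))
print(validate_fasta(">seq1\nACGX\n>seq2\n"))
```

Returning every problem found, rather than stopping at the first, makes the report more useful when cleaning up a large file.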

KEGG_Modules_Fetcher

A Python script for efficiently retrieving and organizing module-related data from the KEGG database, including entries, symbols, pathway IDs, and names.

Language: Python

PanGenomeAnalysisTool

PanGenomeAnalysisTool: A Python script for pan-genome analysis that examines gene presence and absence across multiple genomes and generates plots and summary statistics.

Language: Python

score-analysis-visualizations

score-analysis-visualizations: A script for analyzing and visualizing scores in a TSV file, generating bar plots, box plots, and summary statistics.

Language: R

stouffers-method-statistical-analysis

The "stouffers_method.R" code performs statistical analysis using Stouffer's method to combine p-values for a group of entities from tab-separated input data, and outputs the results to a new tab-separated file including entity names, combined p-values, and ranks.

Language: R
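Stouffer's method converts each p-value to a z-score, sums the z-scores, and divides by the square root of their count before converting back to a p-value. The sketch below shows the unweighted, one-sided form in Python rather than R; the entity names and p-values are made up for illustration, and the repository's script may use weights or a different tail convention.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combined_p(p_values):
    """Combine one-sided p-values (each strictly between 0 and 1)."""
    norm = NormalDist()
    # z_i = Phi^{-1}(1 - p_i); Z = sum(z_i) / sqrt(k); p = 1 - Phi(Z)
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - norm.cdf(z)


def rank_entities(p_values_by_entity):
    """Return (entity, combined_p, rank) tuples, smallest combined p first."""
    combined = {
        name: stouffer_combined_p(ps) for name, ps in p_values_by_entity.items()
    }
    ordered = sorted(combined.items(), key=lambda item: item[1])
    return [(name, p, rank) for rank, (name, p) in enumerate(ordered, start=1)]


# Hypothetical input: entity name -> p-values from independent tests.
results = rank_entities(
    {
        "geneA": [0.05, 0.05],
        "geneB": [0.5, 0.5],
    }
)
for name, p, rank in results:
    print(f"{name}\t{p:.4g}\t{rank}")
```

Two moderately small p-values combine into a smaller one, while p-values of 0.5 carry no evidence and combine to 0.5.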

top_1000_indexes_from_fastq

This Python script can be used to extract, count, and output the top 1000 paired indexes from undetermined sequences in paired-end FASTQ files.

Language: Python
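Counting the most frequent index pairs can be sketched with `collections.Counter`. The sketch assumes Illumina-style headers in which the index pair is the text after the last colon (for example `1:N:0:ACGT+TTGA`) and four lines per record; the function name and sample reads are hypothetical, and the repository's script may parse its input differently.

```python
from collections import Counter


def top_indexes(fastq_lines, n=1000):
    """Return the n most common index strings as (index, count) pairs."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        # In a four-line FASTQ record, the header is every fourth line.
        if i % 4 == 0:
            index_pair = line.strip().rsplit(":", 1)[-1]
            counts[index_pair] += 1
    return counts.most_common(n)


# Hypothetical reads; only the header lines matter for counting.
reads = [
    "@M01:1:FC:1:1101:1000:2000 1:N:0:ACGT+TTGA", "ACGTAC", "+", "IIIIII",
    "@M01:1:FC:1:1101:1001:2001 1:N:0:GGGG+CCCC", "TTGACA", "+", "IIIIII",
    "@M01:1:FC:1:1101:1002:2002 1:N:0:ACGT+TTGA", "GGCATA", "+", "IIIIII",
]
top = top_indexes(reads, n=2)
print(top)
```

Because the function only needs an iterable of lines, it can be passed an open file handle directly, which avoids loading a large FASTQ file into memory.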