There are 7 repositories under the genome-analysis topic.
Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Intervene: a tool for intersection and visualization of multiple genomic region and gene sets
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and a Shiny app.
H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)
Bioinformatics on GCP, AWS or Azure
A tool for classifying prokaryote protein sequences into COG (Cluster of Orthologous Genes) functional categories
RawHash is the first mechanism that can accurately and efficiently map raw nanopore signals to large reference genomes (e.g., a human reference genome) in real-time without using powerful computational resources (e.g., GPUs). Described by Firtina et al. (published at https://academic.oup.com/bioinformatics/article/39/Supplement_1/i297/7210440)
non-redundant, compressed, journalled, file-based storage for biological sequences
BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy while reducing the memory space usage of two important applications: 1) finding overlapping reads and 2) read mapping. Described by Firtina et al. (published in NARGAB https://doi.org/10.1093/nargab/lqad004)
A minimal genetic data explorer that processes all genetic information locally.
Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
Snakemake workflow for scoring and comparing multiple bacterial genome assemblies (Illumina, Nanopore) to reference genome(s)
provides common tools and lookup tables used primarily by the hgvs and uta packages
Using combined evidence from replicates to evaluate ChIP-seq peaks
Scripts and procedures for detecting positively selected genes and codons in primates
A random forest classifier to identify contigs of plasmid origin in contig and scaffold genomes
A Python project for codon usage analysis of genes or genomes
Fast Bayesian Hidden Markov Model with Wavelet Compression
metaUSAT is a data-adaptive statistical approach for testing genetic associations of multiple traits from single/multiple studies using univariate GWAS summary statistics.
Reconstructs complex variation using Bionano optical mapping data and breakpoint graph data
NanoRepeat: fast and accurate analysis of Short Tandem Repeats (STRs) from Oxford Nanopore sequencing data
Statistical and computational analysis of the human genome
[in development] Proof-of-Concept variation translation, validation, and registration service
Genome-on-Diet is a fast and memory-frugal framework for exemplifying sparsified genomics for read mapping, containment search, and metagenomic profiling. It is much faster & more memory-efficient than minimap2 for Illumina, HiFi, and ONT reads. Described by Alser et al. (preliminary version: https://arxiv.org/abs/2211.08157).
A collaborative notebook for genes and genomes