There are 7 repositories under the genome-analysis topic.
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
Intervene: a tool for intersection and visualization of multiple genomic region and gene sets
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and a Shiny app.
H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)
A tool for classifying prokaryote protein sequences into COG (Cluster of Orthologous Genes) functional categories
RawHash can accurately and efficiently map raw nanopore signals to reference genomes of varying sizes (e.g., from viral to human genomes) in real-time without basecalling. Described by Firtina et al. (published at https://academic.oup.com/bioinformatics/article/39/Supplement_1/i297/7210440).
Bioinformatics on GCP, AWS or Azure
BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy while reducing the memory space usage of two important applications: 1) finding overlapping reads and 2) read mapping. Described by Firtina et al. (published in NARGAB https://doi.org/10.1093/nargab/lqad004)
Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
non-redundant, compressed, journalled, file-based storage for biological sequences
A minimal genetic data explorer that processes all genetic information locally.
zgtools: A pipeline that allows for the convenient acquisition of T2T (Telomere-to-Telomere) genomes.
Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
Bacterial surveillance pipeline.
provides common tools and lookup tables used primarily by the hgvs and uta packages
A Python project for codon usage analysis of genes or genomes
Using combined evidence from replicates to evaluate ChIP-seq peaks
A random forest classifier to identify contigs of plasmid origin in contig and scaffold genomes
metaUSAT is a data-adaptive statistical approach for testing genetic associations of multiple traits from single/multiple studies using univariate GWAS summary statistics.
Scripts and procedures for detecting positively selected genes and codons in primates
NanoRepeat: fast and accurate analysis of Short Tandem Repeats (STRs) from Oxford Nanopore sequencing data
Reconstructs complex variation using Bionano optical mapping data and breakpoint graph data
Fast Bayesian Hidden Markov Model with Wavelet Compression
Detecting transposable element invasions without a repeat library. Also detects horizontal transfer events and endogenized viruses. All you need is a reference genome and some short reads.
Statistical and computational analysis of the human genome
Genome-on-Diet is a fast and memory-frugal framework for exemplifying sparsified genomics for read mapping, containment search, and metagenomic profiling. It is much faster & more memory-efficient than minimap2 for Illumina, HiFi, and ONT reads. Described by Alser et al. (preliminary version: https://arxiv.org/abs/2211.08157).