genome-analysis

There are 7 repositories under the genome-analysis topic.

MAGICS-LAB / DNABERT_2
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
dataset dna dna-processing dna-training genome genome-analysis language-model covid promoter promoter-analysis promoters splice splice-site transcription-factor-binding transcription-factor-binding-site transcription-factors
Language:Shell 417
marbl / Winnowmap
Long read / genome alignment software
genome-analysis nanopore pacbio
Language:C 282
biocommons / hgvs
Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
bioinformatics genome-analysis genomics sequencing variant-analysis variation
Language:Python 265
mbhall88 / rasusa
Randomly subsample sequencing reads or alignments
alignment bam bioinformatics coverage downsample fasta fastq genome-analysis random rust subsampling
Language:Rust 238
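rasusa's description says only that it randomly subsamples reads. The following is a minimal sketch of one common way to do coverage-targeted random subsampling. It assumes coverage is estimated as total read bases divided by genome size. The function name and parameters are hypothetical, and this is not rasusa's implementation:

```python
import random


def subsample_to_coverage(read_lengths, genome_size, target_coverage, seed=None):
    """Return sorted indices of a random subset of reads.

    Reads are visited in a random order and kept until their summed length
    reaches target_coverage * genome_size bases. If the input does not hold
    enough bases, every read is returned.
    """
    rng = random.Random(seed)  # seeded RNG so a subsample can be reproduced
    target_bases = target_coverage * genome_size
    order = list(range(len(read_lengths)))
    rng.shuffle(order)

    chosen = []
    total = 0
    for idx in order:
        if total >= target_bases:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    return sorted(chosen)


lengths = [1000] * 100  # 100 reads of 1 kb each (toy data)
picked = subsample_to_coverage(lengths, genome_size=10_000, target_coverage=2, seed=1)
print(len(picked))
```

Passing a fixed `seed` makes the same subsample come out on every run, which is useful when a downstream analysis has to be reproduced.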
TobyBaril / EarlGrey
Earl Grey: A fully automated TE curation and annotation pipeline
bioinformatics genomics transposable-elements genome-annotation genome-analysis te-annotations
Language:Python 165
asntech / intervene
Intervene: a tool for intersection and visualization of multiple genomic region and gene sets
visualization genome-analysis venn-diagram heatmaps
Language:Python 138
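Intervene visualizes intersections of several gene or region sets, for example as Venn diagrams. The following is a minimal sketch of the counting step behind a Venn diagram for plain gene sets. Each element is assigned to the exact combination of sets that contain it. The function name is hypothetical, and this is not Intervene's code. Genomic regions would additionally need interval overlap logic, which is not shown here.

```python
def venn_region_counts(sets):
    """Count elements in each exclusive Venn region.

    `sets` maps a set name to a Python set. The result maps a tuple of set
    names (in the input's insertion order) to the number of elements found
    in exactly those sets and in no others.
    """
    counts = {}
    universe = set().union(*sets.values())
    for item in universe:
        key = tuple(name for name, members in sets.items() if item in members)
        counts[key] = counts.get(key, 0) + 1
    return counts


gene_sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
print(venn_region_counts(gene_sets))
```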
mcveanlab / mccortex
De novo genome assembly and multisample variant calling
genome-assembly de-bruijn-graphs kmer cortex variant-calling genome-graph contigs genome-analysis genomics
Language:C 112
fmalmeida / bacannot
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and a Shiny app.
annotation antimicrobial-genes-annotation barrnap genome-analysis genome-browser genomic-islands-prediction insertion-sequences ko-annotation methylation-annotation mobile-genetic-elements nextflow phigaro pipeline prophage-sequences-prediction prophages reproducible-research rgi rrna-prediction virulence-factor virulence-genes
Language:Nextflow 100
ganlab / GALA
Long-reads Gap-free Chromosome-scale Assembler
gap-filling genome-analysis genome-assembly long-reads nanopore pacbio scaffolding
Language:Python 79
kishwarshafin / helen
H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)
genome-assembly genomic-data-analysis human-genetics genome-scale-models oxford-nanopore genome-analysis genome-sequencing genomes genomics
Language:Python 70
moshi4 / COGclassifier
A tool for classifying prokaryotic protein sequences into COG (Clusters of Orthologous Genes) functional categories
python bioinformatics cog genomics functional-annotation functional-analysis comparative-genomics visualization protein microbial-genomics genome-analysis
Language:Python 62
CMU-SAFARI / RawHash
RawHash can accurately and efficiently map raw nanopore signals to reference genomes of varying sizes (e.g., from viral to human genomes) in real time without basecalling. Described by Firtina et al. (published at https://academic.oup.com/bioinformatics/article/39/Supplement_1/i297/7210440).
bioinformatics contamination event-detection genome-analysis hash-tables nanopore nanopore-analysis-pipeline nanopore-data nanopore-minion nanopore-reads nanopore-sequencing raw-nanopore-signal-analysis raw-signal rawhash read-mapping relative-abundances seeding segmentation
Language:C 57
lynnlangit / TeamTeri
Bioinformatics on GCP, AWS or Azure
cancer-genomics variants genome-analysis genome-sequencing gcp aws variant-calls bioinformatics genomics-analysis gatk azure
Language:Shell 55
CMU-SAFARI / BLEND
BLEND is a mechanism that efficiently finds fuzzy seed matches between sequences to significantly improve the performance and accuracy, and reduce the memory usage, of two important applications: 1) finding overlapping reads and 2) read mapping. Described by Firtina et al. (published in NARGAB https://doi.org/10.1093/nargab/lqad004)
bioinformatics blend de-novo-assembly genome-analysis genome-assembly minimizers read-mapping strobemers fuzzy-seeds read-overlapping seed-matching spaced-seeds
Language:C 43
NBChub / bgcflow
Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
pangenome-pipeline biosynthetic-gene-clusters genome-annotation genome-analysis
Language:Python 41
biocommons / biocommons.seqrepo
non-redundant, compressed, journalled, file-based storage for biological sequences
bioinformatics genome-analysis genomics sequencing variant-analysis variation
Language:Python 40
pievos101 / PopGenome
An Efficient Swiss Army Knife for Population Genomic Analyses in R
population-genomics snps genome-analysis
Language:R 34
cccnrc / plot-VCF
visual analysis of your VCF files
genetics genome genome-analysis genome-graph graph graphics graphics-programming plot plot-generator variant-analysis variant-annotations variants vcf vcf-files visual-analysis visualization
Language:R 32
ilarsf / gwasTools
Basic and fast GWAS functions for QQ and Manhattan plots (incl. gene names)
gwas rscript qq manhattan-plot genome-analysis snps power plotting optparse
Language:R 30
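A QQ plot of GWAS results compares observed p-values against the uniform distribution expected under the null hypothesis, both on a -log10 scale. The following is a minimal sketch of computing those plot coordinates. It uses the common (i - 0.5)/n plotting positions for the expected quantiles. It is written in Python rather than R, the function name is hypothetical, and it is not gwasTools' code.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    P-values outside (0, 1] are dropped because -log10 is undefined or
    meaningless for them. Pairs are ordered from most to least significant.
    """
    ps = sorted(p for p in p_values if 0 < p <= 1)
    n = len(ps)
    points = []
    for i, p in enumerate(ps, start=1):
        expected = (i - 0.5) / n  # i-th uniform quantile under the null
        points.append((-math.log10(expected), -math.log10(p)))
    return points


for exp_, obs in qq_points([1.0, 0.01, 0.1, 0.001]):
    print(f"{exp_:.3f}\t{obs:.3f}")
```

Points that rise above the y = x line indicate more small p-values than chance alone would produce.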
brandonsaldan / codex
A minimal genetic data explorer that processes all genetic information locally.
bioinformatics dna genetics genome-analysis snpedia
Language:JavaScript 29
linyuiz / zgtools
zgtools: A pipeline that allows for the convenient acquisition of T2T (Telomere-to-Telomere) genomes.
genome-analysis pipeline t2t zgtools
Language:HTML 29
MiraldiLab / maxATAC
Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
deeplearning atac-seq chip-seq transcription-factor-binding maxatac genome-analysis
Language:Python 28
Zhuxitong / ppsPCP
A Plant Presence/absence Variants Scanner and Pan-genome Construction Pipeline
plant pan-genome genome-analysis pan-genome-construction pavs
Language:Perl 26
DOH-JDJ0303 / bigbacter-nf
Bacterial surveillance pipeline.
accessory-genome bacterial-genomics clustering-analysis genome-analysis public-health public-health-surveillance snp-analysis
Language:Nextflow 24
biocommons / bioutils
provides common tools and lookup tables used primarily by the hgvs and uta packages
bioinformatics genome-analysis genomics sequencing variant-analysis variation
Language:Python 22
SouradiptoC / CodonU
A Python project for analyzing codon usage in genes or genomes
bioinformatics bioinformatics-analysis bioinformatics-tool codon codon-bias codon-usage genome genome-analysis codonw cai cbi enc rscu tai
Language:Python 21
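One of the codon usage metrics listed in CodonU's tags is RSCU (relative synonymous codon usage). RSCU is a codon's observed count divided by the count expected if all synonymous codons for its amino acid were used equally, so a value of 1.0 means no bias. The following is a standalone sketch of that calculation using the standard genetic code. The function name is hypothetical, and it does not show CodonU's own API.

```python
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(sequence):
    """Compute RSCU for every codon of each amino acid present in `sequence`.

    RSCU_i = n * X_i / sum(X_j over the n synonymous codons of the same amino
    acid). Stop codons are treated as one synonymous group ('*'). Amino acids
    absent from the sequence are skipped, and codons containing ambiguous
    bases are ignored.
    """
    seq = sequence.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)

    result = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed, so RSCU is undefined
        n = len(codons)
        for c in codons:
            result[c] = counts[c] * n / total
    return result


values = rscu("CTGCTGCTGCTAATG")
print(values["CTG"], values["CTA"], values["ATG"])
```

Single-codon amino acids such as Met (ATG) always get an RSCU of 1.0, which is why many analyses drop them before summarizing codon bias.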
Genometric / MSPC
Using combined evidence from replicates to evaluate ChIP-seq peaks
next-generation-sequencing chip-seq ngs-analysis genome-analysis peak enriched-regions overlapping-peaks analysis mspc peaks
Language:C# 20
leaemiliepradier / PlasForest
A random forest classifier to identify contigs of plasmid origin in contig and scaffold genomes
plasmid random-forest-classifier genome-analysis homology-search pipeline
Language:Python 19
RayDebashree / metaUSAT
metaUSAT is a data-adaptive statistical approach for testing genetic associations of multiple traits from single/multiple studies using univariate GWAS summary statistics.
summary-statistics multiple-traits meta-analysis gwas multivariate-analysis overlapping-samples genome-analysis multiple-studies rscript pleiotropy cross-phenotype genetic-epidemiology statistical-genetics score-test metausat phewas
Language:R 19
robinvanderlee / positive-selection
Scripts and procedures for detecting positively selected genes and codons in primates
bioinformatics bioinformatics-pipeline bioinformatics-scripts bioinformatics-analysis computational-biology evolution comparative-genomics positive-selection primates genetics sequence-alignment genomics genomes genome-analysis codeml immunity ensembl
Language:Perl 19
WGLab / NanoRepeat
NanoRepeat: fast and accurate analysis of Short Tandem Repeats (STRs) from Oxford Nanopore sequencing data
bioinformatics genome-analysis nanopore-sequencing repeat-detection short-tandem-repeats genomics pacbio-sequencing sequencing
Language:Python 18
AmpliconSuite / AmpliconReconstructorOM
Reconstructs complex variation using Bionano optical mapping data and breakpoint graph data
ngs python optical-mapping bionano ecdna cancer-genomics genome-sequencing genome-assembly genome-analysis
Language:Python 17
wiedenhoeft / HaMMLET
Fast Bayesian Hidden Markov Model with Wavelet Compression
hmm wavelet-compression wavelet wavelets wavelet-transform genomics genome-analysis genome hidden-markov-model hidden-markov-models segmentation time-series time-series-analysis statistics statistical-inference bayesian-inference bayesian-statistics bayesian-data-analysis machine-learning bioinformatics
Language:C++ 17
rpianezza / GenomeDelta
Detects transposable element invasions without a repeat library. Also detects horizontal transfer events and endogenized viruses. All you need is a reference genome and some short reads.
bedtools bwa-mem consensus-sequences fasta fastq genome genome-analysis genome-assembly multiple-sequence-alignment samtools transposable-element transposons
Language:Shell 16
boulderrinnlab / CLASS_2021
Statistical and computational analysis of the human genome
chip-seq rna-seq r bash encode lncrna epigentics genomics genome-analysis
Language:R 15
CMU-SAFARI / Genome-on-Diet
Genome-on-Diet is a fast and memory-frugal framework for exemplifying sparsified genomics for read mapping, containment search, and metagenomic profiling. It is much faster and more memory-efficient than minimap2 for Illumina, HiFi, and ONT reads. Described by Alser et al. (preliminary version: https://arxiv.org/abs/2211.08157).
bioinformatics containment-search genome-analysis genome-on-diet genomics large-scale metagenomic-analysis metagenomics microbiome-analysis minimap2 read-mapping sequence-alignment variant-calling wavefront-alignment
Language:Roff 14