Gonzalo Benegas's starred repositories

faiss

A library for efficient similarity search and clustering of dense vectors.

al-folio

A beautiful, simple, clean, and responsive Jekyll theme for academics

Language: HTML | License: MIT | Stargazers: 9547 | Issues: 23 | Issues: 516

miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Language: Go | License: NOASSERTION | Stargazers: 8643 | Issues: 68 | Issues: 636

pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs

Language: Python | License: BSD-3-Clause | Stargazers: 3550 | Issues: 26 | Issues: 216

copilot.el

An unofficial Copilot plugin for Emacs.

Language: Emacs Lisp | License: MIT | Stargazers: 1681 | Issues: 38 | Issues: 206

zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.

Language: Python | License: MIT | Stargazers: 1367 | Issues: 44 | Issues: 725

curated-transformers

🤖 A PyTorch library of curated Transformer models and their composable components

Language: Python | License: MIT | Stargazers: 849 | Issues: 14 | Issues: 31

MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch

Language: Python | License: MIT | Stargazers: 593 | Issues: 10 | Issues: 13

cactus

Official home of the genome aligner based on the notion of Cactus graphs

Language: C | License: NOASSERTION | Stargazers: 480 | Issues: 21 | Issues: 789

pyranges

Performant Pythonic GenomicRanges

Language: Python | License: MIT | Stargazers: 420 | Issues: 12 | Issues: 215

basenji

Sequential regulatory activity predictions with deep convolutional neural networks.

Language: Python | License: Apache-2.0 | Stargazers: 373 | Issues: 30 | Issues: 166

datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 321 | Issues: 26 | Issues: 168

scidataflow

Command line scientific data management tool

Language: Rust | License: MIT | Stargazers: 185 | Issues: 1 | Issues: 15

ProteinGym

Official repository for the ProteinGym benchmarks

Language: HTML | License: MIT | Stargazers: 177 | Issues: 5 | Issues: 27

gpn

Genomic Pre-trained Network

Language: Jupyter Notebook | License: MIT | Stargazers: 165 | Issues: 8 | Issues: 24

WiggleTools

Basic operations on the space of numerical functions defined on the genome using lazy evaluators for flexibility and efficiency

Language: C | License: Apache-2.0 | Stargazers: 139 | Issues: 19 | Issues: 67

tangermeme

Biological sequence analysis for the modern age.

Language: Python | License: MIT | Stargazers: 134 | Issues: 8 | Issues: 4

granges

A Rust library and command line tool for working with genomic ranges and their data.

Language: Rust | Stargazers: 87 | Issues: 0 | Issues: 0

BEND

Benchmarking DNA Language Models on Biologically Meaningful Tasks

Language: Python | License: BSD-3-Clause | Stargazers: 77 | Issues: 4 | Issues: 16

borzoi

RNA-seq prediction with deep convolutional neural networks.

Language: Python | License: Apache-2.0 | Stargazers: 66 | Issues: 4 | Issues: 11

taffy

A C/Python/CLI library for working with TAF (.taf, .taf.gz) and MAF (.maf) alignment files

Language: C | License: MIT | Stargazers: 24 | Issues: 0 | Issues: 0

MAGE

Analysis of gene expression and splicing diversity in a subset of samples from the 1000 Genomes Project, including eQTL and sQTL discovery and annotation.

Language: R | Stargazers: 19 | Issues: 4 | Issues: 0

poranges

pyranges/bioframe for polars

ldgm

Software for linkage disequilibrium graphical models

Language: MATLAB | License: MIT | Stargazers: 14 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | License: MIT | Stargazers: 14 | Issues: 3 | Issues: 8

Conservatory

Identification of conserved non-coding sequences in plants

Language: Perl | License: GPL-3.0 | Stargazers: 12 | Issues: 3 | Issues: 6

LLM_eval

Code repository for the study "Evaluating the representational power of pre-trained DNA language models for regulatory genomics"

Language: Jupyter Notebook | License: MIT | Stargazers: 8 | Issues: 0 | Issues: 0

HOTFOXES

Scripts for Rocha et al. 2023: "North-African fox genomes show signatures of repeated introgression and adaptation to life in deserts"

Language: R | License: MIT | Stargazers: 6 | Issues: 1 | Issues: 0