Daniel Baker (dnbaker)

Company: @langmead-lab

Location: Baltimore, MD

Twitter: @dnb_hopkins

Daniel Baker's repositories

dashing

Fast and accurate genomic distances using HyperLogLog

Language: C++ | License: GPL-3.0 | Stargazers: 159 | Issues: 12 | Issues: 64

sketch

C++ implementations of sketch data structures with SIMD parallelism, including Python bindings

Language: C++ | License: MIT | Stargazers: 149 | Issues: 8 | Issues: 12

bonsai

Bonsai: Fast, flexible taxonomic analysis and classification

Language: C++ | License: MIT | Stargazers: 70 | Issues: 9 | Issues: 4

dashing2

Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.

Language: C++ | License: MIT | Stargazers: 59 | Issues: 5 | Issues: 25

minicore

Fast and memory-efficient clustering + coreset construction, including fast distance kernels for Bregman and f-divergences.

Language: C++ | License: MIT | Stargazers: 32 | Issues: 3 | Issues: 1

bioseq

Tokenizers and Machine Learning Models for biological sequence data

vec

Type-generic SIMD library for optimized generic code generation

Language: C++ | License: MIT | Stargazers: 12 | Issues: 2 | Issues: 0

aesctr

C++ implementation of an AES-CTR PRNG using SIMD, based on Samuel Neves' implementation

Language: C++ | License: Apache-2.0 | Stargazers: 11 | Issues: 2 | Issues: 0

wmh

Weighted Minhash Code

Language: C++ | Stargazers: 5 | Issues: 2 | Issues: 0

fastiota

Fast std::iota for contiguous memory using SIMD operations

Language: C | Stargazers: 4 | Issues: 2 | Issues: 0

libsimdsampling

Data- and processor-parallelism for fast weighted sampling

Language: C++ | License: MIT | Stargazers: 4 | Issues: 2 | Issues: 0
Language: C++ | License: GPL-3.0 | Stargazers: 3 | Issues: 2 | Issues: 0

libkl

Kernels for fast vectorized KL divergence + related

Language: C | License: MIT | Stargazers: 3 | Issues: 2 | Issues: 0

libtorch-kseq-demo

Demo using libtorch and one-hot encoding for fastx files

Language: C++ | Stargazers: 2 | Issues: 2 | Issues: 0

dashing2-binaries

Binaries for releases for Dashing2

minicore-experiments

Experiments for minicore: fast scRNA-seq clustering with various distances

Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0

scavenger

Rust spatial/single-cell genomics

Language: Rust | Stargazers: 1 | Issues: 0 | Issues: 0

tilt

Biased dataloaders for PyTorch and related utilities

Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0

bioconda-recipes

Conda recipes for the bioconda channel.

Language: Shell | Stargazers: 0 | Issues: 1 | Issues: 0
Stargazers: 0 | Issues: 2 | Issues: 0

distmat

2-dimensional distance matrix for holding distances of arbitrary types.

Language: C++ | Stargazers: 0 | Issues: 2 | Issues: 0
Stargazers: 0 | Issues: 0 | Issues: 0

einops

Simplistic API for deep learning tensor operations

Language: Rust | Stargazers: 0 | Issues: 1 | Issues: 0

FFHT

Fast Fast Hadamard Transform

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

megadepth

BigWig and BAM utilities

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

minilsh

Python bindings for Locality-Sensitive Hashers, built on the minicore C++ library.

Language: C++ | Stargazers: 0 | Issues: 2 | Issues: 0

pathml

Tools for computational pathology

Language: Python | License: GPL-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformers Models.

Language: Jupyter Notebook | License: AFL-3.0 | Stargazers: 0 | Issues: 1 | Issues: 0

rust-tokenizers

Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigram (SentencePiece) models

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0