Marko Kabić's repositories
alpa
Auto parallelization for large-scale neural networks
apex
A PyTorch extension: tools for easy mixed precision and distributed training in PyTorch
attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
ColossalAI
Colossal-AI: A Unified Deep Learning System for Big Model Era
COSTA
Distributed Communication-Optimal Shuffle and Transpose Algorithm
cudf
cuDF - GPU DataFrame Library
cylon
Cylon is a fast, scalable, distributed-memory parallel runtime with a Pandas-like DataFrame.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
flash-attention
Fast and memory-efficient exact attention
flax
Flax is a neural network library for JAX that is designed for flexibility.
gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
google-research
Google Research
marius
Large scale embeddings on a single machine.
mesh
Mesh TensorFlow: Model Parallelism Made Easier
mesh-transformer-jax
Model parallel transformers in JAX and Haiku
minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
pytorch3d
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
query-engine
LingoDB: A new analytical database system that blurs the lines between databases and compilers.
semiprof
Simple thread-safe, annotation-based C++ profiler.
snn_toolbox
Toolbox for converting analog to spiking neural networks (ANN to SNN), and running them in a spiking neuron simulator.
spack
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
sql-parser
SQL parser for C++: builds a C++ object structure from SQL statements.
transformer-from-scratch
A well-documented, unit-tested, type-checked, and formatted implementation of a vanilla transformer, for educational purposes.
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
trax
Trax — Deep Learning with Clear Code and Speed