rdspring1

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Language: Python | License: Apache-2.0 | 000

atari-representation-learning

Code for "Unsupervised State Representation Learning in Atari"

Language: Python | License: MIT | 020

Auto-GPT

An experimental open-source attempt to make GPT-4 fully autonomous.

Language: Python | License: MIT | 000

Autodiff-Puzzles

Language: Jupyter Notebook | License: MIT | 000

Autopilot-TensorFlow

A TensorFlow implementation of this Nvidia paper: https://arxiv.org/pdf/1604.07316.pdf with some changes

Language: Jupyter Notebook | License: MIT | 030

cs231n

Solutions to Stanford CS231n Spring 2018 Course Assignments.

Language: Jupyter Notebook | 010

cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Language: Cuda | 000

dlrm_ssm

Language: Python | License: MIT | 010

micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Language: Jupyter Notebook | License: MIT | 010

minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Language: Python | License: MIT | 010

mongoose

A Learnable LSH Framework for Efficient NN Training

Language: Python | License: MIT | 010

NvFuser

A Fusion Code Generator for NVIDIA GPUs

Language: C++ | License: NOASSERTION | 000

nvprims-torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Language: Python | License: BSD-3-Clause | 010

Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL, even under multithreading.

Language: C | License: GPL-3.0 | 010

Optimizing-DGEMV-on-Intel-CPUs

Highly optimized DGEMV on CPU with both serial and parallel performance better than MKL and OpenBLAS.

Language: C | License: GPL-3.0 | 010

Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.

Language: Python | License: GPL-3.0 | 010

tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Language: Python | License: MIT | 010

twitter-algorithm-ml

Source code for Twitter's Recommendation Algorithm

Language: Python | License: AGPL-3.0 | 000

vector-search-class-notes

Class notes for the course "Long Term Memory in AI - Vector Search and Databases" COS 495 @ Princeton Fall 2023

Language: TeX | License: MIT | 000

xla

Enabling PyTorch on Google TPU

Language: C++ | License: NOASSERTION | 010