Christin David Bose's repositories
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
CodeGen
CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
dlrm_syn
An implementation of a deep learning recommendation model (DLRM)
ECE60827_simulation_project_part1_old
Part 1 of the HW simulation project
ECE60827_simulation_project_part4-bonus
Repo for the HW simulation project part 4 (bonus)
flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
FLsystem-paper
Federated Learning Systems
ggml
Tensor library for machine learning
HierarchicalKV
HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as a generic key-value store.
llama
Inference code for LLaMA models
LLM-Pruner
LLM-Pruner: On the Structural Pruning of Large Language Models
LLM4HWDesign_Starting_Toolkit
LLM4HWDesign Starting Toolkit
mgpu-gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.
ml-engineering
Machine Learning Engineering Guides and Tools
ml-fastvit
This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization"
oss-arch-gym
Open source version of ArchGym project.
python-mastery
Advanced Python Mastery (course by @dabeaz)
pytorch-direct_dgl
PyTorch-Direct code on top of PyTorch 1.8.0-nightly (e152ca5) for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB)
reproduce_isca23_cpu_DLRM_inference
Codebase and steps for artifact evaluation of the ISCA 2023 paper
superblock
A block-oriented training approach for inference-time optimization.
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.