Christin David Bose's repositories
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
CodeGen
CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
dlrm_syn
An implementation of a deep learning recommendation model (DLRM)
ECE60827_simulation_project_part1_old
Part 1 of the HW simulation project
ECE60827_simulation_project_part4-bonus
Repo for the HW simulation project part 4 (bonus)
flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
FLsystem-paper
Federated Learning Systems
ggml
Tensor library for machine learning
HierarchicalKV
HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as a generic key-value store.
llama
Inference code for LLaMA models
LLM-Pruner
LLM-Pruner: On the Structural Pruning of Large Language Models
LLM4HWDesign_Starting_Toolkit
LLM4HWDesign Starting Toolkit
mgpu-gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.
ml-engineering
Machine Learning Engineering Guides and Tools
ml-fastvit
This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization"
oss-arch-gym
Open source version of ArchGym project.
python-mastery
Advanced Python Mastery (course by @dabeaz)
pytorch-direct_dgl
PyTorch-Direct code on top of PyTorch 1.8.0-nightly (e152ca5) for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB)
reproduce_isca23_cpu_DLRM_inference
Codebase and steps for artifact evaluation of the ISCA 2023 paper
superblock
A block-oriented training approach for inference-time optimization.
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.