Jiahao Tan (KarhouTam)

Company: Shenzhen University

Location: Solar System

Jiahao Tan's starred repositories

ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Language: Python | License: Apache-2.0 | Stargazers: 32834 | Issues: 476 | Issues: 18304

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 25822 | Issues: 212 | Issues: 230

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 25671 | Issues: 221 | Issues: 4202

spdlog

Fast C++ logging library.

Language: C++ | License: NOASSERTION | Stargazers: 23685 | Issues: 445 | Issues: 2136

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 23012 | Issues: 226 | Issues: 131

jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)

Language: TypeScript | License: AGPL-3.0 | Stargazers: 21727 | Issues: 124 | Issues: 1682

LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

pykan

Kolmogorov Arnold Networks

Language: Jupyter Notebook | License: MIT | Stargazers: 14338 | Issues: 108 | Issues: 347

triton

Development repository for the Triton language and compiler

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION | Stargazers: 9807 | Issues: 160 | Issues: 679

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 8274 | Issues: 101 | Issues: 1173

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

Language: C | License: NOASSERTION | Stargazers: 5996 | Issues: 118 | Issues: 231

oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

Language: C++ | License: Apache-2.0 | Stargazers: 5846 | Issues: 145 | Issues: 966

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

c-style

My favorite C programming practices.

CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda | License: GPL-3.0 | Stargazers: 1506 | Issues: 29 | Issues: 29

torchtitan

A native PyTorch library for large model training

Language: Python | License: BSD-3-Clause | Stargazers: 1498 | Issues: 35 | Issues: 126

CUDALibrarySamples

CUDA Library Samples

Language: Cuda | License: NOASSERTION | Stargazers: 1493 | Issues: 30 | Issues: 187

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Language: C++ | License: NOASSERTION | Stargazers: 1155 | Issues: 54 | Issues: 159

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python | License: MIT | Stargazers: 1108 | Issues: 22 | Issues: 36

CUDA-Learn-Notes

🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | Stargazers: 1084 | Issues: 11 | Issues: 5

CUDA-Optimization-Guide

Xiao's CUDA Optimization Guide [Active Adding New Contents]

License: GPL-3.0 | Stargazers: 222 | Issues: 1 | Issues: 0

FedProto

[AAAI'22] FedProto: Federated Prototype Learning across Heterogeneous Clients

FedFed

[NeurIPS 2023] "FedFed: Feature Distillation against Data Heterogeneity in Federated Learning"

Language: Python | License: MIT | Stargazers: 91 | Issues: 1 | Issues: 4

FedPAC

Simplified Implementation of FedPAC

FedCIL

Code for the ICLR 2023 paper "Better Generative Replay for Continual Federated Learning"

HeteroPFL

[ICLR'24] Heterogeneous Personalized Federated Learning by Local-Global Updates Mixing via Convergence Rate

Language: Python | Stargazers: 9 | Issues: 1 | Issues: 0