Hiki's starred repositories
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
transfer.sh
Easy and fast file sharing from the command-line.
flash-attention
Fast and memory-efficient exact attention
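For context, the "exact attention" FlashAttention computes is ordinary softmax attention; a minimal NumPy reference (all names here are illustrative) makes the memory cost it optimizes away concrete:

```python
import numpy as np

def exact_attention(q, k, v):
    """Reference softmax attention: softmax(Q K^T / sqrt(d)) V.

    This naive version materializes the full (n, n) score matrix --
    exactly the quadratic memory that FlashAttention's tiled kernels
    avoid while still returning the same (exact, not approximate) result.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (n, n) attention scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (n, d) output

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = exact_attention(q, k, v)
```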
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
annotated-transformer
An annotated implementation of the Transformer paper.
speedscope
🔬 A fast, interactive web-based viewer for performance profiles.
lolcommits
:camera: git-based selfies for software developers
transformer
Transformer: PyTorch Implementation of "Attention Is All You Need"
Awesome-System-for-Machine-Learning
A curated list of research in machine learning systems (MLSys). Paper notes are also provided.
CUDA-Learn-Notes
🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.
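Several of the kernels listed (warp/block reduce, softmax, layernorm) rest on the same tree-reduction idea. As a rough sketch, here is the offset-halving pattern behind a CUDA warp shuffle-down sum, simulated in plain Python (the function name and setup are illustrative, not from any of these repos):

```python
def warp_shuffle_reduce(lane_values):
    """Simulate a warp-level shuffle-down sum reduction.

    Mirrors the common CUDA idiom
        for (int offset = 16; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);
    Each step, every lane adds the value held `offset` lanes above it;
    after log2(width) steps, lane 0 holds the total.
    """
    vals = list(lane_values)
    width = len(vals)          # warp width, e.g. 32
    offset = width // 2
    while offset > 0:
        # Ascending lane order means vals[lane + offset] is still the
        # pre-step value when we read it, matching shuffle semantics.
        for lane in range(width):
            src = lane + offset
            if src < width:
                vals[lane] += vals[src]
        offset //= 2
    return vals[0]             # lane 0 holds the reduced sum

total = warp_shuffle_reduce(range(32))  # sum of lane IDs 0..31
```

Only the lanes below the current `offset` produce meaningful partial sums at each step, which is why real kernels read the final result from lane 0.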
resource-stream
CUDA related news and material links
Loser-HomeWork
Homework showcase for the "losers" group, with answer explanations and some C++ knowledge
HPC-Learning-Notes
Study notes on high-performance computing, including notes and code demos for related topics; continuously being improved. If it helps you, please give it a Star — it means a lot to the author, thanks!
CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [actively adding new content]
cuda_learning
Learning how CUDA works
CUDA-From-Correctness-To-Performance-Code
Codes & examples for "CUDA - From Correctness to Performance"
getdataset-from-hobby.lkszj.info
Data scraping for the hobby.lkszj.info website
cs229s-nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.