Hiki (Aneureka)

Aneureka

Company: NVIDIA

Location: Shanghai, China

Home Page: https://www.aneureka.com

Twitter: @aneureka


Organizations
Leftovers4

Hiki's starred repositories

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python | License: MIT | Stargazers: 36312 | Issues: 368 | Issues: 315

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 23383 | Issues: 227 | Issues: 132

just

🤖 Just a command runner

Language: Rust | License: CC0-1.0 | Stargazers: 20323 | Issues: 70 | Issues: 1014

llama2.c

Inference Llama 2 in one file of pure C

tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Language: Python | License: Apache-2.0 | Stargazers: 15341 | Issues: 466 | Issues: 1247

transfer.sh

Easy and fast file sharing from the command-line.

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 13476 | Issues: 114 | Issues: 1034

gridea

✍️ A static blog writing client

Language: TypeScript | License: MIT | Stargazers: 9945 | Issues: 111 | Issues: 1042

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 8234 | Issues: 87 | Issues: 1801

nmt

TensorFlow Neural Machine Translation Tutorial

Language: Python | License: Apache-2.0 | Stargazers: 6368 | Issues: 251 | Issues: 426

annotated-transformer

An annotated implementation of the Transformer paper.

Language: Jupyter Notebook | License: MIT | Stargazers: 5584 | Issues: 65 | Issues: 88

speedscope

🔬 A fast, interactive web-based viewer for performance profiles.

Language: TypeScript | License: MIT | Stargazers: 5480 | Issues: 51 | Issues: 272

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 5391 | Issues: 103 | Issues: 1068

lolcommits

📷 git-based selfies for software developers

Language: Ruby | License: LGPL-3.0 | Stargazers: 4727 | Issues: 41 | Issues: 195

transformer

Transformer: PyTorch Implementation of "Attention Is All You Need"

libtree

ldd as a tree

Awesome-System-for-Machine-Learning

A curated list of research in machine learning systems (MLSys). Paper notes are also provided.

lectures

Material for cuda-mode lectures

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 2440 | Issues: 34 | Issues: 7

CUDA-Learn-Notes

🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | Stargazers: 1169 | Issues: 12 | Issues: 5

resource-stream

CUDA related news and material links

yiyin

A tool for adding watermarks to photos

Language: TypeScript | License: GPL-3.0 | Stargazers: 716 | Issues: 3 | Issues: 41

Loser-HomeWork

A showcase of the "Losers'" homework, with answer explanations and some C++ knowledge

Language: C++ | License: Apache-2.0 | Stargazers: 623 | Issues: 8 | Issues: 35

HPC-Learning-Notes

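Study notes on high-performance computing, including notes and code demos for related topics; continuously being improved. If you find it helpful, please give it a Star; it helps the author a lot. Thank you!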

Language: Jupyter Notebook | Stargazers: 357 | Issues: 6 | Issues: 1

CUDA-Optimization-Guide

Xiao's CUDA Optimization Guide [Actively Adding New Content]

License: GPL-3.0 | Stargazers: 223 | Issues: 1 | Issues: 0

cuda_learning

learning how CUDA works

License: Apache-2.0 | Stargazers: 78 | Issues: 2 | Issues: 0

CUDA-From-Correctness-To-Performance-Code

Code & examples for "CUDA - From Correctness to Performance"

Language: C++ | License: Apache-2.0 | Stargazers: 45 | Issues: 2 | Issues: 0

getdataset-from-hobby.lkszj.info

Data scraping for the hobby.lkszj.info website

Language: Python | Stargazers: 15 | Issues: 0 | Issues: 0

cs229s-nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python | License: MIT | Stargazers: 7 | Issues: 0 | Issues: 0