crapromer

crapromer's starred repositories

rssbox-android

An RSS reader for Android, based on Rust and Slint-ui.

Language: Slint | License: MIT | Stargazers: 15 | Issues: 0

cudnn-memo

Example code of cuDNN.

Language: Cuda | Stargazers: 5 | Issues: 0

CUDA

CUDA and cuDNN examples

Language: Cuda | Stargazers: 2 | Issues: 0

gpu-tensor-core

A set of programs testing CUDA Tensor Core performance

Language: Cuda | Stargazers: 6 | Issues: 0

TiledCUDA

TiledCUDA is an efficient kernel template library written in CuTe, which provides a wrapper for cutlass CuTe and enables more efficient fusion.

Language: C++ | License: MIT | Stargazers: 28 | Issues: 0

CLImage

A C++ GPGPU OpenCL library for Android and Unix systems.

License: Apache-2.0 | Stargazers: 1 | Issues: 0

operators

Operator library

Language: C++ | Stargazers: 4 | Issues: 0

llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | Stargazers: 11092 | Issues: 0

Language: C++ | Stargazers: 1 | Issues: 0

Language: Cuda | Stargazers: 1 | Issues: 0

CUDA-Learn-Notes

🎉 CUDA notes / hand-written CUDA for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | Stargazers: 851 | Issues: 0

mlops-coding-course

Learn how to create, develop, and maintain a state-of-the-art MLOps code base

Language: Python | License: CC-BY-4.0 | Stargazers: 177 | Issues: 0

cucumat

Private implementation of cudamat

Language: Python | Stargazers: 2 | Issues: 0

MapEditor

A simple UI pack based on Python and PyQt5 that aims to simulate the appearance of imgui

Language: Python | Stargazers: 16 | Issues: 0

cuda-repo

From zero to hero CUDA for accelerating maths and machine learning on GPU.

Language: Cuda | License: MIT | Stargazers: 154 | Issues: 0

CompPhys

CompPhys - a Computational Physics repository

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 84 | Issues: 0

PyTorch-GAN

PyTorch implementations of Generative Adversarial Networks.

Language: Python | License: MIT | Stargazers: 16001 | Issues: 0

How_to_optimize_in_GPU

A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.

Language: Cuda | License: Apache-2.0 | Stargazers: 756 | Issues: 0

mlmodelscope

MLModelScope is an open source, extensible, and customizable platform to facilitate evaluation and measurement of ML models within AI pipelines.

Language: JavaScript | License: NCSA | Stargazers: 48 | Issues: 0

onnx2X

ONNX2Pytorch

Language: Python | Stargazers: 153 | Issues: 0

CudaGeMM

Benchmarking different CUDA GeMM kernels

Language: Cuda | Stargazers: 2 | Issues: 0

cuda_gemm_benchmark

Based on gtest/benchmark, with reference to https://github.com/Liu-xiandong/How_to_optimize_in_GPU

Language: Cuda | Stargazers: 5 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 21505 | Issues: 0

CUDA_Bench

CUDA GPU Benchmark

Language: Cuda | License: MIT | Stargazers: 11 | Issues: 0

Language: Cuda | Stargazers: 1999 | Issues: 0

ABigSurveyOfLLMs

A collection of 150+ surveys on LLMs

License: CC0-1.0 | Stargazers: 151 | Issues: 0

Fast-TransX

An efficient implementation of TransE and its extended models for Knowledge Representation Learning

Language: C++ | License: MIT | Stargazers: 398 | Issues: 0

OpenKE

An Open-Source Package for Knowledge Embedding (KE)

Language: Python | Stargazers: 3760 | Issues: 0

tqdm.cpp

C++ port of tqdm

Language: C++ | License: NOASSERTION | Stargazers: 295 | Issues: 0

Chinese-LLaMA-Alpaca-2

Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

Language: Python | License: Apache-2.0 | Stargazers: 6999 | Issues: 0