ZZK (MARD1NO)


Company: SiliconFlow

Location: Neverland

Home Page: https://mard1no.github.io/

ZZK's repositories

Awesome-GPU

Awesome resources for GPUs

License: BSD-3-Clause · Stargazers: 2 · Issues: 0

mmyolo

OpenMMLab YOLO series toolbox and benchmark

Language: Python · License: GPL-3.0 · Stargazers: 1 · Issues: 0

cmake-examples

Useful CMake Examples

Language: CMake · License: MIT · Stargazers: 0 · Issues: 0

oneflow

OneFlow is a performance-centered and open-source deep learning framework.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

Awesome-System-for-Machine-Learning

A curated list of research in machine learning systems (MLSys). Paper notes are also provided.

License: MIT · Stargazers: 0 · Issues: 0

CacheLib

Pluggable in-process caching engine to build and scale high performance services

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

Cpp-Concurrency-in-Action-2ed

C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

CuAssembler

An unofficial CUDA assembler, for all generations of SASS, hopefully :)

License: MIT · Stargazers: 0 · Issues: 0

CV-CUDA

CV-CUDA™ is an open-source, graphics processing unit (GPU)-accelerated library for cloud-scale image processing and computer vision.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Language: Python · License: BSD-3-Clause · Stargazers: 0 · Issues: 0

DI-engine

OpenDILab Decision AI Engine

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

fast_io

Significantly faster input/output for C++20

License: MIT · Stargazers: 0 · Issues: 0

FasterTransformer

Transformer related optimization, including BERT, GPT

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

FBTT-Embedding

A Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed this library can reduce total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality, and our implementation is faster than the state-of-the-art ones. Existing state-of-the-art libraries decompress the whole embedding table on the fly, so they provide no memory reduction at training time; our library decompresses only the requested rows, which can cut the memory footprint of an embedding table by up to 10,000x. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.

Language: Cuda · License: MIT · Stargazers: 0 · Issues: 0
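The row-wise decompression idea above can be sketched in a few lines of NumPy: an embedding table of shape (N, D) is stored as a chain of TT-cores, and looking up a row contracts only the core slices selected by that row's index, so the full table is never materialized. All shapes, ranks, and names below are illustrative assumptions, not the library's actual API.

```python
import numpy as np

# Hypothetical factorization: N = 8*8*8 = 512 rows, D = 4*4*4 = 64 dims.
n = [8, 8, 8]            # factorization of the row count
d = [4, 4, 4]            # factorization of the embedding dimension
ranks = [1, 16, 16, 1]   # TT-ranks; boundary ranks are always 1

rng = np.random.default_rng(0)
# One TT-core per factor, with shape (r_{k-1}, n_k, d_k, r_k).
cores = [rng.standard_normal((ranks[k], n[k], d[k], ranks[k + 1]))
         for k in range(3)]

def tt_embedding_lookup(row):
    """Materialize one embedding row from the TT-cores.

    Only the slices indexed by this row are touched, so the full
    (N, D) table is never decompressed -- the source of the runtime
    memory savings described above.
    """
    # Mixed-radix decomposition of the flat row index into (i1, i2, i3).
    idx = []
    for nk in reversed(n):
        idx.append(row % nk)
        row //= nk
    idx.reverse()

    # Contract the selected core slices left to right.
    result = cores[0][0, idx[0], :, :]             # (d1, r1)
    for k in (1, 2):
        s = cores[k][:, idx[k], :, :]              # (r_{k-1}, d_k, r_k)
        result = np.einsum('xa,abc->xbc', result, s)
        result = result.reshape(-1, ranks[k + 1])  # (d1*...*dk, r_k)
    return result.reshape(-1)                      # (D,) since r_3 == 1

vec = tt_embedding_lookup(42)
print(vec.shape)  # (64,)
```

Even at this toy scale the cores hold 9,216 floats versus 32,768 for the dense table (about 3.5x); the compression grows with N because core storage scales with the factors of N, not with N itself.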

free-programming-books

Freely available programming books

License: NOASSERTION · Stargazers: 0 · Issues: 0

GPT2

An implementation of training for GPT2; supports TPUs.

License: MIT · Stargazers: 0 · Issues: 0

matxscript

The model pre-processing and post-processing framework

License: Apache-2.0 · Stargazers: 0 · Issues: 0

MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

Language: C++ · Stargazers: 0 · Issues: 0

openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

Language: Cuda · Stargazers: 0 · Issues: 0

QSync

Official repository for "QSync: Adaptive Mixed-Precision for Training Synchronization".

License: MIT · Stargazers: 0 · Issues: 0

taichi-hackathon-akinasan

Code base of the Akinasan team (秋名山车队, "Mount Akina racing team") for the 0th Taichi Hackathon.

License: MIT · Stargazers: 0 · Issues: 0

TensorRT

TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

YHs_Sample

Yinghan's Code Sample

Language: Cuda · License: GPL-3.0 · Stargazers: 0 · Issues: 0