ZZK (MARD1NO)

Company: OneFlow

Location: Everywhere

Home Page: https://www.zhihu.com/people/mardino

ZZK's repositories

Language: Cuda | Stargazers: 12 | Issues: 0
Language: Python | Stargazers: 8 | Issues: 0
Language: Python | Stargazers: 2 | Issues: 0
Language: C++ | Stargazers: 1 | Issues: 0

mmyolo

OpenMMLab YOLO series toolbox and benchmark

Language: Python | License: GPL-3.0 | Stargazers: 1 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 0

oneflow

OneFlow is a performance-centered and open-source deep learning framework.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Awesome-System-for-Machine-Learning

A curated list of research in machine learning systems (MLSys). Paper notes are also provided.

License: MIT | Stargazers: 0 | Issues: 0

CacheLib

Pluggable in-process caching engine to build and scale high performance services

License: Apache-2.0 | Stargazers: 0 | Issues: 0

cmake-examples

Useful CMake Examples

License: MIT | Stargazers: 0 | Issues: 0

Cpp-Concurrency-in-Action-2ed

C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

License: BSD-3-Clause | Stargazers: 0 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

License: MIT | Stargazers: 0 | Issues: 0

DI-engine

OpenDILab Decision AI Engine

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

fast_io

Significantly faster input/output for C++20

License: MIT | Stargazers: 0 | Issues: 0

FasterTransformer

Transformer related optimization, including BERT, GPT

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

License: NOASSERTION | Stargazers: 0 | Issues: 0

FBTT-Embedding

This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not reduce memory use at training time. Our library decompresses only the requested rows and can therefore provide a 10,000x memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.

License: MIT | Stargazers: 0 | Issues: 0
License: Apache-2.0 | Stargazers: 0 | Issues: 0

free-programming-books

Freely available programming books

License: NOASSERTION | Stargazers: 0 | Issues: 0

MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

Stargazers: 0 | Issues: 0

openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

Stargazers: 0 | Issues: 0

Paddle

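PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)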

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

PaddleClas

A treasure chest for visual classification and recognition powered by PaddlePaddle

License: Apache-2.0 | Stargazers: 0 | Issues: 0

PaddleFleetX

Paddle Distributed Training Examples. Resnet Bert GPT MOE DataParallel ModelParallel PipelineParallel HybridParallel AutoParallel Zero Sharding Recompute GradientMerge Offload AMP DGC LocalSGD Wide&Deep

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

PaddleNLP

👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis and 🖼 Diffusion AICG system etc.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

powersgd

Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727

License: MIT | Stargazers: 0 | Issues: 0

TensorRT

TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

YHs_Sample

Yinghan's Code Sample

License: GPL-3.0 | Stargazers: 0 | Issues: 0