ZZK (MARD1NO)

MARD1NO

Company: OD

Location: Everywhere

Home Page: https://www.zhihu.com/people/mardino

ZZK's repositories

CacheLib

Pluggable in-process caching engine to build and scale high performance services

License:Apache-2.0Stargazers:0Issues:0Issues:0

FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

License:NOASSERTIONStargazers:0Issues:0Issues:0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

License:MITStargazers:0Issues:0Issues:0

YHs_Sample

Yinghan's Code Sample

License:GPL-3.0Stargazers:0Issues:0Issues:0

FBTT-Embedding

A Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they do not reduce memory use at training time. Our library decompresses only the requested rows and can therefore provide a 10,000x memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.

License:MITStargazers:0Issues:0Issues:0
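
To make the "decompress only the requested rows" idea in the FBTT-Embedding description concrete, here is a minimal pure-Python sketch. It is not the FBTT-Embedding API: the function names (`make_cores`, `tt_lookup`, `param_counts`), the two-core layout, and the shapes are all assumptions chosen for illustration. A table with `n1 * n2` rows and `d1 * d2` columns is stored as two small cores, and a single row is rebuilt from one slice of each core without ever materializing the full table.

```python
import random


def make_cores(n1, n2, d1, d2, rank, seed=0):
    """Build two random cores (hypothetical layout, for illustration only).

    core_a has shape [n1][d1][rank]; core_b has shape [rank][n2][d2].
    Together they stand in for a dense table of shape (n1 * n2, d1 * d2).
    """
    rng = random.Random(seed)
    core_a = [[[rng.uniform(-1, 1) for _ in range(rank)]
               for _ in range(d1)] for _ in range(n1)]
    core_b = [[[rng.uniform(-1, 1) for _ in range(d2)]
               for _ in range(n2)] for _ in range(rank)]
    return core_a, core_b


def tt_lookup(core_a, core_b, row):
    """Reconstruct one embedding row from the two cores.

    The row index is split into (i1, i2); only core_a[i1] and the i2-th
    slice of core_b are read, so the cost does not depend on the number
    of rows in the table.
    """
    rank = len(core_b)
    n2 = len(core_b[0])
    d2 = len(core_b[0][0])
    i1, i2 = divmod(row, n2)
    out = []
    for a_vec in core_a[i1]:          # one vector of length `rank` per d1 index
        for b in range(d2):
            out.append(sum(a_vec[k] * core_b[k][i2][b] for k in range(rank)))
    return out                         # length d1 * d2


def param_counts(n1, n2, d1, d2, rank):
    """Return (dense parameter count, two-core parameter count)."""
    dense = n1 * n2 * d1 * d2
    compressed = n1 * d1 * rank + rank * n2 * d2
    return dense, compressed


if __name__ == "__main__":
    core_a, core_b = make_cores(n1=100, n2=100, d1=4, d2=4, rank=8)
    vec = tt_lookup(core_a, core_b, row=1234)
    print(len(vec), param_counts(100, 100, 4, 4, 8))
```

The compression ratio in this toy setup depends entirely on the assumed shapes and rank; it says nothing about the figures reported for the actual library.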

data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

License:BSD-3-ClauseStargazers:0Issues:0Issues:0

openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

Stargazers:0Issues:0Issues:0

Cpp-Concurrency-in-Action-2ed

C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.

License:Apache-2.0Stargazers:0Issues:0Issues:0

powersgd

Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727

License:MITStargazers:0Issues:0Issues:0
License:Apache-2.0Stargazers:0Issues:0Issues:0

DeepRec

DeepRec is a recommendation engine based on TensorFlow.

License:Apache-2.0Stargazers:0Issues:0Issues:0

cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Stargazers:0Issues:0Issues:0

NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

Stargazers:0Issues:0Issues:0
Language:C++Stargazers:1Issues:0Issues:0
Stargazers:0Issues:0Issues:0
Language:Jupyter NotebookStargazers:4Issues:0Issues:0

tensorflow-internals

An open source ebook about the TensorFlow kernel and its implementation mechanisms.

Stargazers:0Issues:0Issues:0
Stargazers:1Issues:0Issues:0

AI-System

System for AI Education Resource.

License:CC-BY-4.0Stargazers:0Issues:0Issues:0

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

License:NOASSERTIONStargazers:0Issues:0Issues:0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

License:NOASSERTIONStargazers:0Issues:0Issues:0
Stargazers:0Issues:0Issues:0
Stargazers:0Issues:0Issues:0
Language:CudaStargazers:17Issues:0Issues:0
Stargazers:0Issues:0Issues:0
Language:C++Stargazers:0Issues:0Issues:0
License:BSL-1.0Stargazers:0Issues:0Issues:0
Stargazers:0Issues:0Issues:0

DesignPattern

A complete set of C++11 design patterns: 23 kinds of pointer usage (a full design-pattern implementation in C++11)

License:MITStargazers:0Issues:0Issues:0
Language:PythonStargazers:0Issues:0Issues:0