Shintaro Iwasaki (shintaro-iwasaki)


Home Page: https://shintaro-iwasaki.github.io/


Shintaro Iwasaki's repositories

Language: Shell | Stargazers: 2 | Issues: 0 | Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

argobots

Copy of Argobots Repository

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

bolt

Official BOLT Repository

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 3

daos

DAOS Storage Engine

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0
Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0
Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

FBTT-Embedding

This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not reduce memory use at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000x reduction in memory footprint per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.

Language: Cuda | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

folly

An open-source C++ library developed and used at Facebook.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

jekyll-action

A GitHub Action to publish Jekyll based content as a GitHub Pages site

Language: Shell | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

Language: HTML | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Shell | Stargazers: 0 | Issues: 1 | Issues: 1
Language: Fortran | License: LGPL-3.0 | Stargazers: 0 | Issues: 2 | Issues: 0

mpich

Official MPICH Repository

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

ompi

Open MPI main development repository

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0
Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

optimizers

For optimization algorithm research and development.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

osu-abt

OSU Micro-Benchmarks 5.7 + Argobots

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0

p2s2-www

International Workshop on Parallel Programming Models and Systems Software for High-End Computing Website

Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

ppopp21-preemption-artifact

Artifact of the paper "Lightweight Preemptive User-Level Threads" in PPoPP'21

Language: C++ | Stargazers: 0 | Issues: 1 | Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

qthreads

Lightweight locality-aware user-level threading runtime.

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

rccl-tests

RCCL Performance Benchmark Tests

Language: Cuda | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0
Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

triton-shared

Shared Middle-Layer for Triton Compilation

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

yaksa-www

Yaksa: High-performance Noncontiguous Data Management

Stargazers: 0 | Issues: 0 | Issues: 0