Yujia Zhai's repositories
Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimization of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
Optimizing-SGEMV-on-NVIDIA-GPUs
An implementation of SGEMV with performance comparable to cuBLAS.
Optimizing-DGEMV-on-Intel-CPUs
Highly optimized DGEMV on CPU with both serial and parallel performance better than MKL and OpenBLAS.
FasterTransformer
Transformer-related optimization, including BERT and GPT.
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT2, decoders, etc.) on CPU and GPU.
alshedivat-al-folio
A beautiful, simple, clean, and responsive Jekyll theme for academics
effective_transformer
Running BERT without Padding
HElib
HElib is an open-source software library that implements homomorphic encryption. It supports the BGV scheme with bootstrapping and the Approximate Number CKKS scheme. HElib also includes optimizations for efficient homomorphic evaluation, focusing on effective use of ciphertext packing techniques and on the Gentry-Halevi-Smart optimizations.
libfacedetection
An open-source library for face detection in images. The face detection speed can reach 1000 FPS.
ML2021-Spring
**Official** Hung-yi Lee (李宏毅) Machine Learning 2021 Spring
yzhaiustc.github.io
My personal website