Jsson Xia's repositories
awesome-gemm
A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software.
cs149-parallel-computing
Stanford CS149 Parallel Computing
lightneuron
An educational inference framework.
gemm-kernel-microbenchmark
A microbenchmark for GEMM kernels on NVIDIA GPUs with Ampere Architecture.
hands-on-simd-programming
Hands-on SIMD Programming with C++.
leakcheck
Memory leak detector (MLD) for C applications.
nlp-with-spark
Insight Mastodon: NLP Analysis with Spark
algo-playground
Optimize to push the limits.
asst1
Stanford CS149 -- Assignment 1
asst2
Stanford CS149 -- Assignment 2
asst4
Stanford CS149 -- Assignment 4
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
compute-benchmarks
Compute Benchmarks for oneAPI Level Zero and OpenCL™ Driver
compute-runtime
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
cutlass
CUDA Templates for Linear Algebra Subroutines
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
hatchet
Graph-indexed Pandas DataFrames for analyzing hierarchical performance data
level-zero
oneAPI Level Zero Specification Headers and Loader
lz77
LZ77 in C.
oneAPI-samples
Samples for Intel® oneAPI Toolkits
pti-gpu
Profiling Tools Interfaces for GPU (PTI for GPU) is a set of getting-started documentation and a tools library for easily starting performance analysis on Intel(R) Processor Graphics.
ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
RookieDB
Berkeley CS186: Introduction to Database Systems
smith-waterman
Pairwise sequence alignment algorithm.
tapa
TAPA is a dataflow HLS framework that features fast compilation and an expressive programming model, and generates high-frequency FPGA accelerators.