yiakwy-xpu-ml-framework-team's repositories
NV-nccl-tests
NCCL Tests
NVIDIA-DOCA-App-Code-Sharing
DOCA Application Code Sharing Contest
NV_grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM for MoE.
DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
FlexFlow
A distributed deep learning framework.
hpc-ipu
Best practices for HPC with the IPU backend, covering scientific and AI (deep learning framework) algorithm and software development
groqflow
GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing those programs on GroqChip™ processors.
IPUDOOM
DOOM (1993) on IPU 👿
k8s-nccl-tests
NVIDIA NCCL Tests for Distributed Training
LightSeq
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
Longctx_ChunkLlama
Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Megatron-LM
Ongoing research training transformer models at scale
META-llama3
The official Meta Llama 3 GitHub site
ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
MS_Pix2Text
An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
NV-gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
NV-nccl-rdma-sharp-plugins
RDMA and SHARP plugins for the NCCL library
NV_cub_archive
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
popart-fork
Poplar Advanced Runtime for the IPU
skyworkai-Vitron
A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
triton
Development repository for the Triton language and compiler
yiakwy-xpu-ml-framework-team
Config files for my GitHub profile.