abduld

Abdul Dakkak's starred repositories

zsh-autosuggestions

Fish-like autosuggestions for zsh

Language: Shell | License: MIT | 29986 184 581

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python | License: Apache-2.0 | 7045 111 146

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python | License: MIT | 5417 35 859

awesome-generative-ai-guide

A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!

llama2.mojo

Inference Llama 2 in one file of pure 🔥

Language: Mojo | License: MIT | 2035 27 44

ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.

Language: C | License: MIT | 1737 20 198

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | License: Apache-2.0 | 1509 32 216

refact

WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding

Language: JavaScript | License: BSD-3-Clause | 1461 20 132

arkos

Another rockchip Operating System

Language: Shell | License: MIT | 1344 46 1081

voltaML

⚡VoltaML is a lightweight library to convert and run your ML/DL deep learning models in high performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.

Language: Python | License: Apache-2.0 | 1191 12 10

distribution

Home of the JELOS Linux distribution.

Language: Makefile | License: NOASSERTION | 919 250

raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Language: Cuda | License: Apache-2.0 | 632 25 646

how-to-optimize-gemm

row-major matmul optimization

Language: C++ | License: GPL-3.0 | 547 16 13

metal-benchmarks

Apple GPU microarchitecture

Language: Metal | License: MIT | 324 10 4

YHs_Sample

Yinghan's Code Sample

Language: Cuda | License: GPL-3.0 | 240 7 4

SYCLomatic

License: NOASSERTION | 210 15 122

MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

Language: C++ | License: Apache-2.0 | 198 7 7

mperf

mperf is an operator performance tuning toolbox for mobile and embedded platforms

Language: C++ | License: Apache-2.0 | 165 7 14

NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

Language: Cuda | 164 2 3

Hands-on-GEMM

Language: Cuda | License: GPL-3.0 | 79 2 3

Elemental

Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction

Language: C++ | License: NOASSERTION | 64 16 24

DissectingTensorCores

Language: Cuda | 62 3 4

NBAssembler

Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.

Language: Python | License: MIT | 58 3 1

gccontent-benchmark

Benchmarking different languages for a simple bioinformatics task (counting the GC fraction of DNA in a FASTA file)

Language: Rust | License: MIT | 55 2 4

amd_matrix_instruction_calculator

A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators

Language: Python | License: MIT | 51 8 1

LLAIR

Library to manipulate Apple Metal Shading Language IR

Language: C++ | License: NOASSERTION | 45 4 1

CUDAMicroBench

Language: C | License: NOASSERTION | 32 40

llvm-tutor

A collection of out-of-tree LLVM passes for teaching and learning

Language: C++ | License: MIT | 17 10

MojoPkgWorkflow

This repository shows how to use a simple GitHub Actions script to compile a Mojo directory into a package.

CLASP

CoLumn-vector pruning-Aware SPmm kernel

Language: Cuda | License: Apache-2.0 | 4 10