Abdul Dakkak's starred repositories

zsh-autosuggestions

Fish-like autosuggestions for zsh

Language: Shell License: MIT Stargazers: 29986 Issues: 184 Issues: 581

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python License: Apache-2.0 Stargazers: 7045 Issues: 111 Issues: 146

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python License: MIT Stargazers: 5417 Issues: 35 Issues: 859

awesome-generative-ai-guide

A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!

llama2.mojo

Inference Llama 2 in one file of pure 🔥

Language: Mojo License: MIT Stargazers: 2035 Issues: 27 Issues: 44

ctransformers

Python bindings for the Transformer models implemented in C/C++ using the GGML library.

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python License: Apache-2.0 Stargazers: 1509 Issues: 32 Issues: 216

refact

WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding

Language: JavaScript License: BSD-3-Clause Stargazers: 1461 Issues: 20 Issues: 132

arkos

Another Rockchip Operating System

Language: Shell License: MIT Stargazers: 1344 Issues: 46 Issues: 1081

voltaML

⚡VoltaML is a lightweight library to convert and run your ML/DL models in high-performance inference runtimes like TensorRT, TorchScript, ONNX, and TVM.

Language: Python License: Apache-2.0 Stargazers: 1191 Issues: 12 Issues: 10

distribution

Home of the JELOS Linux distribution.

Language: Makefile License: NOASSERTION Stargazers: 919 Issues: 25 Issues: 0

raft

RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high-performance applications.

Language: Cuda License: Apache-2.0 Stargazers: 632 Issues: 25 Issues: 646

how-to-optimize-gemm

row-major matmul optimization

Language: C++ License: GPL-3.0 Stargazers: 547 Issues: 16 Issues: 13

metal-benchmarks

Apple GPU microarchitecture

Language: Metal License: MIT Stargazers: 324 Issues: 10 Issues: 4

YHs_Sample

Yinghan's Code Sample

Language: Cuda License: GPL-3.0 Stargazers: 240 Issues: 7 Issues: 4

MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

Language: C++ License: Apache-2.0 Stargazers: 198 Issues: 7 Issues: 7

mperf

mperf is an operator performance tuning toolbox for mobile and embedded platforms.

Language: C++ License: Apache-2.0 Stargazers: 165 Issues: 7 Issues: 14

NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

Language: Cuda License: GPL-3.0 Stargazers: 79 Issues: 2 Issues: 3

Elemental

Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction

Language: C++ License: NOASSERTION Stargazers: 64 Issues: 16 Issues: 24

NBAssembler

Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.

Language: Python License: MIT Stargazers: 58 Issues: 3 Issues: 1

gccontent-benchmark

Benchmarking different languages for a simple bioinformatics task (counting the GC fraction of DNA in a FASTA file)

Language: Rust License: MIT Stargazers: 55 Issues: 2 Issues: 4

amd_matrix_instruction_calculator

A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators

Language: Python License: MIT Stargazers: 51 Issues: 8 Issues: 1

LLAIR

Library to manipulate Apple Metal Shading Language IR

Language: C++ License: NOASSERTION Stargazers: 45 Issues: 4 Issues: 1

Language: C License: NOASSERTION Stargazers: 32 Issues: 4 Issues: 0

llvm-tutor

A collection of out-of-tree LLVM passes for teaching and learning

Language: C++ License: MIT Stargazers: 17 Issues: 1 Issues: 0

MojoPkgWorkflow

This repository shows how to use a simple GitHub Action script for compiling a Mojo directory into a package.

CLASP

CoLumn-vector pruning-Aware SPmm kernel

Language: Cuda License: Apache-2.0 Stargazers: 4 Issues: 1 Issues: 0