bssrdf's repositories

UnderstandingUnixLinuxProgramming

source code for the book

Language: C · Stargazers: 5 · Issues: 3 · Issues: 0

ggml

Tensor library for machine learning

Language: C · License: MIT · Stargazers: 1 · Issues: 2 · Issues: 0

pyleet

LeetCode training

Language: Python · Stargazers: 1 · Issues: 3 · Issues: 0

avx2-examples

Short examples illustrating AVX2 intrinsics for simple tasks.

Language: Makefile · License: MIT · Stargazers: 0 · Issues: 1 · Issues: 0

bcnn

Minimalist Convolutional Neural Networks in C and Cuda

Language: C · License: MIT · Stargazers: 0 · Issues: 3 · Issues: 0

clip.cpp

CLIP inference in plain C/C++ with no extra dependencies

License: MIT · Stargazers: 0 · Issues: 0 · Issues: 0

Cpp-Concurrency-in-Action-2ed

C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.

License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

cuda-1brc

My CUDA solution to the 1BRC

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

CUDA-Based-Image-Convolution

Developed and optimized a CUDA kernel for 2D convolution, accommodating a 2D input tensor and a 2D filter tensor, with transposed filter application.

Language: Cuda · Stargazers: 0 · Issues: 1 · Issues: 0
Language: Cuda · Stargazers: 0 · Issues: 0 · Issues: 0

CUDA_gemm

A simple high performance CUDA GEMM implementation.

Stargazers: 0 · Issues: 0 · Issues: 0

CUDALibrarySamples

CUDA Library Samples

License: NOASSERTION · Stargazers: 0 · Issues: 0 · Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0 · Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

generative-ai-for-beginners

12 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

Language: Jupyter Notebook · License: MIT · Stargazers: 0 · Issues: 1 · Issues: 0

kgraph

A library for k-nearest neighbor search

Language: C++ · License: BSD-2-Clause · Stargazers: 0 · Issues: 0 · Issues: 0

LeNet-5_Speed_Up

Uses OpenMP and CUDA to speed up the LeNet-5 digit-recognition CNN. With OpenMP, training and testing each run 11x faster. With CUDA, training runs 3x faster and testing runs 57x faster.

Language: C · Stargazers: 0 · Issues: 3 · Issues: 0

llama.cpp

Port of Facebook's LLaMA model in C/C++

Language: C++ · License: MIT · Stargazers: 0 · Issues: 1 · Issues: 0

moderngpu

Design patterns for GPU computing

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0 · Issues: 0

openCNN

A Winograd Minimal Filter Implementation in CUDA

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 1 · Issues: 0

PMPP

Solutions to Programming Massively Parallel Processors

Stargazers: 0 · Issues: 0 · Issues: 0
Language: Cuda · Stargazers: 0 · Issues: 0 · Issues: 0

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ · License: MIT · Stargazers: 0 · Issues: 1 · Issues: 0

screenshot-to-code

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

License: MIT · Stargazers: 0 · Issues: 0 · Issues: 0

SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

License: MIT · Stargazers: 0 · Issues: 0 · Issues: 0

sshfs

A network filesystem client to connect to SSH servers

License: GPL-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

stable-diffusion.cpp

Stable Diffusion in pure C/C++

Language: C++ · License: MIT · Stargazers: 0 · Issues: 1 · Issues: 0

stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Language: Python · License: MIT · Stargazers: 0 · Issues: 1 · Issues: 0

udlbook

Understanding Deep Learning - Simon J.D. Prince

License: NOASSERTION · Stargazers: 0 · Issues: 0 · Issues: 0

x86-simd-sort

C++ template library for high performance SIMD based sorting algorithms

Language: C++ · License: BSD-3-Clause · Stargazers: 0 · Issues: 1 · Issues: 0