Bert Maher's repositories

llama2.so

Inference Llama 2 with a model compiled to native code by TorchInductor

Language: C++ | License: MIT | Stargazers: 8 | Issues: 0 | Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: C++ | License: NOASSERTION | Stargazers: 3 | Issues: 4 | Issues: 0

tf32_gemm

Example of binding a TF32 CUTLASS GEMM kernel to PyTorch

Language: Python | Stargazers: 3 | Issues: 0 | Issues: 0

bitserial

Hacking around with ultra-low precision GEMM using TVM

Language: LLVM | Stargazers: 2 | Issues: 3 | Issues: 0
Language: C++ | Stargazers: 2 | Issues: 2 | Issues: 0
Language: Python | License: MIT | Stargazers: 1 | Issues: 1 | Issues: 0

pyhpc-benchmarks

A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python 🚀

Language: Python | License: Unlicense | Stargazers: 1 | Issues: 0 | Issues: 0
Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

Background-Matting

Background Matting: The World is Your Green Screen

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 1 | Issues: 0

BERT-pytorch

Google AI 2018 BERT pytorch implementation

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

bertmaher.github.io

My GitHub site

Language: HTML | Stargazers: 0 | Issues: 2 | Issues: 0

builder

Continuous builder and binary build scripts for pytorch

Language: Shell | License: BSD-2-Clause | Stargazers: 0 | Issues: 0 | Issues: 0
Language: C++ | Stargazers: 0 | Issues: 0 | Issues: 0

ds2

Debug server for lldb.

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0

fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

functorch

functorch is a prototype of JAX-like composable function transforms for PyTorch.

License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

glow

Compiler for Neural Network hardware accelerators

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

hub

Submission to https://pytorch.org/hub/

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

learn_cuda

Simple programs for learning CUDA

Language: Cuda | Stargazers: 0 | Issues: 0 | Issues: 0

lmdave

Let's Make: Dangerous Dave

Language: C | Stargazers: 0 | Issues: 1 | Issues: 0

maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

midwit-matmul

A simplistic approach to high-performance GPU matmul

Language: Cuda | Stargazers: 0 | Issues: 0 | Issues: 0

multipy

torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

nvprof2json

Convert nvprof profiles into about:tracing compatible JSON files

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

resume

The LaTeX sources for my resume/CV

Language: TeX | Stargazers: 0 | Issues: 2 | Issues: 0

torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

triton

Development repository for the Triton language and compiler

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

tvm-1

TVM integration into PyTorch

Language: C++ | Stargazers: 0 | Issues: 1 | Issues: 0