There are 28 repositories under the inference-engine topic.
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premises cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
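The federated-learning piece rests on federated averaging: the server's new weights are the data-size-weighted mean of the clients' weights. A minimal sketch of plain FedAvg follows (illustrative only, not FEDML's API):

```python
import numpy as np

# Plain FedAvg sketch (not FEDML's API): the aggregated model is the
# weighted average of client weights, weighted by each client's data size.
def fedavg(client_weights, client_sizes):
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(fedavg(clients, [100, 300]))  # [2.5, 3.5]
```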
A great project for campus recruitment, autumn/spring hiring, and internships! Build a high-performance deep learning inference library from scratch, step by step, with inference support for models such as Llama2, UNet, YOLOv5, and ResNet.
Rule engine implementation in Golang
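The repo itself is in Go; as a language-agnostic illustration of what a rule engine does, here is a minimal Python sketch (all names hypothetical, not this repo's API) in which a rule pairs a condition over a fact dictionary with an action that fires when the condition holds:

```python
# Minimal rule-engine sketch (hypothetical names, not this repo's API).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over the fact dict
    action: Callable[[dict], None]     # side effect fired when it holds

def run_rules(rules: list[Rule], facts: dict) -> None:
    # Evaluate every rule against the facts; fire matching actions.
    for rule in rules:
        if rule.condition(facts):
            rule.action(facts)

rules = [
    Rule("discount",
         condition=lambda f: f["order_total"] > 100,
         action=lambda f: f.update(discount=0.1)),
]
facts = {"order_total": 150}
run_rules(rules, facts)
print(facts)  # {'order_total': 150, 'discount': 0.1}
```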
OneDiff: An out-of-the-box acceleration library for diffusion models.
Large-scale LLM inference engine
FeatherCNN is a high-performance inference engine for convolutional neural networks.
Paddle.js is the web project of Baidu PaddlePaddle, an open-source deep learning framework that runs in the browser. Paddle.js can either load a pre-trained model or transform a model from PaddleHub with the model-conversion tools it provides. It runs in any browser that supports WebGL, WebGPU, or WebAssembly, and also in Baidu Smart Programs and WeChat mini programs.
The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
🔥 A mini PyTorch inference framework inspired by Darknet (supports YOLOv3, YOLOv4, YOLOv5, UNet, ...).
Python Computer Vision & Video Analytics Framework With Batteries Included
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
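For readers new to the term, a KV cache stores each generated token's attention keys and values so a decode step only computes attention for the newest token. A minimal single-head sketch follows (illustrative only; the repo's contribution is virtualizing and sharing this memory across requests):

```python
import numpy as np

# Minimal single-head KV-cache sketch: append the new token's key/value,
# then attend only over what is cached so far.
d, max_len = 64, 128
K = np.zeros((max_len, d)); V = np.zeros((max_len, d))
n = 0  # number of cached tokens

def decode_step(q, k_new, v_new):
    global n
    K[n], V[n] = k_new, v_new                 # append this token's key/value
    n += 1
    scores = K[:n] @ q / np.sqrt(d)           # attend over cached keys only
    w = np.exp(scores - scores.max()); w /= w.sum()
    return w @ V[:n]                          # weighted sum of cached values

out = decode_step(np.random.randn(d), np.random.randn(d), np.random.randn(d))
print(out.shape)  # (64,)
```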
A common base representation of Python source code for pylint and other projects.
High-performance cross-platform inference engine; Anakin runs on x86 CPU, ARM, NVIDIA GPU, AMD GPU, Bitmain, and Cambricon devices.
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
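The heart of any such engine is the autoregressive decode loop. A hedged Python sketch follows (the repo implements this in C++/CUDA; `forward` below is a hypothetical stand-in for the transformer forward pass):

```python
import numpy as np

def forward(tokens: list[int]) -> np.ndarray:
    # Hypothetical stand-in for a transformer forward pass:
    # returns logits over the vocabulary for the next token.
    rng = np.random.default_rng(len(tokens))
    return rng.standard_normal(32000)

def greedy_decode(prompt: list[int], max_new: int, eos: int = 2) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new):
        next_id = int(np.argmax(forward(tokens)))  # pick highest-logit token
        tokens.append(next_id)
        if next_id == eos:                          # stop at end-of-sequence
            break
    return tokens

print(greedy_decode([1, 15043], max_new=5))
```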
A great project for campus recruitment, autumn/spring hiring, and internships: build from scratch a large-model inference framework that supports Llama2/3 and Qwen2.5.
Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/).
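The dynamic-caching idea can be sketched simply: across denoising steps, reuse a block's cached output when its input has barely changed. The threshold and structure below are assumptions for illustration, not the project's actual method:

```python
import numpy as np

# Dynamic-caching sketch: skip recomputing a block when its input is
# nearly identical to the previous step's (tolerance is a hypothetical knob).
cache = {"inp": None, "out": None}

def cached_block(block, x, tol=1e-2):
    if cache["inp"] is not None and np.linalg.norm(x - cache["inp"]) < tol:
        return cache["out"]                  # reuse: input barely changed
    y = block(x)
    cache["inp"], cache["out"] = x.copy(), y
    return y

block = lambda x: np.tanh(x)                 # stand-in for a DiT transformer block
x = np.random.randn(4)
print(np.allclose(cached_block(block, x), cached_block(block, x + 1e-4)))  # True
```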
Julia package for automated Bayesian inference on a factor graph with reactive message passing
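To illustrate the underlying idea (the package itself is Julia and reactive; this is plain sum-product in Python, not its API), message passing on a tiny two-variable factor graph looks like:

```python
import numpy as np

# Sum-product messages on a chain x1 -> x2 with an observation on x2.
prior = np.array([0.6, 0.4])                 # p(x1)
trans = np.array([[0.7, 0.3], [0.2, 0.8]])   # p(x2 | x1)
lik   = np.array([0.9, 0.1])                 # p(y | x2) for the observed y

msg_x1_to_x2 = prior @ trans                 # marginalize x1 through the factor
posterior_x2 = msg_x1_to_x2 * lik            # combine with the evidence message
posterior_x2 /= posterior_x2.sum()
print(posterior_x2)                          # [0.9, 0.1]
```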
The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
PyTorch library for cost-effective, fast and easy serving of MoE models.
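The core mechanism in MoE serving is top-k expert routing: each token is sent to its k highest-scoring experts and their outputs are gate-weighted. A minimal sketch (illustrative only, not this library's API):

```python
import numpy as np

# Top-k MoE routing sketch: route each token to its k best experts and
# combine their outputs with softmax gate weights.
def moe_forward(x, gate_w, experts, k=2):
    logits = x @ gate_w                            # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]     # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        g = np.exp(logits[t, sel]); g /= g.sum()   # softmax over selected experts
        for w, e in zip(g, sel):
            out[t] += w * experts[e](x[t])
    return out

d, n_exp = 8, 4
experts = [(lambda W: (lambda v: v @ W))(np.random.randn(d, d)) for _ in range(n_exp)]
x = np.random.randn(3, d)
print(moe_forward(x, np.random.randn(d, n_exp), experts).shape)  # (3, 8)
```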
Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI-compatible endpoints.
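Because the endpoints are OpenAI-compatible, the standard openai Python client can talk to the server. The base URL and model name below are assumptions; substitute your deployment's actual values:

```python
from openai import OpenAI

# Point the stock openai client at the local server (URL and model name
# are placeholders, not this repo's documented defaults).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-llm",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```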
Documentation for search systems and AI infrastructure.
MIVisionX is a comprehensive set of computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
A repository for an object detection inference API using the TensorFlow framework.
Implement GPT-OSS 20B and 120B inference in C++ from scratch on AMD GPUs.
A robust and efficient TinyML inference engine.
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.