OctoAI's repositories
octoml-profile
Home for OctoML PyTorch Profiler
triton-client-rs
A Rust client library for the NVIDIA Triton Inference Server.
octoml-llm-qa
A code sample that shows how to use 🦜️🔗langchain, 🦙llama_index, and a hosted LLM endpoint to run a standard chat or Q&A session over a PDF document.
dockercon23-octoai
DockerCon 2023 OctoAI AI/ML Workshop GitHub Repo
octoai-apps
A collection of OctoAI-based demos.
hackathon-2023-rag
OctoAI 2023 Llama2 RAG demos
octoai-cartoonizer
Cartoonizer demo for the OctoAI compute service launch.
octoai-launch-examples
Examples of how to build Generative AI applications powered by the OctoAI compute service.
archived_vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
homebrew-tap
Homebrew Tap of OctoML products and tools.
octoai-octoshop
OctoAI's OctoShop! Transform photos with the power of words and generative AI!
relax-all
A fork of tvm/unity.
stable-diffusion-webui-docker
Easy Docker setup for Stable Diffusion with user-friendly UI
TensorRT-LLM-release
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
triton-inference-server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
use-whisper
React hook for OpenAI Whisper with speech recorder, real-time transcription, and silence removal built-in
web-llm
Bringing large language models and chat to web browsers. Everything runs inside the browser with no server support.