NOS (`torch-nos`) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running inference on popular foundational AI models.
- 👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
- 🥷 Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
- 🔌 Pluggable: Plug your front-end into NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding the usual ML model deployment hassles.
- 🚀 Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
- 📦 Extensible: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
- ⚙️ HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
- ☁️ Cloud-agnostic: Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
NOS inherits its name from Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.
Get started with the full NOS server by installing via pip:
```bash
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ conda install pytorch>=2.0.1 torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
$ pip install torch-nos[server]
```
If you simply want to use a lightweight NOS client and run inference on your local machine (via Docker), you can install the client-only package:
```bash
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ pip install torch-nos
```
For a more detailed quickstart, navigate to our quickstart docs.
The quickest way to get started is to start the GPU server. The `--http` flag optionally starts an HTTP gateway server so that you can run the REST API examples. We recommend the gRPC client API for the best out-of-the-box performance.
```bash
nos serve up --http
```
This command pulls and starts the latest GPU Docker server with all the NOS goodies, without requiring any manual setup. You'll see a stream of debug logs on the console; wait until you see `Uvicorn running on http://0.0.0.0:8000` before continuing to the next section. To follow the remaining examples, start a new terminal (leaving the server running in the background).
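Before running the examples below, you can optionally check from Python that the server is reachable. A minimal sketch, assuming the client exposes `WaitForServer()` and `IsHealthy()` helpers (see the client docs for the exact API):

```python
from nos.client import Client

# Connect to the gRPC server started by `nos serve up --http`.
client = Client("[::]:50051")

# Assumed helpers: block until the server is ready, then check its health.
client.WaitForServer()
assert client.IsHealthy()
```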
**gRPC API ⚡**

```python
from nos.client import Client

client = Client("[::]:50051")
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["fox jumped over the moon"],
              width=1024, height=1024, num_images=1)
```

**REST API**

```bash
curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
        "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
        "inputs": {
          "prompts": ["fox jumped over the moon"],
          "width": 1024,
          "height": 1024,
          "num_images": 1
        }
      }'
```
**gRPC API ⚡**

```python
from nos.client import Client

client = Client("[::]:50051")
clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(text=["fox jumped over the moon"])
```

**REST API**

```bash
curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
        "model_id": "openai/clip-vit-base-patch32",
        "method": "encode_text",
        "inputs": {
          "texts": ["fox jumped over the moon"]
        }
      }'
```
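Text embeddings are most useful when paired with image embeddings. A minimal sketch of text–image similarity, assuming the CLIP module also exposes an `encode_image` method and that embeddings come back as numpy-compatible arrays (both are assumptions; check the model docs):

```python
import numpy as np
from PIL import Image
from nos.client import Client

client = Client("[::]:50051")
clip = client.Module("openai/clip-vit-base-patch32")

# Assumed method: encode_image, mirroring encode_text above.
txt_vec = np.asarray(clip.encode_text(text=["fox jumped over the moon"]))
img_vec = np.asarray(clip.encode_image(images=[Image.open("image.jpg")]))

# Cosine similarity between the text and image embeddings.
txt_vec = txt_vec / np.linalg.norm(txt_vec, axis=-1, keepdims=True)
img_vec = img_vec / np.linalg.norm(img_vec, axis=-1, keepdims=True)
print((txt_vec @ img_vec.T).squeeze())
```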
**gRPC API ⚡**

```python
from pathlib import Path
from nos.client import Client

client = Client("[::]:50051")
model = client.Module("openai/whisper-large-v2")
with client.UploadFile(Path("audio.wav")) as remote_path:
    response = model(path=remote_path)
# {"chunks": ...}
```

**REST API**

```bash
curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=openai/whisper-large-v2' \
  -F 'file=@audio.wav'
```
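The response contains a `chunks` field (see the comment above). Assuming each chunk carries a `text` entry, as Whisper-style outputs typically do, you can assemble the full transcript:

```python
# Assumption: each chunk is a dict with a "text" field.
transcript = " ".join(chunk["text"].strip() for chunk in response["chunks"])
print(transcript)
```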
**gRPC API ⚡**

```python
from PIL import Image
from nos.client import Client

client = Client("[::]:50051")
model = client.Module("yolox/medium")
response = model(images=[Image.open("image.jpg")])
# {"bboxes": ..., "scores": ..., "labels": ...}
```

**REST API**

```bash
curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=yolox/medium' \
  -F 'file=@image.jpg'
```
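The response dictionary contains `bboxes`, `scores`, and `labels` (per the comment above). A minimal sketch that draws the detections onto the input image, assuming results for the single input image are at index 0 and each box is an `[x1, y1, x2, y2]` pixel-coordinate quadruple (both are assumptions about the output layout):

```python
from PIL import Image, ImageDraw

# Assumption: per-image results are at index 0; boxes are [x1, y1, x2, y2].
image = Image.open("image.jpg")
draw = ImageDraw.Draw(image)
for box, score, label in zip(response["bboxes"][0],
                             response["scores"][0],
                             response["labels"][0]):
    x1, y1, x2, y2 = box
    draw.rectangle([x1, y1, x2, y2], outline="red", width=2)
    draw.text((x1, y1), f"{label}: {float(score):.2f}", fill="red")
image.save("detections.jpg")
```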
```
├── docker         # Dockerfile for CPU/GPU servers
├── docs           # mkdocs documentation
├── examples       # example guides, jupyter notebooks, demos
├── makefiles      # makefiles for building/testing
├── nos
│   ├── cli        # CLI (hub, system)
│   ├── client     # gRPC / REST client
│   ├── common     # common utilities
│   ├── executors  # runtime executor (i.e. Ray)
│   ├── hub        # hub utilities
│   ├── managers   # model manager / multiplexer
│   ├── models     # model zoo
│   ├── proto      # protobuf defs for NOS gRPC service
│   ├── server     # server backend (gRPC)
│   └── test       # pytest utilities
├── requirements   # requirement extras (server, docs, tests)
├── scripts        # basic scripts
└── tests          # pytests (client, server, benchmark)
```
- Quickstart
- Models
- Concepts: Architecture Overview, ModelSpec, ModelManager, Runtime Environments
- Demos: Building a Discord Image Generation Bot, Video Search Demo
- Commodity GPUs
  - NVIDIA GPUs (20XX, 30XX, 40XX)
  - AMD GPUs (RX 7000)
- Cloud GPUs
  - NVIDIA (H100, A100, A10G, A30G, T4, L4)
  - AMD (MI200, MI250)
- Cloud Service Providers (via SkyPilot)
  - AWS, GCP, Azure
  - Opinionated Cloud: Lambda Labs, RunPod, etc.
- Cloud ASICs
  - AWS Inferentia (Inf1/Inf2)
  - Google TPU
  - Coming soon! (Habana Gaudi, Tenstorrent)
This project is licensed under the Apache-2.0 License.
NOS collects anonymous usage data using Sentry. This helps us understand how the community is using NOS and prioritize features. You can opt out of telemetry by setting `NOS_TELEMETRY_ENABLED=0`.
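For example, to opt out for the current shell session:

```bash
export NOS_TELEMETRY_ENABLED=0
```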
We welcome contributions! Please see our contributing guide for more information.
- 💬 Send us an email at support@autonomi.ai or join our Discord for help.
- 📣 Follow us on Twitter and LinkedIn to keep up to date on our products.