spillai / nos

⚡️ Nitrous oxide for your AI infrastructure.

Home Page: https://docs.nos.run/


Nitrous Oxide for your AI Infrastructure



⚡️ What is NOS?

NOS (torch-nos) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running inference of popular foundational AI models.

  • 👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
  • 🥷 Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
  • 🔌 Pluggable: Plug your front-end into NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding the usual ML model deployment hassles.
  • 🚀 Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
  • 📦 Extensible: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
  • ⚙️ HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
  • ☁️ Cloud-agnostic: Run on any cloud (AWS, GCP, Azure, Lambda Labs) or on-prem HW with our ready-to-use inference server containers.

NOS inherits its name from Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.

🚀 Getting Started

Get started with the full NOS server by creating a conda environment and installing via pip:

$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ conda install "pytorch>=2.0.1" torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
$ pip install "torch-nos[server]"

If you simply want to use the lightweight NOS client and run inference on your local machine (via Docker), you can install the client-only package:

$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ pip install torch-nos
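To confirm that the packages landed in the active environment, a quick standard-library check can help. This is a sketch, not part of NOS: the helper names are hypothetical, and it assumes PyTorch registers its package metadata under the distribution name `torch` and NOS under `torch-nos` (the name used with pip above).

```python
"""Report which NOS-related packages are installed (stdlib only, no torch/nos import)."""
import re
from importlib import metadata
from typing import Dict, Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def parse_release(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric release, e.g. '2.0.1+cu118' -> (2, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check_install(min_torch: Tuple[int, ...] = (2, 0, 1)) -> Dict[str, object]:
    """Summarize the install; the (2, 0, 1) floor mirrors the pytorch>=2.0.1 requirement."""
    torch_version = installed_version("torch")  # assumed distribution name
    return {
        "torch-nos": installed_version("torch-nos"),
        "torch": torch_version,
        "torch_ok": torch_version is not None and parse_release(torch_version) >= min_torch,
    }


if __name__ == "__main__":
    print(check_install())
```

A `torch_ok` value of False means either PyTorch is missing or its version is below the floor used in the install command.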

🔥 Quickstart / Show me the code

Image Generation as-a-Service

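gRPC API (Python client):

from nos.client import Client

# Connect to the NOS gRPC server on port 50051
client = Client("[::]:50051")

# Get a handle to the SDXL model and generate one 1024x1024 image
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["fox jumped over the moon"],
              width=1024, height=1024, num_images=1)

REST API (curl):

curl \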
-X POST http://localhost:8000/infer \
-H 'Content-Type: application/json' \
-d '{
      "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
      "inputs": {
          "prompts": ["fox jumped over the moon"],
          "width": 1024,
          "height": 1024,
          "num_images": 1
      }
    }'
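The REST call is plain JSON over HTTP, so it can also be issued from Python without extra dependencies. A minimal sketch with the standard library's `urllib`: the URL and payload keys are copied from the curl example, the helper names are hypothetical, and the reply is assumed to be JSON because its format is not documented here.

```python
"""Build (and optionally send) a request to the NOS REST /infer endpoint."""
import json
import urllib.request
from typing import Any, Dict, Optional

NOS_REST_URL = "http://localhost:8000/infer"  # same address as the curl example


def build_infer_request(
    model_id: str,
    inputs: Dict[str, Any],
    method: Optional[str] = None,
    url: str = NOS_REST_URL,
) -> urllib.request.Request:
    """Create a POST request whose JSON body matches the curl payload."""
    payload: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        payload["method"] = method  # e.g. "encode_text" for CLIP
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(request: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send the request to a running NOS server; assumes the reply body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `infer(build_infer_request("stabilityai/stable-diffusion-xl-base-1-0", {"prompts": ["fox jumped over the moon"], "width": 1024, "height": 1024, "num_images": 1}))` mirrors the curl request above; it needs a NOS server listening on localhost:8000. Passing `method="encode_text"` produces the CLIP-style payload.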

Text & Image Embedding-as-a-Service (CLIP-as-a-Service)

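gRPC API (Python client):

from nos.client import Client

# Connect to the NOS gRPC server on port 50051
client = Client("[::]:50051")

# Get a handle to the CLIP model and embed a text prompt
clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])

REST API (curl):

curl \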
-X POST http://localhost:8000/infer \
-H 'Content-Type: application/json' \
-d '{
      "model_id": "openai/clip-vit-base-patch32",
      "method": "encode_text",
      "inputs": {
          "texts": ["fox jumped over the moon"]
      }
    }'
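Embeddings are usually consumed by comparing them. A minimal sketch of cosine-similarity ranking, assuming `encode_text` returns one embedding vector per input text (for example a NumPy array of shape (N, D); plain lists work as well). The function names are illustrative, not part of the NOS API.

```python
"""Rank candidate embeddings by cosine similarity to a query embedding."""
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate_index, score) pairs, most similar first."""
    # Assumes each row of `candidates` is one embedding, as returned per input text.
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With `query_vec` taken from one row of an `encode_text` result and `doc_vecs` from another call, `rank_by_similarity(query_vec, doc_vecs)` lists the candidate texts from most to least similar. Because CLIP maps text and images into a shared embedding space, the same ranking applies to image embeddings.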

📂 Directory Structure

├── docker         # Dockerfile for CPU/GPU servers
├── docs           # mkdocs documentation
├── examples       # example guides, jupyter notebooks, demos
├── makefiles      # makefiles for building/testing
├── nos
│   ├── cli        # CLI (hub, system)
│   ├── client     # gRPC / REST client
│   ├── common     # common utilities
│   ├── executors  # runtime executor (i.e. Ray)
│   ├── hub        # hub utilities
│   ├── managers   # model manager / multiplexer
│   ├── models     # model zoo
│   ├── proto      # protobuf defs for NOS gRPC service
│   ├── server     # server backend (gRPC)
│   └── test       # pytest utilities
├── requirements   # requirement extras (server, docs, tests)
├── scripts        # basic scripts
└── tests          # pytests (client, server, benchmark)

📚 Documentation

🛣 Roadmap

HW / Cloud Support

  • Commodity GPUs

    • NVIDIA GPUs (20XX, 30XX, 40XX)
    • AMD GPUs (RX 7000)
  • Cloud GPUs

    • NVIDIA (H100, A100, A10G, A30G, T4, L4)
    • AMD (MI200, MI250)
  • Cloud Service Providers (via SkyPilot)

    • AWS, GCP, Azure
    • Opinionated Cloud: Lambda Labs, RunPod, etc.
  • Cloud ASICs

📄 License

This project is licensed under the Apache-2.0 License.

📡 Telemetry

NOS collects anonymous usage data using Sentry. This is used to help us understand how the community is using NOS and to help us prioritize features. You can opt out of telemetry by setting the environment variable NOS_TELEMETRY_ENABLED=0.
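The opt-out is an ordinary environment variable, so it can also be set from Python. A minimal sketch, assuming NOS reads `NOS_TELEMETRY_ENABLED` when it is imported or when the server process starts (the exact moment is not documented here), so the value has to be in place beforehand; the helper `telemetry_disabled_env` is hypothetical.

```python
"""Disable NOS telemetry for this process or for a child process."""
import os
from typing import Dict, Mapping, Optional

TELEMETRY_VAR = "NOS_TELEMETRY_ENABLED"


def telemetry_disabled_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with telemetry turned off."""
    env = dict(os.environ if base is None else base)
    env[TELEMETRY_VAR] = "0"
    return env


# Option 1: opt out in the current process; assumed to take effect only if set before `import nos`.
os.environ[TELEMETRY_VAR] = "0"

# Option 2: hand the modified environment to a child process, e.g.
# subprocess.run([...], env=telemetry_disabled_env())
```

Option 2 leaves the parent environment untouched, which is useful when only a spawned server should run without telemetry.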

🤝 Contributing

We welcome contributions! Please see our contributing guide for more information.

🔗 Quick Links




