RAD-Ninjas / llm-on-prem


LLM-On-Prem


Introduction

The frontend is a boilerplate app generated by `npx create-secure-chatgpt-app`. It leverages several of Pangea's security services to secure ChatGPT usage.

Features

  • Frontend:

    • Authentication using Pangea's AuthN service and the Next.js framework
    • Secure chat page
    • Secure `chat/generate` API endpoint
    • Redacting user prompts using Pangea's Redact service
    • Auditing user prompts using Pangea's Secure Audit Log service
    • De-fanging malicious responses from the OpenAI API using Pangea's Domain Intel and URL Intel services
  • Backend:

    • lm-sys FastChat server configured to operate as a stand-in for the OpenAI API
    • vLLM inference engine worker
    • Support for multiple Hugging Face Transformers models
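Because FastChat exposes an OpenAI-compatible API, the frontend (or any client) can target the local backend simply by overriding the base URL. A minimal sketch, assuming the backend listens on `localhost:8000` and the worker has loaded a Vicuna model (the actual host, port, and model name are assumptions; check your compose file and `.env`):

```shell
# Assumed endpoint of the local FastChat stand-in for the OpenAI API;
# the real host/port come from your docker compose configuration.
BASE_URL="http://localhost:8000/v1"

# Hypothetical model name -- use whichever HF model your worker loaded.
PAYLOAD='{"model": "vicuna-7b-v1.5", "messages": [{"role": "user", "content": "Hello"}]}'

# Once the containers are up, send a chat completion request:
# curl -s "$BASE_URL/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```

The same base-URL override works for any OpenAI client library, which is what lets the frontend talk to the on-prem backend without code changes.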

Installation

Once you've cloned the repo, perform the following steps to get the app up and running:

  1. Set up your local environment file:

    cp .env.example .env
    
  2. Update the relevant keys and variables with your own values


  • Note: If you are running Docker Desktop on Windows or macOS, you will likely need to increase your Docker memory allocation to at least 16 GB and max out your CPU allocation

  3. Build and run the containers:

    • CPU:
      • Build: `docker compose --profile cpu build`
      • Run: `docker compose --profile cpu up`
    • CUDA:
      • Build: `docker compose --profile cuda build`
      • Run: `docker compose --profile cuda up`
    • OpenVINO:
      • Build: `docker compose --profile openvino build`
      • Run: `docker compose --profile openvino up`
    • Metal:
      • Build: `docker compose --profile metal build`
      • Run: `docker compose --profile metal up`
    • GGML:
      • Build: `docker compose --profile ggml build`
      • Run: `docker compose --profile ggml up`
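Which profile to build depends on the host hardware. A small helper sketch (hypothetical, not part of the repo) that picks a profile automatically, assuming `nvidia-smi` indicates a CUDA-capable host and Apple hardware should use the Metal profile:

```shell
# Hypothetical helper: choose a docker compose profile for this host.
# Falls back to the cpu profile when no accelerator is detected.
if command -v nvidia-smi >/dev/null 2>&1; then
  profile=cuda
elif [ "$(uname -s)" = "Darwin" ]; then
  profile=metal
else
  profile=cpu
fi

# Commands are echoed rather than executed so you can review them first.
echo "docker compose --profile $profile build"
echo "docker compose --profile $profile up"
```

Pipe the output to `sh` (or copy the printed commands) once you are happy with the detected profile; OpenVINO and GGML hosts would still need to select their profile manually.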
