knowledge-retrieval-api

A standalone knowledge retrieval API server for use with Rubra.

Development

Run in development mode (hot-reloading): make run-dev (requires Docker and Docker Compose)
Dependency Management: uv
Linting & Formatting: ruff

File Types

Currently, the following file types are supported for ingestion via LlamaIndex's SimpleDirectoryReader interface:

.csv - comma-separated values
.docx - Microsoft Word
.epub - EPUB ebook format
.hwp - Hangul Word Processor
.ipynb - Jupyter Notebook
.jpeg, .jpg - JPEG image
.mbox - MBOX email archive
.md - Markdown
.mp3, .mp4 - audio and video
.pdf - Portable Document Format
.png - Portable Network Graphics
.ppt, .pptm, .pptx - Microsoft PowerPoint
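
A client can check a file's extension against this list before attempting ingestion. The following is a minimal sketch and not part of this repository: the SUPPORTED_EXTENSIONS set simply mirrors the list above, and is_supported is a hypothetical helper name.

```python
from pathlib import Path

# Mirrors the list above; a hypothetical constant, not exported by the API.
SUPPORTED_EXTENSIONS = frozenset({
    ".csv", ".docx", ".epub", ".hwp", ".ipynb", ".jpeg", ".jpg",
    ".mbox", ".md", ".mp3", ".mp4", ".pdf", ".png",
    ".ppt", ".pptm", ".pptx",
})


def is_supported(path: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("examples/data/llama2.pdf", "notes.txt"):
        print(name, is_supported(name))
```

The check looks only at the file name, so it says nothing about whether the file's contents are actually readable.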

Examples

You can use the GPTScript example in the examples/ directory to test the ingestion and querying parts of the API. The GPTScript does the following:

Ingest the Llama 2 paper located at examples/data/llama2.pdf (only if it hasn't been ingested before)
Query the dataset for information about the topics "Truthfulness, Toxicity, and Bias"

The returned response should contain a reference to the source page.

Just run this from the repository root:

make run-dev # if you haven't already

# Create the dataset
curl -X 'POST' \
  'http://localhost:8000/datasets/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "llama2",
  "embed_dim": 0
}'

# Run the GPTScript example
gptscript examples/example.gpt
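
The dataset creation request can also be sent from Python. This is a sketch using only the standard library. It assumes the development server is listening on http://localhost:8000 and reuses the payload from the curl command above. build_create_dataset_request and create_dataset are hypothetical helper names. The response format is not documented here, so the raw body is returned as text.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: same address as in the curl command


def build_create_dataset_request(name: str, embed_dim: int = 0) -> urllib.request.Request:
    """Build the POST request for /datasets/create without sending it."""
    payload = json.dumps({"name": name, "embed_dim": embed_dim}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/datasets/create",
        data=payload,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def create_dataset(name: str, embed_dim: int = 0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_create_dataset_request(name, embed_dim)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage (requires the development server to be running):
# print(create_dataset("llama2"))
```

Building the request separately from sending it makes the payload and headers easy to inspect without a running server.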

License

Apache License 2.0
