thedadams / knowledge-retrieval-api

Standalone Knowledge Retrieval API Server to be used with Rubra

Development

  • Run in development mode (hot-reloading): make run-dev (requires Docker and Docker Compose)
  • Dependency Management: uv
  • Linting & Formatting: ruff

File Types

Currently, the following file types are supported for ingestion via LlamaIndex's SimpleDirectoryReader interface:

  • .csv - comma-separated values
  • .docx - Microsoft Word
  • .epub - EPUB ebook format
  • .hwp - Hangul Word Processor
  • .ipynb - Jupyter Notebook
  • .jpeg, .jpg - JPEG image
  • .mbox - MBOX email archive
  • .md - Markdown
  • .mp3, .mp4 - audio and video
  • .pdf - Portable Document Format
  • .png - Portable Network Graphics
  • .ppt, .pptm, .pptx - Microsoft PowerPoint
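
The list above can be turned into a small pre-check that runs before files are sent for ingestion. This is a minimal sketch using only the Python standard library; the names SUPPORTED_EXTENSIONS, is_supported and partition_by_support are hypothetical and not part of this repository. It only mirrors the list above and says nothing about how the API itself treats other extensions.

```python
from pathlib import Path

# Extensions copied from the "File Types" list above.
# (Hypothetical constant; not defined anywhere in this repository.)
SUPPORTED_EXTENSIONS = frozenset({
    ".csv", ".docx", ".epub", ".hwp", ".ipynb", ".jpeg", ".jpg", ".mbox",
    ".md", ".mp3", ".mp4", ".pdf", ".png", ".ppt", ".pptm", ".pptx",
})


def is_supported(path):
    """Return True if the file extension appears in the list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_by_support(paths):
    """Split paths into (listed, unlisted) lists, preserving input order."""
    listed, unlisted = [], []
    for p in paths:
        (listed if is_supported(p) else unlisted).append(str(p))
    return listed, unlisted
```

Calling partition_by_support on a batch of file names before ingestion makes it easy to report up front which files fall outside the documented list.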

Examples

You can use the GPTScript example in the examples/ directory to test the ingestion and querying parts of the API. The GPTScript will do the following:

  1. Ingest the Llama 2 paper located at examples/data/llama2.pdf (only if it hasn't been ingested before)
  2. Query the dataset for information on the topics "Truthfulness, Toxicity, and Bias"

The returned response should contain a reference to the source page.

Just run this from the repository root:

make run-dev # if you haven't already

# Create the dataset
curl -X 'POST' \
  'http://localhost:8000/datasets/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "llama2",
  "embed_dim": 0
}'

# Run the GPTScript example
gptscript examples/example.gpt
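
The dataset-creation call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server from make run-dev is listening on localhost:8000 and reuses the URL, headers and payload from the curl command. The function names are hypothetical, and the shape of the response body is not documented here, so it is returned as raw text.

```python
import json
import urllib.request

# Same address as in the curl command above.
BASE_URL = "http://localhost:8000"


def build_create_dataset_request(name, embed_dim=0, base_url=BASE_URL):
    """Build (but do not send) the POST request for /datasets/create."""
    payload = json.dumps({"name": name, "embed_dim": embed_dim}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/datasets/create",
        data=payload,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_dataset(name, embed_dim=0, base_url=BASE_URL):
    """Send the request and return (HTTP status, raw response body)."""
    request = build_create_dataset_request(name, embed_dim, base_url)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

With the development server running, create_dataset("llama2") sends the same request as the curl command. Building the request in a separate function keeps it inspectable without a running server.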

About

License: Apache License 2.0


Languages

Python 90.4%, Jupyter Notebook 4.2%, Shell 2.5%, Mako 1.6%, Dockerfile 0.9%, Makefile 0.4%