technologiestiftung / parla-api

Parla (api & database)

This is the API and database for the exploratory project Parla. It is not production ready. We are currently exploring whether the parliamentary documentation that the Abgeordnetenhaus of Berlin publishes as open data (https://www.parlament-berlin.de/dokumente/open-data) can be made more accessible by embedding all of the data and searching it with vector similarity search. The project is heavily based on an example from the supabase community. Built with Fastify and deployed to render.com using docker.
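To illustrate what vector similarity search means here, the following is a minimal in-memory sketch that ranks document chunks by cosine similarity to a query embedding. It is not the project's implementation: Parla runs the search in the database, and the names `Chunk`, `cosineSimilarity` and `topMatches` are hypothetical and exist only for this example.

```typescript
// Hypothetical shape of an embedded document chunk (illustration only).
interface Chunk {
  id: number;
  embedding: number[];
}

// Cosine similarity of two vectors: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as not similar to anything.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topMatches(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding),
    )
    .slice(0, k);
}
```

In practice the embeddings would come from the configured embedding model and have far more dimensions; a vector index in the database avoids comparing the query against every row the way this sketch does.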

Prerequisites

Needed Environment Variables

See also .envrc.sample, which might be more up to date.

export SUPABASE_URL="http://localhost:54321"
export SUPABASE_ANON_KEY="ey..."
# Get your key at https://platform.openai.com/account/api-keys
export OPENAI_KEY="sk-UY..."
export SUPABASE_SERVICE_ROLE_KEY=
# in dev we can use a lesser version to save some coins
export OPENAI_MODEL="gpt-3.5-turbo"
export PORT="8080"
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
# should be one of "debug", "info", "warning", "error", "critical"
export LOG_LEVEL="info"
# This is only for testing purposes and should not be allowed in production
# for real real!
export DANGEROUSLY_ALLOW_CORS_FOR_ALL_ORIGINS="FOR_REAL_REAL"
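
A missing variable is easier to diagnose at startup than at the first failing request. The following is a minimal sketch of such a check. The variable names come from the list above, but which of them are strictly required is an assumption, and `findMissingEnvVars` is a hypothetical helper rather than a function from this repository.

```typescript
// Assumption: these variables must be set for the API to work.
// The names are taken from the list above; the selection is a guess.
const REQUIRED_ENV_VARS = [
  "SUPABASE_URL",
  "SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
  "OPENAI_KEY",
  "OPENAI_MODEL",
  "OPENAI_EMBEDDING_MODEL",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables
// that are unset or contain only whitespace.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

At startup this could be called as `findMissingEnvVars(process.env)`, with the process aborting and printing the returned names if the list is not empty.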

Hint: We use direnv for development environment variables. See https://direnv.net/

Installation

npm ci

Deployment

Currently we deploy using docker on render.com.

  • Go to render.com
  • Allow render to access your GitHub repository
  • Create a new web service (type should be docker)
  • Populate the environment variables
  • Deploy

Development

Start up a local database:

npx supabase start

Run the API:

npm run dev

Edit the files in src

See also the swagger documentation at http://localhost:8080/documentation/static/index.html

Periodically regenerate indices

The indices on the processed_document_chunks and processed_document_summaries tables need to be regenerated when new data arrives, because the lists parameter of the index should be adjusted to the amount of data, as described in the pgvector documentation: https://github.com/pgvector/pgvector. To do this, we use the pg_cron extension: https://github.com/citusdata/pg_cron. To schedule the regeneration of the indices, we create two jobs that call functions defined in the API and database definition: https://github.com/technologiestiftung/parla-api.

select cron.schedule (
    'regenerate_embedding_indices_for_chunks',
    '30 5 * * *',
    $$ SELECT * from regenerate_embedding_indices_for_chunks() $$
);

select cron.schedule (
    'regenerate_embedding_indices_for_summaries',
    '30 5 * * *',
    $$ SELECT * from regenerate_embedding_indices_for_summaries() $$
);
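
The reason the indices have to follow the data can be shown with the rule of thumb from the pgvector README for the lists parameter of an IVFFlat index: rows / 1000 for up to 1M rows, and sqrt(rows) above that. The sketch below only restates that guideline. `recommendedLists` is a hypothetical helper, and whether the database functions scheduled above use exactly this formula is an assumption.

```typescript
// Hypothetical helper restating the pgvector starting point for `lists`:
// rows / 1000 for up to 1M rows, sqrt(rows) for more than 1M rows.
function recommendedLists(rowCount: number): number {
  if (!Number.isFinite(rowCount) || rowCount < 0) {
    throw new RangeError("rowCount must be a non-negative finite number");
  }
  const lists =
    rowCount <= 1000000 ? rowCount / 1000 : Math.sqrt(rowCount);
  // An index needs at least one list, so small tables are clamped to 1.
  return Math.max(1, Math.round(lists));
}
```

As the tables grow, the value returned for the current row count drifts away from the one the index was built with, which is why the regeneration runs on a schedule instead of once.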

Feedback Feature

To seed the feedback types and tags for the initial version, you can use this snippet:

INSERT INTO feedbacks (kind, tag)
VALUES
    ('positive', NULL),
    ('negative', 'Antwort inhaltlich falsch oder missverständlich'),
    ('negative', 'Es gab einen Fehler'),
    ('negative', 'Antwort nicht ausführlich genug'),
    ('negative', 'Dokumente unpassend');

It is also present in supabase/seed.sql.

Tests

npm t

Contributing

Before you create a pull request, open an issue so we can discuss your changes.

Contributors

Thanks goes to these wonderful people (emoji key):

  • Fabian Morón Zirfas 💻 🚇 🎨 📖
  • Jonas Jaszkowic 💻 🤔 📖
  • Ingo Hinterding 📆 💻 🤔

This project follows the all-contributors specification. Contributions of any kind welcome!

License: MIT License

