This is the API and database for Parla, an exploratory project. It is not production ready. We are currently exploring whether we can make the parliamentary documentation that the Abgeordnetenhaus of Berlin publishes as open data (https://www.parlament-berlin.de/dokumente/open-data) more accessible by embedding all documents and searching them with vector similarity search. The project is heavily based on this example from the supabase community. It is built with Fastify and deployed to render.com using docker.
Prerequisites:
- docker
- A vercel.com account
- A supabase.com account
- A running instance of the related frontend: https://github.com/technologiestiftung/parla-frontend
- A running instance of the database, defined in ./supabase
- A populated database, using these tools: https://github.com/technologiestiftung/parla-document-processor
Set the following environment variables. See also .envrc.sample (it might be more up to date).
export SUPABASE_URL="http://localhost:54321"
export SUPABASE_ANON_KEY="ey..."
# Get your key at https://platform.openai.com/account/api-keys
export OPENAI_KEY="sk-UY..."
export SUPABASE_SERVICE_ROLE_KEY=
# in dev we can use a lesser version to save some coins
export OPENAI_MODEL="gpt-3.5-turbo"
export PORT="8080"
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
# should be one of "debug", "info", "warning", "error", "critical"
export LOG_LEVEL="info"
# This is only for testing purposes and must not be allowed in production
# for real real!
export DANGEROUSLY_ALLOW_CORS_FOR_ALL_ORIGINS="FOR_REAL_REAL"
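A missing or misspelled variable is easier to catch before the API starts. The following is a minimal sketch, assuming bash and that the variables are exported (as direnv does). `require_env` and `valid_log_level` are hypothetical helpers, not part of this repository, and which variables are strictly required is an assumption on your side:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of parla-api): check the environment
# before running `npm run dev`.

# Report every name whose exported value is unset or empty; return 1 if any is.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # printenv only sees exported variables, i.e. what npm child processes see
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 only for the log levels listed in the LOG_LEVEL comment above.
valid_log_level() {
  case "$1" in
    debug | info | warning | error | critical) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `require_env SUPABASE_URL SUPABASE_ANON_KEY OPENAI_KEY && valid_log_level "$LOG_LEVEL" && npm run dev` only starts the API when those checks pass. Using `printenv` instead of plain shell variables means a value that was set but never exported is also reported as missing.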
Hint: We use direnv for development environment variables. See https://direnv.net/
Install the dependencies:
npm ci
Currently we deploy using docker on render.com.
- Go to render.com
- Allow render to access your GitHub repository
- Create a new web service (type should be Docker)
- Populate the environment variables
- Deploy
Start up a local database:
npx supabase start
Run the API:
npm run dev
Edit the files in src
See also the swagger documentation at http://localhost:8080/documentation/static/index.html
The indices on the processed_document_chunks and processed_document_summaries tables need to be regenerated when new data arrives. This is because the `lists` parameter of the index should be adjusted to the number of rows in the table, as described in https://github.com/pgvector/pgvector. To do this, we use the pg_cron extension (https://github.com/citusdata/pg_cron). To schedule the regeneration of the indices, we create two jobs that call functions defined in the API and database definition (https://github.com/technologiestiftung/parla-api):
-- Regenerate the chunk embedding indices every day at 05:30
select cron.schedule (
'regenerate_embedding_indices_for_chunks',
'30 5 * * *',
$$ SELECT * from regenerate_embedding_indices_for_chunks() $$
);
-- Regenerate the summary embedding indices on the same schedule
select cron.schedule (
'regenerate_embedding_indices_for_summaries',
'30 5 * * *',
$$ SELECT * from regenerate_embedding_indices_for_summaries() $$
);
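The reason for regenerating the indices is that a good `lists` value depends on how many rows a table holds. As a rough illustration, the starting point recommended in the pgvector README (rows / 1000 for up to 1M rows, sqrt(rows) for more than 1M rows) can be computed as below. This is a sketch under that assumption only; `suggest_lists` is a hypothetical helper, and the regenerate_embedding_indices_for_* functions in this repository may derive `lists` differently:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of parla-api): suggest an ivfflat `lists`
# value using the starting point from the pgvector README:
# rows / 1000 for up to 1M rows, sqrt(rows) for more than 1M rows.
suggest_lists() {
  local rows="$1" lists
  if [ "$rows" -le 1000000 ]; then
    lists=$(( rows / 1000 ))
    # an ivfflat index needs at least one list
    if [ "$lists" -lt 1 ]; then
      lists=1
    fi
  else
    # integer square root via awk, since shell arithmetic has no sqrt
    lists=$(awk -v r="$rows" 'BEGIN { printf "%d", int(sqrt(r)) }')
  fi
  echo "$lists"
}
```

To feed it real numbers you could, for instance, run `psql "$DATABASE_URL" -tAc "select count(*) from processed_document_chunks"` and pass the result to `suggest_lists`; `DATABASE_URL` is a placeholder for your own connection string.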
To seed the initial feedback types and tags, you can use this snippet:
INSERT INTO feedbacks (kind, tag)
values('positive', NULL), ('negative', 'Antwort inhaltlich falsch oder missverständlich'), ('negative', 'Es gab einen Fehler'), ('negative', 'Antwort nicht ausführlich genug'), ('negative', 'Dokumente unpassend');
It is also included in supabase/seed.sql.
Run the tests:
npm t
Before you create a pull request, please open an issue so we can discuss your changes.
Thanks goes to these wonderful people (emoji key):
Fabian Morón Zirfas 💻 🚇 🎨 📖
Jonas Jaszkowic 💻 🤔 📖
Ingo Hinterding 📆 💻 🤔
This project follows the all-contributors specification. Contributions of any kind welcome!
Made by
A project by
Supported by