EvaDB AI-SQL Database System
EvaDB is a database system for building simpler and faster AI-powered applications.
We aim to simplify the development and deployment of AI-powered apps that operate on unstructured data (text documents, videos, PDFs, podcasts, etc.) and structured data (tables, vector indexes).
The high-level Python and SQL APIs allow beginners to use EvaDB in a few lines of code. Advanced users can define custom user-defined functions that wrap around any AI model or Python library. EvaDB is fully implemented in Python and licensed under the Apache license.
Features
- Build simpler AI-powered applications using Python functions or SQL queries
- 10x faster applications using AI-centric query optimization
- Save money spent on GPUs
- First-class support for your custom deep learning models through user-defined functions
- Built-in caching to eliminate redundant model invocations across queries
- First-class support for PyTorch, Hugging Face, YOLO, and OpenAI models
- Installable via pip and fully implemented in Python
Illustrative Applications
Here are some illustrative EvaDB-powered applications (each Jupyter notebook can be opened on Google Colab):
- PrivateGPT
- ChatGPT-based Video Question Answering
- Querying PDF Documents
- Analysing Traffic Flow with YOLO
- Examining Emotions of Movie
- Image Segmentation with Hugging Face
Documentation
- Documentation
- The Getting Started page shows how you can use EvaDB for different AI tasks and how you can easily extend EvaDB to support your custom deep learning model through user-defined functions.
- The User Guides section contains Jupyter Notebooks that demonstrate how to use various features of EvaDB. Each notebook includes a link to Google Colab, where you can run the code yourself.
- Join us on Slack
- Follow us on Twitter
- Roadmap
Quick Start
- Step 1: Install EvaDB using pip. EvaDB supports Python versions >= 3.8:
pip install evadb
- Step 2: Write your AI app!
import os
import evadb
# Grab an EvaDB cursor to load data and run queries
cursor = evadb.connect().cursor()
# Load a collection of news videos into the 'news_videos' table
# This command returns a pandas DataFrame with the query's output
# In this case, the output indicates the number of loaded videos
cursor.load(
    file_regex="news_videos/*.mp4",
    format="VIDEO",
    table_name="news_videos",
).df()
# Define a function that wraps around your deep learning model
# Here, this function wraps around an off-the-shelf speech-to-text (Whisper) model
# Such functions are known as user-defined functions or UDFs
# So, we are creating a Whisper UDF here
# After creating the UDF, we can use the function in any query
cursor.create_udf(
    udf_name="SpeechRecognizer",
    type="HuggingFace",
    task="automatic-speech-recognition",
    model="openai/whisper-base",
).df()
# EvaDB automatically extracts the audio from the video
# We only need to run the SpeechRecognizer UDF on the 'audio' column
# to get the transcript and persist it in a table called 'transcripts'
cursor.query(
    """CREATE TABLE transcripts AS
       SELECT SpeechRecognizer(audio) FROM news_videos;"""
).df()
# We next incrementally construct the ChatGPT query using EvaDB's Python API
# The query is based on the 'transcripts' table
# This table has a column called 'text' with the transcript text
query = cursor.table('transcripts')
# Since ChatGPT is a built-in function, we don't have to define it
# We can just directly use it in the query
# We need to set the OPENAI_KEY environment variable
# (OPENAI_KEY below stands for a string variable holding your OpenAI API key)
os.environ["OPENAI_KEY"] = OPENAI_KEY
query = query.select("ChatGPT('Is this video summary related to LLMs', text)")
# Finally, we run the query to get the results as a dataframe
response = query.df()
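Because the ChatGPT function depends on the `OPENAI_KEY` environment variable, it can help to fail fast when the variable is missing. The following is a minimal sketch, not part of EvaDB: `require_env` is a hypothetical helper name introduced here for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or blank.
    This is a hypothetical helper, not an EvaDB function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running the query."
        )
    return value
```

Calling `require_env("OPENAI_KEY")` before `query.df()` surfaces a missing key as a clear error before the query runs.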
- Chain multiple models in a single query to set up useful AI pipelines
# Analyse emotions of actors in an Interstellar movie clip using PyTorch models
query = cursor.table("Interstellar")
# Get faces using a `FaceDetector` function
query = query.cross_apply("UNNEST(FaceDetector(data))", "Face(bounding_box, confidence)")
# Focus only on frames between 100 and 200 in the clip (bounds are exclusive)
query = query.filter("id > 100 AND id < 200")
# Get the emotions of the detected faces using an `EmotionDetector` function
query = query.select("id, bounding_box, EmotionDetector(Crop(data, bounding_box))")
# Run the query and get the query result as a dataframe
response = query.df()
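To build intuition for the `cross_apply` with `UNNEST` step above, here is a plain-Python sketch of the row expansion it performs. It does not use EvaDB. The detector is a hard-coded stand-in, the names `FAKE_DETECTIONS`, `detect_faces`, and `unnest_faces` are hypothetical, and the sketch assumes lateral-join semantics in which frames with no detections produce no output rows.

```python
from typing import Dict, Iterator, List, Tuple

BoundingBox = Tuple[int, int, int, int]

# Hypothetical detector output: frame id -> list of (bounding_box, confidence).
# A real face detector would compute this from the frame's pixel data.
FAKE_DETECTIONS: Dict[int, List[Tuple[BoundingBox, float]]] = {
    99: [((5, 5, 40, 40), 0.95)],
    101: [((10, 10, 50, 50), 0.98), ((80, 20, 120, 60), 0.91)],
    102: [],
    103: [((12, 11, 52, 49), 0.97)],
}


def detect_faces(frame_id: int) -> List[Tuple[BoundingBox, float]]:
    """Stand-in for a face detector: look up canned detections."""
    return FAKE_DETECTIONS.get(frame_id, [])


def unnest_faces(frame_ids: List[int]) -> Iterator[dict]:
    """Emit one output row per detected face, not one row per frame."""
    for frame_id in frame_ids:
        for bounding_box, confidence in detect_faces(frame_id):
            yield {
                "id": frame_id,
                "bounding_box": bounding_box,
                "confidence": confidence,
            }


# Same predicate as the query above: id > 100 AND id < 200
rows = [row for row in unnest_faces([99, 101, 102, 103]) if 100 < row["id"] < 200]
```

Frame 101 contributes two rows (one per face), frame 102 contributes none, and frame 99 is removed by the filter.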
- EvaDB runs AI apps 10-100x faster using its AI-centric query optimizer. Three key built-in optimizations are:
  - Caching: EvaDB automatically caches and reuses model inference results.
  - Parallel Query Execution: EvaDB runs the app in parallel on all the available hardware resources (CPUs and GPUs).
  - Model Ordering: EvaDB optimizes the order in which models are evaluated (e.g., runs the faster, more selective model first).
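The effect of caching and model ordering can be illustrated with a toy example in plain Python. This is a conceptual sketch only, not EvaDB's implementation: `cheap_model` and `expensive_model` are hypothetical stand-ins for real models, and `functools.lru_cache` stands in for EvaDB's cache.

```python
from functools import lru_cache
from typing import List

# Count how often each stand-in model is actually invoked
calls = {"cheap": 0, "expensive": 0}


def cheap_model(frame_id: int) -> bool:
    """Fast, selective stand-in model: keeps only every tenth frame."""
    calls["cheap"] += 1
    return frame_id % 10 == 0


@lru_cache(maxsize=None)
def expensive_model(frame_id: int) -> str:
    """Slow stand-in model; results are cached per frame id."""
    calls["expensive"] += 1
    return "car" if frame_id % 2 == 0 else "truck"


def run_query(frame_ids: List[int]) -> List[int]:
    # Model ordering: evaluate the cheap, selective model first so that
    # the expensive model only sees the frames that survive the filter.
    return [
        f for f in frame_ids
        if cheap_model(f) and expensive_model(f) == "car"
    ]


frames = list(range(100))
first = run_query(frames)   # expensive model runs on the 10 surviving frames
second = run_query(frames)  # expensive results are served from the cache
```

In this toy run the expensive model is invoked 10 times in total across both queries, instead of the 200 invocations needed if it ran first and without a cache.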
Architecture Diagram
This diagram presents the key components of EvaDB. EvaDB's AI-centric query optimizer takes a query as input and generates a query plan that is executed by the query engine. The query engine hits the relevant storage engines to quickly retrieve the data required for efficiently running the query:
- Structured data (SQL database system connected via sqlalchemy).
- Unstructured media data (on cloud buckets/local filesystem).
- Feature data (vector database system).
![Architecture Diagram](https://user-images.githubusercontent.com/5521975/237778889-01452ec9-87d9-4d27-90b2-c0b1ab29b16c.png)
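The flow in the diagram, where the query engine routes each part of a plan to the matching storage engine, can be sketched as a toy dispatcher. This is an illustration under stated assumptions, not EvaDB's internal API: `PlanStep`, `STORAGE_ENGINES`, `execute`, and the reader functions are all hypothetical names, and each reader returns a description instead of real data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PlanStep:
    """One step of a toy query plan."""
    data_kind: str  # "structured", "unstructured", or "feature"
    source: str     # hypothetical table, path, or index name


def read_structured(source: str) -> str:
    return f"rows from SQL table '{source}'"


def read_unstructured(source: str) -> str:
    return f"media files under '{source}'"


def read_feature(source: str) -> str:
    return f"vectors from index '{source}'"


# One hypothetical storage engine per kind of data in the list above
STORAGE_ENGINES: Dict[str, Callable[[str], str]] = {
    "structured": read_structured,
    "unstructured": read_unstructured,
    "feature": read_feature,
}


def execute(plan: List[PlanStep]) -> List[str]:
    """Toy query engine: route each plan step to its storage engine."""
    return [STORAGE_ENGINES[step.data_kind](step.source) for step in plan]


plan = [
    PlanStep("structured", "transcripts"),
    PlanStep("unstructured", "news_videos/"),
    PlanStep("feature", "video_embeddings"),
]
results = execute(plan)
```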
Screenshots
- Traffic Analysis (Object Detection Model): source video and query result
- PDF Question Answering (Question Answering Model): app
- MNIST Digit Recognition (Image Classification Model): source video and query result
- Movie Emotion Analysis (Face Detection + Emotion Classification Models): source video and query result
- License Plate Recognition (Plate Detection + OCR Extraction Models): query result
Community and Support
![EvaDB Slack Channel](https://raw.githubusercontent.com/georgia-tech-db/eva/master/docs/images/eva/eva-slack.png)
If you run into any problems or issues, please create a GitHub issue and we'll try our best to help.
Don't see a feature in the list? Search our issue tracker to check whether someone has already requested it. If so, add a comment explaining your use case; if not, open a new issue. We prioritize our roadmap based on user feedback, so we'd love to hear from you.
Contributing
EvaDB is the beneficiary of many contributors. All kinds of contributions to EvaDB are appreciated. To file a bug or to request a feature, please use GitHub issues. Pull requests are welcome.
For more information, see our contribution guide.
License
Copyright (c) 2018-present Georgia Tech Database Group. Licensed under the Apache License.