AudioInsight - Cloudflare AI Challenge Entry

AudioInsight processes audio, transcribes it, summarizes it, generates a title for the content, and allows users to ask questions about the related audio.

This is an entry for the Cloudflare AI Challenge.

Live at: https://audioinsight.gabrielsena.dev/

How It Works
- Under the Hood
- Additional Features
How to Install
Audio Examples
Screenshots

How It Works

On the application's homepage, the user uploads an audio file.
We use the Whisper model to transcribe the audio into text.
We use the neural-chat-7b-v3-1-awq model to generate a title from the transcript.
We summarize the transcript with the bart-large-cnn model.
After that, the user can ask questions about the audio, and the neural-chat-7b-v3-1-awq model answers them.
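The steps above can be sketched with the Workers AI binding's `run(model, inputs)` method. This is a minimal, hypothetical illustration rather than AudioInsight's actual code. It assumes the binding is available as an object with a `run` method (as `env.AI` is in a Worker) and uses the catalog IDs `@cf/openai/whisper`, `@hf/thebloke/neural-chat-7b-v3-1-awq` and `@cf/facebook/bart-large-cnn`. The prompts, the `max_length` value and the helper names `processAudio` and `askQuestion` are all assumptions.

```typescript
// Minimal shape of the Workers AI binding used here (assumption: in a Worker
// this is `env.AI`; typed loosely so the sketch has no external dependencies).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Model IDs as listed in the Workers AI catalog.
const WHISPER = "@cf/openai/whisper";
const CHAT_MODEL = "@hf/thebloke/neural-chat-7b-v3-1-awq";
const SUMMARIZER = "@cf/facebook/bart-large-cnn";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AudioInsightResult {
  transcript: string;
  title: string;
  summary: string;
}

// Hypothetical helper: transcribe, title and summarize one uploaded audio file.
async function processAudio(ai: AiBinding, audio: ArrayBuffer): Promise<AudioInsightResult> {
  // Whisper takes the raw audio bytes as an array of numbers and returns { text }.
  const { text: transcript } = (await ai.run(WHISPER, {
    audio: Array.from(new Uint8Array(audio)),
  })) as { text: string };

  // Ask the chat model for a short title (prompt wording is an assumption).
  const { response: title } = (await ai.run(CHAT_MODEL, {
    messages: [
      { role: "system", content: "Reply with a short title for the following text, and nothing else." },
      { role: "user", content: transcript },
    ],
  })) as { response: string };

  // Summarize the transcript with BART; max_length is an arbitrary choice here.
  const { summary } = (await ai.run(SUMMARIZER, {
    input_text: transcript,
    max_length: 200,
  })) as { summary: string };

  return { transcript, title: title.trim(), summary };
}

// Hypothetical helper: answer a question, grounding the model in the transcript
// and replaying earlier turns so follow-up questions keep their context.
async function askQuestion(
  ai: AiBinding,
  transcript: string,
  history: ChatMessage[],
  question: string,
): Promise<string> {
  const messages: ChatMessage[] = [
    { role: "system", content: `Answer questions about this transcript:\n${transcript}` },
    ...history,
    { role: "user", content: question },
  ];
  const { response } = (await ai.run(CHAT_MODEL, { messages })) as { response: string };
  return response;
}
```

Because `askQuestion` receives the earlier turns as `history`, the same function serves both the first question and any follow-ups; only the message list grows.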

Under the Hood

A Cloudflare D1 database stores each chat and its message history.
A Cloudflare R2 bucket stores each chat's audio file.
Cloudflare Pages hosts the entire Next.js application, covering both the front end and the back end.
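A rough sketch of how the D1 and R2 roles above could fit together when a new chat is created. The binding shapes are trimmed-down stand-ins for D1's `prepare().bind().run()` and R2's `put()`. The `chats` and `messages` tables, their columns, the `audios/<id>` key layout and the helpers `saveChat` and `appendMessage` are hypothetical, since the real schema lives in `src/database/schema.sql` and is not reproduced here.

```typescript
// Minimal subsets of the D1 and R2 binding APIs used below. In a real Worker
// these would be D1Database and R2Bucket from @cloudflare/workers-types.
interface D1StatementLike {
  bind(...values: unknown[]): D1StatementLike;
  run(): Promise<unknown>;
}
interface D1Like {
  prepare(query: string): D1StatementLike;
}
interface R2Like {
  put(
    key: string,
    value: ArrayBuffer,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
}

interface NewChat {
  id: string;
  title: string;
  summary: string;
  transcript: string;
}

// Hypothetical helper: upload the audio to R2, then record the chat in D1.
// Writing to R2 first means a D1 row never points at audio that was not stored.
async function saveChat(
  db: D1Like,
  bucket: R2Like,
  chat: NewChat,
  audio: ArrayBuffer,
  contentType: string,
): Promise<string> {
  const audioKey = `audios/${chat.id}`; // key layout is an assumption
  await bucket.put(audioKey, audio, { httpMetadata: { contentType } });
  await db
    .prepare("INSERT INTO chats (id, title, summary, transcript, audio_key) VALUES (?, ?, ?, ?, ?)")
    .bind(chat.id, chat.title, chat.summary, chat.transcript, audioKey)
    .run();
  return audioKey;
}

// Hypothetical helper: append one turn of the conversation to the chat's history.
async function appendMessage(
  db: D1Like,
  chatId: string,
  role: "user" | "assistant",
  content: string,
): Promise<void> {
  await db
    .prepare("INSERT INTO messages (chat_id, role, content) VALUES (?, ?, ?)")
    .bind(chatId, role, content)
    .run();
}
```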

Additional Features

Preserve conversation: Your chat and audio are stored remotely, so you can come back later and keep asking questions about the same audio.
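Resuming a conversation mostly means reading the stored turns back and replaying them to the chat model. The sketch below assumes a hypothetical `messages` table with an auto-incrementing `id` column and uses D1's `all()` method, which resolves to an object with a `results` array. The helper names `loadHistory` and `buildMessages` are illustrative only.

```typescript
// Minimal subset of the D1 API: `all()` resolves to an object whose
// `results` array holds the returned rows.
interface D1QueryLike {
  bind(...values: unknown[]): D1QueryLike;
  all<T>(): Promise<{ results: T[] }>;
}
interface D1ReadLike {
  prepare(query: string): D1QueryLike;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical: read a chat's earlier turns in insertion order
// (assumes a `messages` table with an auto-incrementing `id` column).
async function loadHistory(db: D1ReadLike, chatId: string): Promise<ChatMessage[]> {
  const { results } = await db
    .prepare("SELECT role, content FROM messages WHERE chat_id = ? ORDER BY id")
    .bind(chatId)
    .all<ChatMessage>();
  return results;
}

// Rebuild the message list for a follow-up question:
// earlier turns first, the new question last.
function buildMessages(history: ChatMessage[], question: string): ChatMessage[] {
  return [...history, { role: "user", content: question }];
}
```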

How to Install

Start by cloning this repository:

git clone git@github.com:gabrielsenadev/audioinsight.git
cd audioinsight

Install dependencies:

npm ci

Create a D1 database:

npx wrangler d1 create db-d1-audioinsight

Apply the database schema:

npx wrangler d1 execute db-d1-audioinsight --remote --file=./src/database/schema.sql

Create your R2 bucket:

npx wrangler r2 bucket create r2-audios

Update wrangler.toml so the bindings point to the database and bucket you just created (replace database_id with the ID printed by the d1 create command):

[[d1_databases]]
binding = "DB"
database_name = "db-d1-audioinsight"
database_id = "d485c019-8021-4d08-88e6-e5a6ea66ad4e"

[[r2_buckets]]
binding = 'R2'
bucket_name = 'r2-audios'
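The `binding` values in wrangler.toml become property names on the environment object that Cloudflare passes to server-side code, so the configuration above exposes the database as `env.DB` and the bucket as `env.R2`. How the Next.js app obtains that object is not shown here. The sketch below only illustrates the mapping, with minimal stand-in types in place of `D1Database` and `R2Bucket` from `@cloudflare/workers-types` and a hypothetical `requireBindings` check.

```typescript
// Minimal stand-ins for the real D1Database and R2Bucket types.
interface D1Like {
  prepare(query: string): unknown;
}
interface R2Like {
  put(key: string, value: ArrayBuffer): Promise<unknown>;
}

// The `binding` names from wrangler.toml become property names on env.
interface Env {
  DB: D1Like; // [[d1_databases]] binding = "DB"
  R2: R2Like; // [[r2_buckets]] binding = 'R2'
}

// Hypothetical helper: fail fast with a clear message when a binding is
// missing, for example because wrangler.toml was not updated.
function requireBindings(env: Record<string, unknown>): Env {
  const missing = ["DB", "R2"].filter((name) => env[name] == null);
  if (missing.length > 0) {
    throw new Error(`Missing binding(s): ${missing.join(", ")}. Check wrangler.toml.`);
  }
  return env as unknown as Env;
}
```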

Run preview:

npm run preview

Deploy the application:

npm run deploy

Audio Examples

The examples/ directory contains sample audio files you can use to try the application.

Screenshots

Homepage

Chat


License

MIT License
