Project Description:
Miscellaneous AI projects that prototype the use of AI and machine learning to improve predictive results.
You may find useful code or an approach here, but the README may not be well explained and the code likely requires refactoring; these are prototypes. However, I will specify whether each project works, and I will provide a video of the project running.
Projects included:
1. vectordb/gpt-embeddings: Vectorize documents in a Pinecone vector database for LLM queries:
- Works at the time of upload
- Downloads the Bank of England's Monetary Policy Report (November 2023) PDF
- Chunks the PDF so each page can be vectorized
- Vectorizes each page in Pinecone using the GPT Retrieval Plugin with the FastAPI web framework (https://blog.devgenius.io/getting-started-with-fast-api-c7e52e68685f). The GPT Retrieval Plugin, a tool released by OpenAI, serves as the database interface, handling all chunking, embedding-model calls, and vector-database interaction.
- The user asks gpt-3.5 a question about the document and specifies N embeddings (PDF pages) relevant to the question; the model uses these to contextualize its response
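The page-chunking step can be sketched roughly as follows. This is a minimal illustration only; in the project the real work is done by the GPT Retrieval Plugin, and `chunk_pages`, `max_chars`, and `overlap` are hypothetical names, not the plugin's API:

```python
def chunk_pages(pages, max_chars=1000, overlap=100):
    """Split each page's text into overlapping character chunks.

    pages: list of strings, one per PDF page.
    Returns a list of dicts ready to be sent for embedding.
    """
    chunks = []
    for page_num, text in enumerate(pages):
        start = 0
        while start < len(text):
            end = min(start + max_chars, len(text))
            chunks.append({"page": page_num, "text": text[start:end]})
            if end == len(text):
                break
            # step back by `overlap` so adjacent chunks share context
            start = end - overlap
    return chunks

# One fake 1800-character page stands in for a real PDF page.
chunks = chunk_pages(["inflation outlook " * 100])
```

Overlapping chunks help a retrieved page keep the sentences that straddle a chunk boundary.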
You can also see the app execution video here: pinecone-embeddings-to-gpt_.mp4
Special thanks to @Roulin for the clear instructions in the blog post linked above.
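Once the top-N pages are retrieved from Pinecone, stitching them into the chat prompt is straightforward. A minimal sketch; the function name and prompt wording are illustrative, not the project's actual code:

```python
def build_prompt(question, retrieved_chunks):
    """Combine the user question with retrieved page texts into one prompt
    that asks the model to answer only from the supplied context."""
    context = "\n\n".join(
        f"[page {c['page']}] {c['text']}" for c in retrieved_chunks
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# The resulting string would be sent as the user message in an
# OpenAI chat-completions call against gpt-3.5-turbo.
prompt = build_prompt(
    "What is the Bank's inflation forecast?",
    [{"page": 3, "text": "CPI inflation is projected to fall..."}],
)
```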
2. classification/ada_and_randomforest: Mail spam classification using OpenAI embeddings and a random forest classification model
- Vectorizes the mail dataset with OpenAI's text-embedding-ada-002
- Trains a random forest classification model on the embedding vectors (features) and labels (spam or ham)
- Tests the model and reports stats
```
(oai310env) sergio@Home-Win11:~/my-repos/tooling-ai/classification/ada_and_randomforest$ ./classify_ada_rndforest.py
Start to train the model.
Time elapsed to train the model for 50 mails: 0 minutes, 0 seconds, 48 milliseconds
              precision    recall  f1-score   support

           0       0.75      1.00      0.86         3
           1       1.00      0.86      0.92         7

    accuracy                           0.90        10
   macro avg       0.88      0.93      0.89        10
weighted avg       0.93      0.90      0.90        10
```
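The train/test flow can be sketched with scikit-learn. In this sketch, real ada-002 embeddings (1536-dimensional) are replaced by small synthetic vectors so it runs without an API key, and all names are illustrative rather than the project's actual code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for ada-002 embeddings: 50 mails x 8 dims (real vectors are 1536-dim).
X = rng.normal(size=(50, 8))
# Toy labels: pretend dimension 0 separates spam (1) from ham (0).
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train the random forest on embedding vectors and labels.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Report per-class precision/recall/f1, as in the output above.
print(classification_report(y_test, model.predict(X_test)))
```

With real embeddings, only `X` and `y` change: each row becomes an ada-002 vector for a mail, and each label comes from the dataset's spam/ham column.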
Special thanks to Kaggle for the dataset and the GeeksforGeeks community for the clear instructions.