sergiosolorzano / tooling-ai

Misc AI projects: Pinecone vectorized documents for LLM question context; random forest classification model training with OpenAI embeddings.

Project Description:

Miscellaneous AI projects that prototype the use of AI and machine learning to enhance predictive results.

You may find useful code or an approach here, but the README may be thin on explanation and the code likely needs refactoring; sorry, these are prototypes. For each project I will state whether it works, and I will provide a video of it running.

Projects included:

1. vectordb/gpt-embeddings: Vectorize documents in a Pinecone vector DB for LLM queries

Video of the app running: pinecone-embeddings-to-gpt_.mp4

Special thanks to @Roulin for the clear instructions in the blog post at this link.
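To illustrate the idea behind this project, here is a minimal sketch of retrieving the stored documents closest to a question and placing them in an LLM prompt as context. It is not the project's code and does not call Pinecone or OpenAI: a plain Python list stands in for the vector index, the three-dimensional vectors and document texts are made up, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the texts of the k entries most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy in-memory "index" of (text, vector) pairs. In the real project the
# vectors would come from an embedding model and live in a vector database.
index = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Late invoices incur a 2% fee.", [0.8, 0.3, 0.1]),
]

# Made-up embedding of the question "When are invoices due?"
query_vector = [1.0, 0.2, 0.0]

context = top_k(query_vector, index, k=2)
prompt = (
    "Answer using only this context:\n"
    + "\n".join(context)
    + "\n\nQuestion: When are invoices due?"
)
print(prompt)
```

A vector database performs the same nearest-neighbour lookup at scale; only the retrieved texts, not the vectors, are passed to the LLM.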

2. classification/ada_and_randomforest: Mail Spam Classification using OpenAI embeddings and a Random Forest Classification model

  • Vectorize mail dataset with OpenAI's text-embedding-ada-002

  • Train a random forest classification model with these embedding vectors (features) and labels (mail is spam or ham type)

  • Test the model and report stats
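The three steps above can be sketched roughly as follows. This is not the repository's script: random vectors stand in for the OpenAI embeddings so that the example runs without an API key, the 80/20 train/test split is an assumption (it happens to yield 10 test mails out of 50), and the mapping of labels 0 and 1 to ham and spam is left unspecified. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

EMBEDDING_DIM = 1536  # length of a text-embedding-ada-002 vector
N_MAILS = 50

rng = np.random.default_rng(0)

# Synthetic stand-ins: one label per mail and one "embedding" per mail.
# Vectors of class 1 are shifted slightly so the classes are separable.
labels = rng.integers(0, 2, size=N_MAILS)
embeddings = rng.normal(size=(N_MAILS, EMBEDDING_DIM)) + 0.5 * labels[:, None]

# Hold out 20% of the mails for testing (assumed split, not from the repo).
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0, stratify=labels
)

# Train the random forest on the embedding vectors (features) and labels.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out mails and print per-class statistics.
predictions = model.predict(X_test)
print(classification_report(y_test, predictions, zero_division=0))
```

With real data, `embeddings` would be replaced by the vectors returned by the embedding model for each mail, and `labels` by the dataset's spam/ham column.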

    (oai310env) sergio@Home-Win11:~/my-repos/tooling-ai/classification/ada_and_randomforest$ ./classify_ada_rndforest.py

Start to train the model.
Time elapsed to train the model for 50 mails: 0 minutes, 0 seconds, 48 milliseconds

              precision    recall  f1-score   support

           0       0.75      1.00      0.86         3
           1       1.00      0.86      0.92         7

    accuracy                           0.90        10
   macro avg       0.88      0.93      0.89        10
weighted avg       0.93      0.90      0.90        10
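As a worked check of how the figures in this report relate to each other, the sketch below recomputes them from a confusion matrix. The matrix is inferred from the reported precision, recall, and support values (9 of 10 test mails classified correctly, with one class-1 mail predicted as class 0); it is not printed by the script, and `per_class_metrics` is a hypothetical helper name.

```python
# confusion[true_class][predicted_class], inferred from the report above.
confusion = [
    [3, 0],  # 3 mails of class 0, all predicted as 0
    [1, 6],  # 7 mails of class 1, one mispredicted as 0
]


def per_class_metrics(confusion, cls):
    """Return (precision, recall, f1, support) for one class."""
    true_positives = confusion[cls][cls]
    predicted = sum(row[cls] for row in confusion)  # column total
    support = sum(confusion[cls])                   # row total
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / support if support else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, support


rows = [per_class_metrics(confusion, c) for c in range(len(confusion))]
total = sum(r[3] for r in rows)

accuracy = sum(confusion[c][c] for c in range(len(confusion))) / total
# Macro average: unweighted mean over classes.
macro = [sum(r[i] for r in rows) / len(rows) for i in range(3)]
# Weighted average: mean over classes weighted by support.
weighted = [sum(r[i] * r[3] for r in rows) / total for i in range(3)]

for cls, (p, r, f, s) in enumerate(rows):
    print(f"class {cls}: precision={p:.3f} recall={r:.3f} f1={f:.3f} support={s}")
print(f"accuracy={accuracy:.3f}")
print("macro avg:", [round(v, 3) for v in macro])
print("weighted avg:", [round(v, 3) for v in weighted])
```

With only 10 test mails, a single misclassification moves accuracy by 10 percentage points, so these statistics are a smoke test of the pipeline more than a measure of model quality.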

Special thanks to Kaggle for the dataset and the GeeksforGeeks community for the clear instructions.

License: MIT License


Languages

Python 77.0%, Jupyter Notebook 21.6%, Shell 0.6%, PLpgSQL 0.6%, Dockerfile 0.1%, Makefile 0.1%