s2t2 / openai-embeddings-2023

Classifying users on social media, using text embeddings from OpenAI and others

Home Page: https://s2t2.github.io/openai-embeddings-2023/


openai-embeddings-2023

Dimensionality Reduction on Twitter Data using OpenAI Embeddings

Research Questions

Can we use OpenAI's text embeddings to reproduce our previous research?

Can ChatGPT discern bot status, political sentiment, and QAnon support based on user profiles and tweets?

Setup

Create and/or activate a virtual environment:

conda create -n openai-env python=3.10
conda activate openai-env

Install package dependencies:

pip install -r requirements.txt

Obtain a copy of the "botometer_sample_openai_tweet_embeddings_20230724.csv.gz" CSV file, and store it in the "data/text-embedding-ada-002" directory in this repo. This file was generated by the notebooks, and is excluded from version control because it contains user identifiers.
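
Before running anything, it can help to confirm the file is where the instructions say to put it. The following is a minimal sketch, assuming the repo root is the current working directory. The names `DATA_DIRPATH`, `CSV_FILENAME`, and `expected_csv_path` are illustrative and are not taken from the repo's code:

```python
from pathlib import Path

# Directory and filename as given in the setup instructions.
# These constant names are hypothetical, chosen for this example only.
DATA_DIRPATH = Path("data") / "text-embedding-ada-002"
CSV_FILENAME = "botometer_sample_openai_tweet_embeddings_20230724.csv.gz"


def expected_csv_path(repo_root="."):
    """Return the path where the embeddings file should be stored."""
    return Path(repo_root) / DATA_DIRPATH / CSV_FILENAME


if __name__ == "__main__":
    path = expected_csv_path()
    # Report whether the file is present, without reading it.
    status = "found" if path.is_file() else "MISSING"
    print(f"{status}: {path}")
```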

Usage

Dataset Loading

Demonstrate ability to load the dataset:

python -m app.dataset
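
The `app.dataset` module itself is not shown here. As a rough illustration of what a loading check involves, the sketch below reads a gzip-compressed CSV using only the standard library and reports its column names and row count. The `peek_gzipped_csv` helper is hypothetical and is not the repo's loader, and it makes no assumption about which columns the file contains:

```python
import csv
import gzip


def peek_gzipped_csv(csv_filepath):
    """Return (column_names, row_count) for a gzip-compressed CSV file."""
    # "rt" opens the compressed stream in text mode;
    # newline="" lets the csv module handle line endings itself.
    with gzip.open(csv_filepath, mode="rt", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)  # data rows, excluding the header
    return header, row_count
```

Calling `peek_gzipped_csv` with the path to the embeddings file from the Setup section would return the header and the number of data rows, which is enough to confirm the file decompresses and parses before running heavier analyses.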

Data Analysis

Perform machine learning and other analyses on the data:

Testing

pytest --disable-warnings


