kingb12 / nlp220_hw1_data

Data for Assignment 1 in NLP 220: Data Science & ML Fundamentals

NLP 220: Assignment 1 Data

This repository holds the data for assignment 1 in NLP 220, as well as code for reproducing it (not needed for assignment completion).

Getting Data for Assignment 1

On Mac/Linux (e.g. nlp-gpu-01):

wget "https://raw.githubusercontent.com/kingb12/nlp220_hw1_data/main/small_books_rating.csv"

On Windows:

I have not been able to test this myself, but Wget for Windows looks useful: once it is installed, the same command as above should work. Some more options are discussed here.
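If installing wget is inconvenient, the same file can probably be fetched with Python's standard library on any platform. The following is a minimal sketch, not part of the repository: the function name download_file and its defaults are hypothetical, and only the URL comes from the wget command above.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
DATA_URL = (
    "https://raw.githubusercontent.com/kingb12/nlp220_hw1_data/"
    "main/small_books_rating.csv"
)


def download_file(url: str = DATA_URL, dest: str = "small_books_rating.csv") -> Path:
    """Download `url` to `dest` and return the destination path.

    `download_file` is a hypothetical helper for illustration only.
    """
    dest_path = Path(dest)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Calling download_file() with no arguments from the assignment directory should leave small_books_rating.csv in the current folder, matching what the wget command produces.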

(Optional) Full Dataset Download Instructions

To reproduce the full dataset, complete the following steps, then run python create_dataset.py.

  1. Sign up for a Kaggle account
  2. Set up an API token in your profile
  3. Move the API token (provided in kaggle.json) to your working computer (could be nlp-gpu-01) under ~/.kaggle.
  4. In your assignment environment: pip install kaggle
  5. kaggle datasets download -d mohamedbakhet/amazon-books-reviews
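Steps 3 to 5 can be sanity-checked from Python before running create_dataset.py. This is a rough sketch under stated assumptions: the helper names (token_is_valid, download_command, download_full_dataset) are hypothetical and not part of this repository, and it assumes kaggle.json is a JSON object containing "username" and "key" fields. The command itself is the one from step 5.

```python
import json
import subprocess
from pathlib import Path
from typing import List, Optional

# Dataset identifier taken from step 5 above.
DATASET = "mohamedbakhet/amazon-books-reviews"


def token_is_valid(token_path: Path) -> bool:
    """Return True if the file exists and looks like a Kaggle API token.

    Assumption: the token is a JSON object with "username" and "key" fields.
    """
    if not token_path.is_file():
        return False
    try:
        creds = json.loads(token_path.read_text())
    except (json.JSONDecodeError, OSError):
        return False
    return isinstance(creds, dict) and {"username", "key"} <= creds.keys()


def download_command(dataset: str = DATASET) -> List[str]:
    """Build the same CLI invocation as step 5, as an argument list."""
    return ["kaggle", "datasets", "download", "-d", dataset]


def download_full_dataset(token_path: Optional[Path] = None) -> None:
    """Check for the token under ~/.kaggle, then run the kaggle CLI."""
    if token_path is None:
        token_path = Path.home() / ".kaggle" / "kaggle.json"
    if not token_is_valid(token_path):
        raise FileNotFoundError(f"No usable Kaggle token at {token_path}")
    # check=True raises CalledProcessError if the kaggle command fails.
    subprocess.run(download_command(), check=True)
```

Checking the token first gives a clearer error than a failed CLI call when kaggle.json has not been moved into place yet.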
