johnfkraus / coffee-quality-database

Building the Coffee Quality Institute Database

coffee-quality-database

Digitizing 1,340 coffee reviews

Data

These data contain reviews of 1,312 arabica and 28 robusta coffee beans from the Coffee Quality Institute's trained reviewers. The features include:

Quality Measures

  • Aroma
  • Flavor
  • Aftertaste
  • Acidity
  • Body
  • Balance
  • Uniformity
  • Cup Cleanliness
  • Sweetness
  • Moisture
  • Defects

Bean Metadata

  • Processing Method
  • Color
  • Species (arabica / robusta)

Farm Metadata

  • Owner
  • Country of Origin
  • Farm Name
  • Lot Number
  • Mill
  • Company
  • Altitude
  • Region

The data folder contains both raw and cleaned data. The raw data are exactly as they were found on the CQI site. Because these human-recorded data use a variety of encodings, abbreviations, and units of measurement for farm names, altitude, region, and other fields, I recommend using the cleaned data as a starting point.
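To illustrate the kind of unit inconsistency mentioned above, here is a minimal Python sketch that normalizes free-text altitude values to meters. It is not the cleaning code used in this repo. The function name `altitude_to_meters`, the input formats it handles, and the choice to take the midpoint of a range are all assumptions made for illustration.

```python
import re

FEET_TO_METERS = 0.3048


def altitude_to_meters(raw):
    """Return an altitude in meters, or None if no number can be found.

    Hypothetical helper: assumes values look like "1,200 m", "4000 ft",
    or "1200-1400 masl". A range is reduced to its midpoint.
    """
    if raw is None:
        return None
    text = str(raw).lower()
    # Drop thousands separators such as the comma in "1,200".
    text = re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", text)
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return None
    # Use at most the first two numbers (low and high end of a range).
    bounds = numbers[:2]
    value = sum(bounds) / len(bounds)
    # Convert only when the text explicitly mentions feet.
    if re.search(r"(?<![a-z])(ft|feet|foot)\b", text):
        value *= FEET_TO_METERS
    return round(value, 1)


if __name__ == "__main__":
    for sample in ["1,200 m", "1200-1400 masl", "4000 ft", "unknown"]:
        print(repr(sample), "->", altitude_to_meters(sample))
```

Values with no explicit unit are treated as meters here, which is itself an assumption; the raw data may need a different default.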

The site was scraped using a Selenium headless browser and Beautiful Soup. To replicate this or collect updated data, create a login for the CQI site and enter your credentials in the scraper.
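The general shape of a Selenium plus Beautiful Soup scrape can be sketched as follows. This is an illustrative outline, not the scraper shipped in this repo. The environment variable names `CQI_USERNAME` and `CQI_PASSWORD`, the form field names, and the function names are hypothetical, and the login and review URLs are left as parameters because they are not given here. Reading credentials from the environment is a swapped-in alternative to typing them into the script.

```python
import os


def load_credentials(env=os.environ):
    """Read login details from environment variables (hypothetical names)."""
    username = env.get("CQI_USERNAME")
    password = env.get("CQI_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set CQI_USERNAME and CQI_PASSWORD first")
    return username, password


def fetch_page_html(login_url, page_url, username, password):
    """Log in with a headless Chrome browser and return one page's HTML."""
    # Imported here so the rest of the module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(login_url)
        # Field names below are assumptions; inspect the real login form.
        driver.find_element(By.NAME, "username").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        driver.find_element(By.NAME, "login").click()
        driver.get(page_url)
        return driver.page_source
    finally:
        driver.quit()


def table_rows(html):
    """Extract the text of every table row in the page as a list of cells."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in soup.find_all("tr")
    ]
```

A run would call `load_credentials()`, pass the result to `fetch_page_html` along with the two URLs, and hand the returned HTML to `table_rows`. Keeping credentials out of the source file avoids committing them by accident.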

Source

These data were collected from the Coffee Quality Institute's review pages in January 2018.

License: MIT License

Languages: R 90.9%, Python 9.1%