jgperrin / net.jgp.books.spark.ch99

Spark in Action, 2nd edition - chapter 99

Home page: http://jgp.net/sia

The examples in this repository support the book Spark in Action, 2nd edition, by Jean-Georges Perrin, published by Manning. Find out more about the book on Manning's website.

Welcome to Spark in Action, 2nd edition, chapter 99. This chapter collects all the material that we would have loved to include in the book but could not, because the book is already more than 600 pages long.

This code is designed to work with Apache Spark v3.0.0.

Data quality labs

Data quality labs are located in the dq subpackage.

Lab #200

This lab mixes machine learning and data quality to predict the revenues of a party of 40 people at a restaurant.
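To illustrate the idea of combining a data quality step with a prediction, here is a minimal sketch in plain Java. It is not the lab's code and does not use Spark, so that it runs without dependencies. The class name PartyRevenueSketch, the rule "guests and revenue must be positive", the choice of a simple least-squares line, and all sample values are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the lab's implementation, and no Spark involved.
public class PartyRevenueSketch {

  /**
   * Data quality step. Assumed rule: a row {guests, revenue} is kept only
   * when both values are strictly positive.
   */
  static double[][] clean(double[][] rows) {
    List<double[]> kept = new ArrayList<>();
    for (double[] row : rows) {
      if (row.length == 2 && row[0] > 0 && row[1] > 0) {
        kept.add(row);
      }
    }
    return kept.toArray(new double[0][]);
  }

  /**
   * Fits revenue = slope * guests + intercept by ordinary least squares
   * and returns the predicted revenue for the given number of guests.
   */
  static double predict(double[][] rows, double guests) {
    int n = rows.length;
    if (n < 2) {
      throw new IllegalArgumentException("Need at least two clean rows");
    }
    double sumX = 0;
    double sumY = 0;
    for (double[] r : rows) {
      sumX += r[0];
      sumY += r[1];
    }
    double meanX = sumX / n;
    double meanY = sumY / n;
    double sxy = 0;
    double sxx = 0;
    for (double[] r : rows) {
      sxy += (r[0] - meanX) * (r[1] - meanY);
      sxx += (r[0] - meanX) * (r[0] - meanX);
    }
    if (sxx == 0) {
      throw new IllegalArgumentException("All guest counts are identical");
    }
    double slope = sxy / sxx;
    double intercept = meanY - slope * meanX;
    return slope * guests + intercept;
  }

  public static void main(String[] args) {
    // Made-up sample values: {number of guests, revenue}.
    // The rows {-1, 100} and {25, 0} violate the assumed quality rule.
    double[][] raw = {
        {10, 250}, {20, 500}, {-1, 100}, {30, 750}, {25, 0}
    };
    double[][] cleaned = clean(raw);
    System.out.println("Rows kept after cleaning: " + cleaned.length);
    System.out.println("Predicted revenue for 40 guests: "
        + predict(cleaned, 40));
  }
}
```

Cleaning before fitting matters here: a single invalid row, such as a party with zero revenue, would pull the fitted line away from the valid observations and distort the prediction for a party of 40.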

Covid19 labs

The Covid19 labs are located in the covid19 package.

Data

The data ingested by these labs comes from the Center for Systems Science and Engineering (CSSE), part of the Whiting School of Engineering at Johns Hopkins University (JHU). The data is shared on GitHub at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data.

Lab #100 Ingestion

Simple data ingestion.

Other stuff

These labs are located in the misc package.

Lab #9xx

These labs are work in progress; please ignore them.

Data

This repository contains many datasets, which will be cleaned up soon.

Notes:

  1. This repository only contains Java examples.

Follow me on Twitter to get updates about the book and Apache Spark: @jgperrin. Join the book's community on Facebook or on Manning's live site.

License: Apache License 2.0

Languages: Java 100.0%