BrooksIan/CensusSIPP

spark zeppelin-notebook sparksql

Data Science in Apache Spark

Census - SIPP Workbook

Report Building

Level: Easy

Language: Scala

Requirements:

HDP 2.6.x
Spark 2.x

Author: Ian Brooks. Follow on [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/)

Source File Description

Source data: the [CFS 2012 csv](https://www.census.gov/econ/cfs/pums.html), uploaded to HDFS in the /tmp directory.

Column Descriptions

Pre-Run Instructions

For HDP with Apache Zeppelin

Log into Apache Ambari
In Ambari, select "Files View" and upload all of the CSV files to the /tmp/ directory. For assistance, please use the following tutorial.
Upload the source data file [CFS 2012 csv](https://www.census.gov/econ/cfs/pums.html) to HDFS in the /tmp directory.
Upload all of the helper files to HDFS in the /tmp directory:

a. SIPP08A.csv

b. SIPP08B.csv

c. SIPP08C.csv

d. SIPP08D.csv

Download the Zeppelin note JSON file and import it into Zeppelin. For assistance, please use the following tutorial.
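If you have shell access to a cluster node, the helper-file upload can also be done with the HDFS command-line client instead of Ambari Files View. This is a minimal sketch and is not part of the repository. It assumes that the four SIPP08 files are in a single local directory and that the `hdfs` client is on your PATH. The function name `upload_sipp_files` is hypothetical.

```shell
# Sketch (not from the repository): copy the SIPP helper CSVs into HDFS.
# Assumptions: all four files are in one local directory, and the `hdfs`
# client is installed and on PATH for the current user.

SIPP_FILES=(SIPP08A.csv SIPP08B.csv SIPP08C.csv SIPP08D.csv)

# upload_sipp_files SRC_DIR [HDFS_DIR]   (hypothetical helper)
# Checks that every file exists locally before uploading any of them.
upload_sipp_files() {
  local src_dir="$1"
  local hdfs_dir="${2:-/tmp}"
  local f
  for f in "${SIPP_FILES[@]}"; do
    if [[ ! -f "$src_dir/$f" ]]; then
      echo "missing: $src_dir/$f" >&2
      return 1
    fi
  done
  for f in "${SIPP_FILES[@]}"; do
    # -f overwrites a file with the same name that is already in HDFS
    hdfs dfs -put -f "$src_dir/$f" "$hdfs_dir/" || return 1
  done
}
```

For example, you could run `upload_sipp_files ./data` and then `hdfs dfs -ls /tmp` to confirm that the files arrived. The function checks every file before uploading any of them, so HDFS is never left with a partial set.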

For Cloudera Data Science Workbench

Log into CDSW and upload the project
Open a terminal in a session and run the loaddata.sh script.
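The README does not describe what loaddata.sh does. The check below assumes that the script stages the same SIPP08 helper files in HDFS /tmp as the HDP instructions do; the source does not confirm this. Under that assumption, a quick post-run check from the same terminal could look like the following. `check_sipp_in_hdfs` is a hypothetical helper.

```shell
# Sketch (not from the repository): after running loaddata.sh, confirm the
# SIPP helper files are visible in HDFS.
# Assumption: loaddata.sh places them in HDFS /tmp, as in the HDP steps.

# check_sipp_in_hdfs [HDFS_DIR]   (hypothetical helper)
# Returns 0 if every file exists, 1 if any is missing.
check_sipp_in_hdfs() {
  local hdfs_dir="${1:-/tmp}"
  local f missing=0
  for f in SIPP08A.csv SIPP08B.csv SIPP08C.csv SIPP08D.csv; do
    # `hdfs dfs -test -e PATH` exits 0 when PATH exists
    if ! hdfs dfs -test -e "$hdfs_dir/$f"; then
      echo "not found in HDFS: $hdfs_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `bash loaddata.sh && check_sipp_in_hdfs /tmp` from the project root executes the script and then reports any file that did not arrive.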

License

This project uses The Star And Thank Author License (SATA). Please see the LICENSE file for more information.

About

Reproducing Census SIPP Reports Using Apache Spark


The Unlicense

Languages

Shell 100.0%