bigdata-vandy's repositories
bigdata-vandy.github.io
The blog home of bigdata-vandy
download-stack-dump
Python code to download the archived Stack Exchange data dump from https://archive.org/details/stackexchange
spark-corenlp-demo
A demonstration of the Spark CoreNLP library from Databricks
spark-wordcount
A brief demonstration of Spark functionality.
spark-xml-parse
Demonstration of XML parsing using the Stack Overflow data dump.
data-getters
A collection of simple scripts for pulling data from various and sundry sources.
HBase-Standalone
HBase Standalone Tutorial
akka-demo
A basic demo of web scraping using Akka (Scala-flavor!)
mapreduce-wc
Word count with MapReduce, written in native Java
password-cracker
A demonstration of distributed computation in Spark.
pyspark_intro_vish
A brief introduction to PySpark
scp-data-to-hdfs
Bash scripts for copying data to the Big Data cluster with SLURM
spark-sem-classify
Classify SEM data using Spark-ML
spark-taxi
Analyze NYC-TLC taxi trip data
tweet-count
Count a batch of tweet records using a Java implementation of MapReduce.