- 1956-1979: Stanford, MIT, CMU, and other universities develop set/list operations in LISP, Prolog, and other languages for parallel processing (see http://www-formal.stanford.edu/jmc/history/lisp/lisp.html).
- Circa 2004: Google publishes "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat
- Circa 2006: Apache Hadoop, spun out of Doug Cutting's Nutch project and developed further at Yahoo!
- Circa 2008: Yahoo! web-scale search indexing on Hadoop; Hadoop Summit and Hadoop User Group
- Circa 2009: Cloud computing with Amazon Web Services Elastic MapReduce (AWS EMR), a version of Hadoop adapted for Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3), including support for Apache Hive and Pig.
You can read more about Spark research here: http://spark.apache.org/research.html
- Spark: Cluster Computing with Working Sets, Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. USENIX HotCloud (2010).
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. NSDI (2012).
- Spark SQL: Relational Data Processing in Spark, Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei Zaharia. SIGMOD (June 2015).
- Piazza discussion group (use access code: cs1051x)
- Apache Spark on Databricks for Data Engineers (Scala)
- Apache Spark on Databricks for Data Scientists (Scala)
- http://spark.meetup.com/
- http://spark.apache.org/community.html
- http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
- http://hortonworks.com/blog/category/spark/
- http://spark-packages.org/
- http://www.spark.tc/blog/
- https://spark.apache.org/docs/latest/
- http://research.google.com/archive/mapreduce.html
- https://forums.databricks.com/
- http://blog.cloudera.com/blog/category/spark/
- http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
- https://github.com/apache/spark/
- http://www.jcmit.com/mem2014.htm
- https://databricks.com/blog/category/engineering
- https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf
- http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
- http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf
- https://en.wikipedia.org/wiki/SQL
- http://sqlzoo.net/
- http://www.w3schools.com/sql/
- http://www.sql-tutorial.net/
- https://www.1keydata.com/sql/sql.html
- http://www.sqlcourse.com/intro.html
- http://quickbase.intuit.com/articles/ultimate-web-guide-to-sql-database-language
- http://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive
- https://en.wikipedia.org/wiki/Join_(SQL)
- https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
- http://www.w3schools.com/sql/sql_join.asp