Data Science and Engineering with Apache Spark - edX - BerkeleyX

History

1956-1979: Stanford, MIT, CMU, and other universities develop set/list operations in LISP, Prolog, and other languages for parallel processing (see http://www-formal.stanford.edu/jmc/history/lisp/lisp.html).
Circa 2004: Google: MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat
Circa 2006: Apache Hadoop, originating from the Yahoo!’s Nutch Project Doug Cutting
Circa 2008: Yahoo! web scale search indexing - Hadoop Summit, Hadoop User Group
Circa 2009: Cloud computing with Amazon Web Services Elastic MapReduce (AWS EMR), a Hadoop version modified for Amazon Elastic Cloud Computing (EC2) and Amazon Simple Storage System (S3), including support for Apache Hive and Pig.

You can read more about Spark research here: http://spark.apache.org/research.html

Spark: Cluster Computing with Working Sets, Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. USENIX HotCloud (2010).
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, �Scott Shenker, Ion Stoica. NSDI (2012)
Spark SQL: Relational Data Processing in Spark, Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei Zaharia. SIGMOD 2015. June 2015.