yehchunhung / epfl-dslab

EPFL lab in data science spring 2020

Home Page: https://epfl-dslab2020.github.io

Description

This hands-on course teaches the tools and methods used by data scientists, from researching solutions to scaling up prototypes on Spark clusters. It exposes students to the entire data science pipeline, from data acquisition to extracting valuable insights for real-world problems.

Students work in groups of four on big data science problems of the kind typically faced in industry. There are four graded homework assignments and a final project with a video presentation.

Questions

Questions and discussions about the course are gathered on Slack: https://epfl-dslab2020.slack.com. You will receive an invitation to join the workspace.

Final Project

Lab Sessions

Week 1 - 19.02.2020

  • module 1
    • Jupyter Notebooks
    • Python 3.x
    • NumPy, Pandas, Matplotlib, Scikit-Learn
  • slides (pdf):
  • exercises

Week 2 - 26.02.2020

  • module 1
    • Reproducible data science
    • Git, Docker, Renku
  • slides (pdf):
  • exercises (EPFL access required)

Week 3 - 04.03.2020

  • module 2
    • Introduction to big data, best practices and guidelines
    • Loading & querying data with Hadoop
    • HDFS, Hive
  • slides (pdf):
  • exercises

Week 4 - 11.03.2020

Week 5 - 18.03.2020

  • module 2
    • Introduction to distributed computing and the Spark runtime architecture
    • Python on Spark
    • Basic RDD manipulations
  • slides:
  • exercises

Week 6 - 25.03.2020

Week 7 - 01.04.2020

  • module 3
    • Advanced Spark, optimizations and partitioning
  • slides (pdf):
    • lab
    • No industry talk this week
  • exercises
    • week 6 solutions
    • No explicit exercise this week; however, you can extend the COVID demo project and do some basic data science on an important topic!

Week 8 - 08.04.2020

  • module 3
    • Advanced Spark, optimizations and partitioning
    • Practical exercises with Twitter, SBB data and partitioning
  • slides:
    • Lab is in the form of an exercise notebook
    • industry
  • exercises
  • assessed project

Easter break! - 15.04.2020

Week 9 - 22.04.2020

Week 10 - 29.04.2020

Week 11 - 06.05.2020

Week 12 - 13.05.2020

  • final assignment
    • Useful tips and hints
  • slides (pdf):
  • exercises
  • assessed project
    • homework 3 grades
    • homework 4 due before 00:00 CEST
    • homework 4 solutions
    • final assignment presentation

Week 13 - 20.05.2020

  • final assignment
    • Q&A office hours

Week 14 - 24.05.2020 - 27.05.2020

  • final assignment (25.05 noon)
    • 7 min (max) video and notebook due by midnight
  • final assignment (27.05)
    • Oral Q&A (video calls of 6 min per group)
  • assessed project (27.05)
    • homework 4 grades available


Languages

  • Jupyter Notebook 53.7%
  • HTML 19.4%
  • JavaScript 17.3%
  • CSS 9.3%
  • Dockerfile 0.4%