dmatekenya / UNSIAP-Python-Oct-2019

Contains course material for Python training given at UNSIAP

Big Data Analytics with Python

The material in this repository was presented at a training workshop at the UN Statistical Institute for Asia and the Pacific (SIAP) in Chiba, Japan, on October 16-17, 2019. These sessions were part of the Theory and Practices in Official Statistics for Monitoring SDGs training organized by SIAP.

Course Outline and Goals

The goal of the course is to introduce participants to using Python for data science tasks such as data ingestion, data analysis and machine learning, with a focus on processing large-scale datasets. Unlike many online courses, this course uses real-life datasets and case studies to challenge participants with real-world data science problems instead of toy problems. Since this is a 2-day (8-hour) course, the aim is to introduce the concepts rather than cover them in detail. The following topics will be covered:

  • Day 1 [Python for Data Science]: During the first day, participants will be given a crash course on Python programming. The rest of the day will focus on generating data with Python by accessing APIs and scraping web pages.
  • Day 2 [Machine Learning and Big Data in Python]: On the second day, we will go through how to tackle Machine Learning (ML) problems using Python. Participants will also see a demonstration of processing a large-scale dataset with Python.
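To give a flavour of the Day 1 topic of generating data from APIs and web pages, here is a minimal sketch using only the Python standard library (`json` and `html.parser`), not necessarily the approach taken in the course notebooks. The JSON layout (`{"data": [{"country": ..., "value": ...}]}`) and the function names `scrape_table` and `parse_api_response` are hypothetical. The inputs are inline strings so the example runs offline; in practice the text would be downloaded first, for example with `urllib.request.urlopen`.

```python
import json
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the raw text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per <tr>
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> without an enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def scrape_table(html_text):
    """Return stripped <td> texts per row, skipping rows with no <td> (e.g. <th> headers)."""
    parser = TableCellParser()
    parser.feed(html_text)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows if row]


def parse_api_response(json_text):
    """Map a (hypothetical) {"data": [{"country": ..., "value": ...}]} payload to a dict."""
    payload = json.loads(json_text)
    return {item["country"]: item["value"] for item in payload["data"]}


if __name__ == "__main__":
    # Real pages would be fetched first, e.g.
    # urllib.request.urlopen(url).read().decode("utf-8"); inline strings keep this offline.
    page = (
        "<table><tr><th>Country</th><th>Value</th></tr>"
        "<tr><td>Country A</td><td>10</td></tr>"
        "<tr><td>Country B</td><td>20</td></tr></table>"
    )
    print(scrape_table(page))
    print(parse_api_response('{"data": [{"country": "Country A", "value": 1.5}]}'))
```

Dedicated libraries such as `requests` or BeautifulSoup are common choices for the same job; the standard library is used here only to keep the sketch dependency-free.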
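For the Day 2 demonstration of processing a large dataset, one common pattern is to stream the file in fixed-size chunks and keep only running aggregates in memory, so that the whole dataset never has to fit in RAM. The sketch below shows this idea with the standard `csv` module; it is an illustration, not the course's own demonstration code. The function names, the column names `region` and `income`, and the file name `large_file.csv` are hypothetical. With pandas, `pandas.read_csv(..., chunksize=...)` provides the same chunked reading as an iterator of DataFrames.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def grouped_mean(csv_file, group_col, value_col, chunk_size=10_000):
    """Mean of value_col per group_col, reading the CSV one chunk at a time."""
    totals = defaultdict(float)  # running sum per group
    counts = defaultdict(int)    # running row count per group
    reader = csv.DictReader(csv_file)
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            key = row[group_col]
            totals[key] += float(row[value_col])
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    # A real run would open a large file instead, e.g.
    # with open("large_file.csv", newline="") as f: grouped_mean(f, "region", "income")
    sample = io.StringIO("region,income\nA,10\nB,20\nA,30\n")
    print(grouped_mean(sample, "region", "income", chunk_size=2))
```

Only the per-group sums and counts are held in memory, so memory use grows with the number of groups rather than the number of rows.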

Delivery Style

Since we have only 8 hours to cover the material, this course is intended mainly as an introduction to state-of-the-art Data Science tools in Python. The course will therefore use the following approaches to deliver the material:

  • Lecture: PowerPoint slides will be used to introduce key concepts.
  • Follow-along coding: participants will be provided with a pre-prepared Jupyter Notebook which they can follow along with the course instructor.
  • Coding exercises: short programming exercises will be given to enable participants to practice key concepts.
  • Demonstrations: due to time limitations, in some cases the instructor will give demonstrations so that participants can appreciate how some concepts are implemented in practice.

Repository Setup

The main materials in this repository are source code (src), PowerPoint slides and data. All the source code lives in the src folder. The rest of the folders are organized by topic (e.g., machine learning) and contain data as well as other useful resources. Note that where data files are huge, they are not included in the repository due to Github storage limitations. Most of the PowerPoint slides are also large and are not included in the repository; instead, you can find up-to-date PowerPoint slides here. All the code uses Python 3.

Pre-course Training Materials

In the Big Data Analytics with Python course, we will use the Python programming language to interact with data. To get the most out of the course, participants are required to have basic Python skills. Luckily, the internet is full of very good introductory Python courses; see below for two such courses you can go through. In addition to Python, a basic understanding of Github is also required for this course. See Github pre-course preparation for tutorials.

Introduction to Python

Below are links to two free Python courses. You only need to complete one of them, but you can do both if you wish. Both are free and will take less than 5 hours of your time. Once you finish the course(s), you will have the prerequisite Python knowledge to get the most out of the 2-day course.

  1. Free Udemy Python Course

  2. Another Free Udemy Python Course

Github

We will use Github to track our code and submit exercises, so it's important that you familiarize yourself with Github. Refer to the links below for Github training materials.

  1. Github tutorial on Youtube
  2. Github tutorial
