luelhagos / BDA-with-Python

Big Data Analytics with Python

This repository contains content for the Big Data Analytics with Python course. In its latest iteration, the course was taught at the African Institute for Mathematical Sciences (AIMS), Rwanda, in 2022 as part of the Master of Science in Mathematical Sciences (Data Science stream) programme. For more details about this Master's programme, please check the AIMS website.

Course Outline and Goals

This course teaches participants the core concepts required to work efficiently with large datasets (also known as Big Data) and equips them with knowledge of the essential tools and techniques for interacting with large-scale datasets. The goal is to introduce participants to using Python to perform data science tasks such as data ingestion, data analysis and machine learning when faced with a large dataset. For more details about the course content, refer to this outline; otherwise, the main modules taught in the course are presented below.

Repository Setup

The repository contains the following folders:

  • SLIDES: This folder holds the PowerPoint and Google Slides presentations with lecture notes. Because the presentations are large, they are not uploaded here, so this folder will mostly be empty. However, the presentations can be found at the link.
  • DOCS: This folder contains miscellaneous documents for the course, for instance the course outline.
  • NOTEBOOKS: This folder has all the source code for the tutorials. This includes the notebooks and Python files.
  • DATASETS: As the name suggests, this folder has the datasets used in the course. Again, because of their size, these datasets are not uploaded here.
  • RESOURCES: In this folder, there are learning resources such as PDF books and articles.
  • SOFTWARE: This folder has all the packages required for the course. Because some of the installation files are large, they are not available here, but they can be found on the linked Google Drive.
  • ASSIGNMENT: This folder contains the course assignments.
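
Since several of these folders are intentionally left empty, it can help to check a local clone before starting. The snippet below is a minimal sketch and not part of the repository: the `folder_status` function and the `EXPECTED_FOLDERS` constant are hypothetical names, and the script assumes it is run from the root of the clone. Note that Git does not track empty directories, so a folder listed above may be reported as "missing" rather than "empty".

```python
from pathlib import Path

# Folder names as listed in this README (the constant name is hypothetical).
EXPECTED_FOLDERS = [
    "SLIDES", "DOCS", "NOTEBOOKS", "DATASETS",
    "RESOURCES", "SOFTWARE", "ASSIGNMENT",
]


def folder_status(repo_root):
    """Map each expected folder to 'missing', 'empty' or 'has files'."""
    root = Path(repo_root)
    status = {}
    for name in EXPECTED_FOLDERS:
        folder = root / name
        if not folder.is_dir():
            status[name] = "missing"
        elif any(folder.iterdir()):
            status[name] = "has files"
        else:
            status[name] = "empty"
    return status


if __name__ == "__main__":
    # "." assumes the current directory is the root of the clone.
    for name, state in folder_status(".").items():
        print(f"{name}: {state}")
```

Folders reported as "missing" or "empty" are the ones whose contents need to be fetched from the external links mentioned in the list above.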

How to Use the Materials

To follow this material, the recommended approach is to tackle the modules in the order they are presented in the outline above. For each topic, go through the slides first and then move on to the tutorials in the notebooks. It's worth mentioning that since the course was delivered in person, the material isn't necessarily ideal for self-paced learning, but a person with reasonable prerequisite knowledge can still follow the course and grasp the concepts.

Contacts

For any questions regarding this course content, you can contact me through the two email addresses below:

Languages

Jupyter Notebook 86.2%, Python 13.4%, Java 0.4%