Arewa Datascience presents this 10-week online course as part of the Arewa Datascience Fellowship. In this curriculum, you will learn classical machine learning using the pandas and Scikit-learn libraries. The curriculum does not include deep learning, which is expected to be covered in a future (second) cohort.
Our fellows have completed the Python-30 Day challenge that we prepared. If you are following this curriculum, we expect you to have a basic understanding of Python.
Fellows, to use this curriculum, fork the entire repo to your own GitHub account and complete the exercises on your own or with a group:
- Start with a pre-lecture quiz.
- Read the lecture and complete the activities, pausing and reflecting at each knowledge check.
- Try to create the projects by comprehending the lessons rather than running the solution code; however, that code is available in the `/solution` folders in each project-oriented lesson.
- Take the post-lecture quiz.
- Complete the challenge.
- Complete the assignment.
- After completing a lesson group, visit the Telegram Group and "learn out loud" by writing about the new knowledge you gained. Don't forget to write a Medium blog post about what you learn each week.
We have chosen two pedagogical tenets while building this curriculum: ensuring that it is hands-on project-based and that it includes frequent quizzes. In addition, this curriculum has a common theme to give it cohesion.
Aligning the content with projects makes the process more engaging for students and improves retention of concepts. In addition, a low-stakes quiz before a class sets the student's intention towards learning a topic, while a second quiz after class reinforces retention. This curriculum was designed to be flexible and fun and should be taken in whole. The projects start small and become increasingly complex by the end of the 10-week cycle. This curriculum also includes a postscript on real-world applications of ML, which can be used as extra credit or as a basis for discussion.
Each lesson includes:
- optional sketchnote
- optional supplemental video
- pre-lecture warmup quiz
- written lesson
- for project-based lessons, step-by-step guides on how to build the project
- knowledge checks
- a challenge
- supplemental reading
- assignment
- post-lecture quiz
A note about languages: These lessons are primarily written in Python, but many are also available in R. To complete an R lesson, go to the `/solution` folder and look for R lessons. They have an .rmd extension, which denotes an R Markdown file: a Markdown document that embeds code chunks (of R or other languages) and a YAML header (that guides how to format outputs such as PDF). As such, it serves as an exemplary authoring framework for data science, since it lets you combine your code, its output, and your thoughts by writing them down in Markdown. Moreover, R Markdown documents can be rendered to output formats such as PDF, HTML, or Word.
Week | Topic | Lesson Grouping | Learning Objectives | Linked Lesson | Mentors |
---|---|---|---|---|---|
01 | Introduction to machine learning | Introduction | Learn the basic concepts behind machine learning | ||
02 | Working with Data | Working With Data | Introduction to pandas and data preparation | ||
03 | Data Visualization | Data Visualization | Introduction to Matplotlib, Data Distributions, Proportions and Relationships and Meaningful Visualizations - bird data 🦆 | ||
04 | Regression | Regression | Tools, Data Visualization and Regression Models - North American pumpkin prices 🎃 | ||
05 | Classification | Classification | Data preprocessing, classifiers - Delicious Asian and Indian cuisines 🍜 | ||
06 | Clustering | Clustering | Data preprocessing, clustering - Exploring Nigerian Musical Tastes 🎧 | ||
07 | Natural language processing ☕️ | Natural language processing | Learn the basics about NLP by building a simple bot | ||
08 | Sentiment Analysis | Natural language processing | Sentiment analysis with hotel reviews | ||
09 | Time Series Forecasting | Time series | Introduction, ARIMA, Support Vector Regressor (SVR) - ⚡️ World Power Usage ⚡️ | ||
10 | Reinforcement Learning | Reinforcement learning | Introduction, Reinforcement learning with Q-Learning and Gym | ||
Projects | Real-World ML scenarios and applications | ML in the Wild | Interesting and revealing real-world applications of classical ML | Lesson | |
You can run this documentation offline by using Docsify. Fork this repo, install Docsify on your local machine, and then in the root folder of this repo, type `docsify serve`. The website will be served on port 3000 on your localhost: `localhost:3000`.
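Docsify renders Markdown in the browser, so the site can usually be previewed with any static file server as well. The following is a minimal sketch of that alternative using only the Python standard library. It is not the Docsify CLI, and it assumes the root folder of your fork contains the Docsify `index.html`. The helper name `make_server` is hypothetical and exists only for this illustration.

```python
import functools
import http.server


def make_server(root=".", port=3000, host="localhost"):
    """Build a static file server for `root`, bound to `host`:`port`.

    Assumption: `root` holds the Docsify index.html, which loads and
    renders the Markdown files in the browser.
    """
    # Bind the handler to the chosen folder instead of the current directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


# Example usage (blocks until interrupted with Ctrl+C):
#     make_server(".", 3000).serve_forever()
```

Running the commented usage line from the root folder of the repo would serve it at `localhost:3000`, the same address that `docsify serve` uses. Unlike `docsify serve`, this sketch does not reload the page when files change.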
The curriculum for this course was adapted from Microsoft's 'ML-For-Beginners' and 'DS-For Beginners' curricula.