- Course Jupyter book
- Course GitHub page
- Slack Channel
- Canvas
- Gradescope
- YouTube videos
- Class + office hours calendar
© 2021 Varada Kolhatkar, Rodolfo Lourenzutti, and Mike Gelbart
Software licensed under the MIT License, non-software content licensed under Attribution 4.0 International (CC BY 4.0) License. See the license file for more information.
This course is about identifying underlying structure in data. We will talk about clustering, dimensionality reduction, word embeddings, and recommendation systems.
By the end of the course, students are expected to be able to:
- Explain the unsupervised paradigm.
- Explain the intuition behind clustering and use appropriate clustering algorithms for applications such as customer segmentation and document clustering.
- Interpret the results obtained after applying clustering.
- Explain the intuition behind dimensionality reduction.
- Broadly explain and use linear dimensionality reduction techniques such as PCA, LSA, and NMF.
- Explain the intuition behind the word2vec model for creating word embeddings.
- Train your own word embeddings and use pre-trained word embeddings.
- Explain and build recommender systems, specifically using collaborative filtering approaches.
The following deliverables will determine your course grade:
Assessment | Weight | Where to submit |
---|---|---|
Lab Assignment 1 | 15% | Gradescope |
Lab Assignment 2 | 15% | Gradescope |
Lab Assignment 3 | 15% | Gradescope |
Lab Assignment 4 | 15% | Gradescope |
Quiz 1 | 20% | Canvas |
Quiz 2 | 20% | Canvas |
See Calendar for the due dates.
Role | Name | Slack Handle |
---|---|---|
Lecture instructor | Varada Kolhatkar | @varada |
Lab instructor | Varada Kolhatkar | @varada |
Teaching assistant | Daniel Ramandi | |
Teaching assistant | David Wakeham | |
Teaching assistant | Dollina Dodani | |
Teaching assistant | Matthew Nguyen | |
Teaching assistant | Mobina Mahdavi | |
Teaching assistant | Ngoc Bui | |
This course will be run in person. We will meet three times every week: twice for lectures and once for the lab. You can refer to the Calendar for lecture and lab times and locations. Lectures of this course will be a combination of a few pre-recorded videos, traditional live lecturing, and class activities. The night before each lecture, the material will be made available to you.
This course occurs during Block 5 in the 2021/22 school year.
Here is the list of Kaggle datasets we'll use in this class.
- Credit Card Dataset for Clustering
- Countries of the World
- Airline Sentiment
- Jester 1.7M jokes ratings dataset
- Amazon ratings data
If you want to be extra prepared, you may want to download these datasets in advance and save them under the `lectures/data` directory in your local copy of the repository.
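If you prefer to set up the data folder from the command line, here is a minimal sketch. It assumes you run it from the root of your local copy of the course repository; it only creates the directory and lists its contents, so you still download the files yourself.

```shell
# Assumption: run from the root of your local copy of the course repository.
DATA_DIR="lectures/data"

# Create the directory (and any missing parents); does nothing if it already exists.
mkdir -p "$DATA_DIR"

# After downloading and unzipping the datasets, check what is in the directory.
ls -la "$DATA_DIR"
```

If you have the Kaggle command-line tool installed and configured with your API token, `kaggle datasets download -d OWNER/DATASET -p lectures/data --unzip` downloads a dataset straight into this directory. `OWNER/DATASET` is a placeholder for the identifier shown in the dataset's Kaggle URL.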
The labs are going to be in person. For each lab, the TAs will also run a short, 1-hour parallel Zoom session so that people who cannot join in person have an opportunity to ask questions and get help. You will be able to access the Zoom links via Canvas.
There will be plenty of opportunities for discussion and for getting help during lab sessions. (I usually enjoy labs a lot. They're also an opportunity for me to get to know you a bit better 🙂.)
We are providing you with a conda environment file, which is available here. You can download this file, then create and activate a conda environment for the course as follows:

```shell
conda env create -f env-dsci-563.yml
conda activate 563
```
In order to use this environment in Jupyter, you will have to install `nb_conda_kernels` in the environment where you have installed Jupyter (typically the `base` environment). You will then be able to select this new environment in Jupyter. For more details, refer to the "Making environments work well with JupyterLab" section in your DSCI 521 lecture 8.
I've only tried installing this environment file on a couple of machines, and it's possible that you will encounter problems with some of the packages from the `yml` file when you run the commands above. This is not unusual. It often means that the package with the given version is not yet available for your operating system via conda. There are a couple of options for you when this happens:
- Get rid of the line with that package from the `yml` file.
- Create the environment without that package.
- Activate the environment and install the package manually with either `conda install` or `pip install` in the environment.
Note that this is not a complete list of the packages we'll be using in the course, and there might be a few packages you will install using `conda install` later in the course. But this is a good enough list to get you started.
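The first workaround above (removing a package line from the `yml` file) can also be scripted. Below is a minimal sketch: `drop_package` is a hypothetical helper, not part of the course materials, and `somepkg` and `demo-env.yml` are made-up names used only for the demo. It assumes dependencies are listed one per line as `- name` or `- name=version`, as in a typical conda environment file.

```shell
# Hypothetical helper: write a copy of an environment file without one package's line.
# Usage: drop_package INPUT.yml PACKAGE OUTPUT.yml
drop_package() {
  # Remove lines like "  - somepkg", "  - somepkg=1.2" or "  - somepkg>=1.2",
  # but keep packages that merely share a prefix (e.g. "somepkg-extra").
  grep -v -E "^[[:space:]]*-[[:space:]]*$2([=<> ~]|\$)" "$1" > "$3"
}

# Demo on a small made-up file (not the real course environment file).
cat > demo-env.yml <<'EOF'
name: demo
dependencies:
  - python=3.9
  - somepkg=1.2
  - somepkg-extra
EOF

drop_package demo-env.yml somepkg demo-env-trimmed.yml
cat demo-env-trimmed.yml
```

For the course file, you would run something like `drop_package env-dsci-563.yml PROBLEM_PACKAGE env-dsci-563-trimmed.yml`, create the environment with `conda env create -f env-dsci-563-trimmed.yml`, and then install the package manually, for example with `conda run -n 563 pip install PROBLEM_PACKAGE`. Editing the file by hand works just as well; the helper only saves you from hunting for the line.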
We all are here to help you learn and succeed in the course and the program. Here is how we'll be communicating with each other during the course.
If there are any clarifications about the lecture material or lab questions, I'll open an issue in the course repository and tag you. It is your responsibility to read these messages whenever you are tagged. (I know that you have many things to keep track of. You do not have to read every message, but please make sure to read carefully whenever you are tagged.)
If you have questions about the lecture material or lab questions, please post them on the course Slack channel rather than direct messaging me or the TAs. Here are the advantages of doing so:
- You'll get a quicker response.
- Your classmates will benefit from the discussion.
I encourage you to use a consistent convention when you ask questions on Slack so that others (or future you) can easily search for them. For example, if you want to ask a question on Exercise 2.3 from Lab 1, start your post with the label `lab1-ex2.3`. Or, if you have a question on lecture 2 material, start your post with the label `lecture2`. Once the question is answered or solved, you can add a "(solved)" tag before the label (e.g., `(solved) lab1-ex2.3`). Do not delete your post even if you figure out the answer on your own; the question and the discussion can still be beneficial to others.
For each deliverable, after I return grades, I'll let you know who has graded what by opening an issue in the course GitHub repository. If you have questions related to grading, please send a direct message to the appropriate TA on Slack and tag them. If you are unable to resolve the issue with the TA, include me in the conversation.
I am open to a conversation with you. If you want to talk about anything sensitive, please send me a direct message on Slack (and tag me) rather than posting it on the course channel. It might take a while for me to get back to you, but I'll try my best to respond as soon as possible.
We are working together on this course during a global pandemic. Everyone is struggling to some extent. If you tell me you are having trouble, I am not going to judge you or think less of you. I hope you will extend me the same grace!
Here are some ground rules:
- If you are unable to submit a deliverable on time, please reach out before the deliverable is due.
- If you need extra support, the teaching team is here to work with you. Our goal is to help each of you succeed in the course.
- If you are struggling with the material, the new hybrid teaching format, or anything else, please reach out. I will try to find time and listen to you empathetically.
- If I am unable to help you, I might know someone who can. UBC has some great student support resources.
Masks: This class is going to be in person. Masks are required indoors, including in classrooms, as per the BC Public Health Officer orders. For the purposes of this order, the term "masks" refers to medical and non-medical masks that cover our noses and mouths. Masks are a primary tool to make it harder for Covid-19 to find a new host. You will need to wear a medical or non-medical mask anytime you are indoors at UBC, for your own protection, and the safety and comfort of everyone else in the class. Please do not eat in the classroom. If you need to drink water/coffee/tea/etc, please keep your mask on between sips. Please note that there are some people who cannot wear a mask. These individuals are equally welcome in our class.
Vaccination: If you have not yet had a chance to get vaccinated against Covid-19, vaccines are available to you, free, and on campus [http://www.vch.ca/covid-19/covid-19-vaccine]. The higher the rate of vaccination in our community overall, the lower the chance of spreading this virus. You are an important part of the UBC community. Please arrange to get vaccinated if you have not already done so.
COVID-19 testing: UBC will require COVID-19 testing for all students, faculty and staff, with exemptions provided for those who are vaccinated against COVID-19: [https://news.ubc.ca/2021/08/26/ubc-implements-vaccine-declaration-and-rapid-testing-for-covid-19/]
Your personal health: If you're sick, it's important that you stay home – no matter what you think you may be sick with (e.g., cold, flu, other). A daily self-health assessment is required before attending campus. Every day, before leaving home, complete the self-assessment for Covid symptoms using this tool.
Stay home if you have Covid symptoms, have recently tested positive for Covid, or are required to quarantine. You can check this website to find out if you should self-isolate or self-monitor.
Your precautions will help reduce risk and keep everyone safer. In this class, the marking scheme is intended to provide flexibility so that you can prioritize your health and still be able to succeed:
- All course notes will be provided online.
- All homework assignments can be done and handed in online.
- All exams will be held online.
- Most of the class activity will be video recorded and will be made available to you.
- Before each class, I'll also try to post some videos on YouTube to facilitate hybrid learning.
- There will be at least a few office hours which will be held online.
- A Course in Machine Learning (CIML) by Hal Daumé III (also relevant for DSCI 572, 573, 575, 563)
- Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Mueller and Sarah Guido.
- The Elements of Statistical Learning (ESL)
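- Machine Learning: A Probabilistic Perspective (ML:APP)
- Learning From Data (LFD)
- Artificial Intelligence: A Modern Approach (AI:AMA)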
- An Introduction to Statistical Learning
- There are a bunch of suggestions here. We particularly recommend Essence of Linear Algebra (YouTube series) and Immersive Linear Algebra (interactive e-book).
- Introduction to Linear Algebra for Applied Machine Learning with Python
- Mike's CPSC 340
- Machine Learning (Andrew Ng's famous Coursera course)
- Foundations of Machine Learning online course from Bloomberg.
- Machine Learning Exercises In Python, Part 1 (translation of Andrew Ng's course to Python, also relevant for DSCI 561, 572, 563)
Please see the general MDS policies.