minsun-ss / mads

short description of classes

University of Michigan, Applied Data Science Program, Fall 2019 Cohort

Personal Background

Undergrad: CS major. Most of my career involved little to no programming but, as is true of finance generally, quite a lot of Excel. My current job (which started at the same time as MADS) is actually quite programming heavy. I don't really use Excel at all anymore, which is kinda nice.

Python background: I had <1 month of Python when I applied to the program (I found out about MADS in the first place while doing the Python 3 Coursera...), so ~6 months with little practical experience (Excel, remember...) by the time I entered the program.

Statistics background: Basic stats. I ended up focusing more on theory in my CS major to fulfill math requirements over applied courses. Still, probably sufficient for this masters, I guess? I just reviewed Statistics in Plain English before I took the assessment.

Other math courses taken: Single variable calculus, linear algebra, probability/combinatorics. Analysis of algos, etc.

Family/work/lifestyle: Full time job (50 hours/week not including commute), children, gym (1 hour a day). Early on, was also taking language classes which stopped when Covid happened.

General number of courses I take per month: I started at 1/month + language classes. Then once the language classes went goodbye, switched to 2-3 if classes are coding heavy because I find those, timewise, the least intensive. Occasionally I'll drop to 1 if my work/personal life is very busy for that month or I want to exclusively focus on a topic.

2021 Update Post Graduation

If you want to know where I'm at now, I didn't switch positions upon graduation; in some sense I had been migrating already during the course of the program. I believe a fair number of classmates in the same cohort were doing the same (migrating to data science oriented positions). If someone from UMich reads this, btw, I'd really love for the school to roll out an online Masters of Computer Science!

I'm not sure what I do now fits that comfortably in the realm of data science. Perhaps data engineering? Big data quant work? These are pretty fuzzy terms. I still work pretty closely with data, just not really in notebooks anymore that often.

SIADS 501 - Being a Data Scientist

Similar to 503, quite a great deal of writing. The writing for this one might seem a bit easier since it's very focused on your goals/needs, which may be easier to assess than ethics. I would say this has more writing than 503 but less introspection. Still, there is quite a great deal of it: not only your personal manifesto but also summaries of the articles you read (and you read a lot of articles). I found this a large timesink.

Workload: heavy

SIADS 502 - Math Methods for Data Science

The first cohort had a disaster of a course the first time around due to the ridiculous number of bugs. I have mentally blocked out most of this course but I'm pretty sure the rest of the Fall 2019 cohort remembers the meltdown I had halfway through after complaining about all four weeks' HW. Putting actual bugs aside, it is a light course, with relatively light math and equally light use of numpy. Some linear algebra, probability/combinatorics/stats, a bit of gradient descent. I remember exactly one question using multivariate calculus, which I'd never done, but it's not exactly difficult to figure out if you've already taken single variable. Took me about a week to finish all four assignments, if I remember rightly, but the rest of the month to complain about the fixes. Sorry Yumou, my first memory is arguing with you over the combinatorics problem.

Workload: light

SIADS 503 - Data Science Ethics

Writing, reading AND video heavy course. You will end up watching those videos repeatedly. Weekly written assignments and quizzes. I actually spent hours and hours on this course for the papers and never got a good sense of whether or not I was going in the right direction. Strongly recommend having a partner in this course to help give you feedback on your writing and vice versa. I personally find all writing assignments quite the time sink (this is true even at work, where I wrote research reports for a living) because of format/direction/flow. If you are better at this, this may take up less time.

Workload: heavy

SIADS 505 - Data Manipulation

I actually took most of the UMich data science Coursera as well prior to starting the program and absolutely remember trying furiously to finish the whole damn thing in a week because, well, the whole thing was free during the trial week. Failed largely because the data manipulation course (the one that introduces pandas to you) took me a whole three days to finish, and then, like, months afterward to even get comfortable. What helped was that during my interview process for data analyst positions, I had to do a lot of Jupyter notebook trials and they all had 24-48 hour turnaround times, so getting extremely fast was a priority. This I think can be a timesink if you are new to pandas and even more of a timesink if you're new to programming. Strongly recommend taking the first or second data science UMich thing in advance if you can since the format is very similar. Otherwise I found it doable.

Workload: medium/heavy (medium until assignment 4)

SIADS 511 - SQL and Databases

If you're familiar with SQL, then very easy. Postgresql variant, so just be warned that not everything you learn will necessarily generalize to other dbs. Still, a very solid start.

Workload: light

Songs: Rasster's Sad (Imanbek remix). Covid was getting to me.

SIADS 515 - Efficient Data Processing

The first iteration had all 4 assignments hand graded. Relatively light workload if you only aim for the bare minimum. It was also relatively light for me personally, as some of it was review (e.g., generators, caching, profiling, code efficiency with a discussion of big O notation, etc.). I found it fun but definitely needed to challenge myself to get more out of it.
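
For anyone who hasn't run into these topics yet, here is a minimal sketch of a generator, a cached function, and a quick timing comparison. This is not course material, just an illustration; the function names (`squares`, `fib`, `fib_slow`) are made up for the example.

```python
import timeit
from functools import lru_cache


def squares(n):
    """Generator: yields one square at a time instead of building a list."""
    for i in range(n):
        yield i * i


@lru_cache(maxsize=None)
def fib(n):
    """Cached recursion: each fib(k) is computed only once."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_slow(n):
    """Same recursion without the cache; the call count grows exponentially."""
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)


# The generator is consumed lazily, so no 1000-element list is ever built.
total = sum(squares(1000))

# Crude profiling: time ten calls of each version and compare.
cached_time = timeit.timeit(lambda: fib(20), number=10)
slow_time = timeit.timeit(lambda: fib_slow(20), number=10)
print(f"cached: {cached_time:.6f}s, uncached: {slow_time:.6f}s")
```

On most machines the cached version should come out far faster, but the exact timings will vary, so treat the printed numbers as a rough guide only.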

Workload: light/medium

Songs: Mabel and a LOT of Becky G, esp. La Respuesta

SIADS 516 - Big Data: Scalable Data Processing

Some discussion and playing around with mapreduce + spark. Working with spark in real life, I will tell you guys that you have it easy here. The hardest thing about Spark in real life is getting the stupid Spark session to connect, especially if you have no one to help you - I have spent weeks and weeks and weeks reading more about kerberos than I'd ever care to admit. Otherwise, pretty fun course. I actually really liked the mrjob assignment to really see how mapreduce works. Not a whole lot of spark sql focus, so you don't get much insight into how much, say, the sql you'd use in postgres differs from the sql you'd use in spark, but maybe that's just left to actual experience.
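
To give a rough idea of what mapreduce does under the hood, here is a plain-Python word count split into map, shuffle, and reduce steps. This is a sketch only: it does not use mrjob or Spark, everything runs in one process, and the function names are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Chain the three steps together over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["spark is fun", "mapreduce is fun too"])
print(counts)
```

In a real cluster the mapper and reducer run on different machines and the framework handles the shuffle, which is the part this toy version glosses over.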

Workload: light/medium

Songs: Mostly Vicetone at this point. Animal and a very nostalgic remix of Prodigy's Omen.

SIADS 521 - Visual Exploration of Data

A natural extension of 505, but now with matplotlib and one assignment in a visualization library of your choice (I chose Bokeh). I feel sometimes matplotlib is almost as esoteric as pandas. I generally find all visualization based assignments, whether code heavy or otherwise, quite a time sink; you're always looking to fix this color, add that legend, rename this title, format the kerning.... and so on and so forth. The final few assignments give you a lot of autonomy to choose direction and this will cost you a lot of time if you are a perfectionist. Also, I think the extra credit available for this course (it varies by cohort) is an even bigger time sink than the actual assignments.

Workload: medium/heavy

Songs: Mamamoo's Hip and Gogobebe. Long descent in kpop at this point.

SIADS 522 - Information Visualization I

Feels like more of 521, but done in Altair, and more focused on presentation/interaction/interface. A little bit of writing and analysis involved - you read Tufte's work and the like. A very fun and actually useful course, especially if you're in a job/position that requires presenting quite a lot of data to third parties.

Workload: medium

SIADS 523 - Communicating Data Science Results

This is definitely something that is a lot easier if you're already used to presenting stuff at work. A lot of emphasis on style, readability, and being able to cohesively combine information in a format for non-quantitative folks. This is a HUGE time sink because it combines all the costs of building an attractive presentation, doing research, AND presenting (you have to submit a 5 minute recording of your presentation) but without a team to help you out. I found this the biggest timesink of all classes in this program, partly because it was during the height of COVID, and partly because I find making Powerpoint presentations very laborious for the same reasons I find making visualizations very laborious: I am always fixing things.

Workload: heavy

SIADS 524 - Presenting Uncertainty

They kept the original videos from March 2020/April 2020 which are not the greatest in resolution or quality (covid, remember); with that said, the videos are actually really engaging. Visualizations done in Altair, so I hope you remember 522, or do this as close as possible to 522 so you can keep Altair fresh in your mind. A course with quizzes + programming assignments + written assignments (the writing portions have more weight than the programming due to the nature of Altair). I found this a difficult course conceptually; I did fine on the quizzes, but never to the point where I was fully comfortable with what I was learning / had learned. A part of me is also a bit resistant to some of these design choices as well, but I think that's intrinsic to the nature of the material.

Workload: medium

SIADS 532 - Data Mining I

Extremely fun. I honestly don't even remember any real issues with this class because there were no bugs and the lecture slides were extremely clearcut. That said, I felt Week 4 (edit distance) seemed out of place in the curriculum, although it does show up again later. Have to be comfortable with some math, though. Remember finishing this in about a week as well. One note: the quizzes are harder than you'd think and you only get 1 try. Also worth like 15% of your grade. Good luck!
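
For reference, edit distance is usually taken to mean Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal dynamic-programming sketch of that definition; it is an assumption that the course uses this exact variant, and the function name is made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning the first i chars of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(
                min(
                    prev[j] + 1,  # delete char_a
                    curr[j - 1] + 1,  # insert char_b
                    prev[j - 1] + substitution_cost,  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

Keeping only two rows instead of the full table brings memory down to O(len(b)) while the running time stays O(len(a) * len(b)).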

Workload: light/medium

SIADS 542 - Supervised Learning

Relatively similar to the data science UMich Coursera. Sklearn can be a bit esoteric to start off with but once you get used to the whole fit/predict paradigm, a lot of sklearn suddenly becomes very easy to parse. Didn't really find this too difficult, per se.
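
The fit/predict paradigm boils down to this: `fit` learns state from training data and returns the object, and `predict` uses that state on new data. Here is a toy classifier written in plain Python to show the shape of it. This is not sklearn code; the class `ToyCentroidClassifier` is hypothetical and only mimics the convention.

```python
class ToyCentroidClassifier:
    """Toy nearest-centroid classifier following the fit/predict convention."""

    def fit(self, X, y):
        """Learn one centroid (mean point) per label from the training rows."""
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for k, value in enumerate(row):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1
        # Trailing underscore marks learned state, as sklearn estimators do.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        """Assign each row the label of its closest centroid."""

        def closest(row):
            return min(
                self.centroids_,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[label])
                ),
            )

        return [closest(row) for row in X]


# Made-up training data: two well-separated clusters.
X_train = [[0, 0], [1, 1], [9, 9], [10, 10]]
y_train = ["low", "low", "high", "high"]

model = ToyCentroidClassifier().fit(X_train, y_train)
predictions = model.predict([[2, 2], [8, 8]])
print(predictions)
```

Once this shape is familiar, swapping one model for another mostly means changing the class name, since the `fit` then `predict` calls stay the same.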

Workload: light/medium

Songs: G-Idle. Can I mention how much I really like Soyeon?

SIADS 543 - Unsupervised Learning

At this point all of the material in this course was new to me, but I did not find the workload significantly heavier than 542's. A little less zombie like in that the problems are no longer "here's some data, shove it into the classifier and predict away." However, there's quite a bit more emphasis on numpy work so, like, get comfortable with matrix math if you can. I took this with 632 and found the workload very doable, although there is definitely lumpiness in the workload week by week. To be fair I was locked in a hotel room for quarantine in Korea at the time, so I had nothing BUT time.

Edit: I was a volunteer IA for the second iteration of this course; the new additional slides at the end of the course are very nice.

Workload: light/medium

SIADS 591/592 - Milestone I

Largely dependent on your project and partner, so it's probably valuable to have some idea of both before you start the milestone. Project workload is highly dependent on how ambitious you and your partner are and how annoying your datasets are to work with.

Re: oral exam, while I'm (probably) not shy, I'm pretty horrid at thinking on my feet and mildly socially anxious, so I found the oral exam harder than the project. I generally don't read from notes even in real life, so I find this format harder to deal with in some ways (all 4 of my jupyter notebooks to prep for the 4 topics are like... 2-3 paragraphs long at best?).

Workload: medium

SIADS 601 - Qualitative Inquiry for Data Scientists

You have been forewarned, this course requires 3 interviews on a single project so you may need to scramble to find a project very quickly to schedule them. The course focuses on answering a qualitative question on a quantitative data set, interviewing at least three stakeholders about it, building an affinity wall and writing a data report. There are a handful of discussion questions to answer in the four weeks, mostly related to reflecting on what you've done so far; I would say of the writing heavy courses (e.g., 501, 503, etc), this is in a sense the least intensive of the three because the bulk of the difficulty is in the scheduling and having a workable course project.

Having taken this course nearly a full year after taking my last qualitative course, I actually welcomed the break from coding to go back to writing since I needed the practice. I personally think doing data science - or research in any place and form - requires more socializing than one would expect and is somewhat integral to the job. Data doesn't exist in a vacuum after all.

This is also very minor, but the grading in this course is lightning fast for hand grading.

Workload: light/medium (writing), medium/heavy (scheduling)

SIADS 611 - Database Architecture and Technology

An extension of 511. Still in Postgres. You get to play with a lot of the cool postgres features that not all dbs have.

Workload: light

Songs: Camila Cabello mostly. Shameless and Liar.

SIADS 622 - Information Visualization 2

An extension of 522. This is quite the late review, but you definitely don't want to take this course while you're doing the capstone. Between work and the capstone, I literally have 0 memories of the course other than doing the absolute bare minimum to get through it, and it's a pity this was rolled out so late because honestly I would've benefited a lot more from this outside of the milestone/capstones. It's a lot of altair and, like the previous visualization class, you'll get penalized for not replicating the charts perfectly. Also I'm pretty sure I didn't deserve the grade I got for this course, due to the final curve that applied to my particular iteration of the class.

Workload: light/medium

SIADS 630 - Causal Inference

Programming assignments on par with 631 I think. Maybe a little easier because you'll almost exclusively use one library. Most of the difficulties lie in answering the qualitative questions. I'm not particularly good at statistics, but I suspect much of that comes from not using it regularly enough to commit it to memory, and this course is particularly useful in developing a framework around how to approach answering questions. Would very much like to see further development and/or extensions of this course, to be honest, as it's more high level than I would like.

Workload: light

SIADS 631 - Experiment Design and Analysis

Programming assignments on par with 522 - there is a little bit of time required to master them but not so significant that it's a time killer. Moblabs interesting theoretically but I'm on the fence on how they're graded - in the current iteration you get either 50% or 100% since out of the 2 points for each moblab you get 1 for participating and then the remainder is based on your score relative to the rest of the class - which is rounded to the nearest integer. If we assume a normal distribution of scores then it should be roughly 50/50 probability. A fair amount of reading (actually quite a lot of reading although a significant amount not tested) and quizzes by far more annoying than the programming assignments. Really nice videos, though, I think I could listen to Yan Chen talk forever.
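
To make the 50%/100% arithmetic concrete, here is a small simulation. It rests on loud assumptions that are not confirmed by the course: the "relative score" is treated as a percentile rank between 0 and 1, rounding is half-up, and class scores are drawn from a made-up normal distribution. The function name `moblab_points` is hypothetical.

```python
import random


def moblab_points(score, class_scores):
    """Return points out of 2: 1 for participating plus a rounded relative point.

    ASSUMPTION: the relative part is the share of classmates scoring strictly
    lower, rounded half-up to 0 or 1. The real grading rule may differ.
    """
    below = sum(other < score for other in class_scores)
    percentile = below / len(class_scores)
    relative_point = 1 if percentile >= 0.5 else 0
    return 1 + relative_point


rng = random.Random(0)  # fixed seed so the run is repeatable
class_scores = [rng.gauss(70, 10) for _ in range(1000)]  # made-up class scores

points = [moblab_points(score, class_scores) for score in class_scores]
share_full_marks = points.count(2) / len(points)
print(f"share of students with 2/2: {share_full_marks:.2f}")
```

Under these assumptions about half the class lands on 1/2 (50%) and half on 2/2 (100%), matching the rough 50/50 split described above. With a percentile-based rule the split does not actually depend on the scores being normally distributed.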

Workload: medium

SIADS 632 - Data Mining II

Extremely fun course. Lots of time series data work! Workload significantly heavier compared to other SIADS courses so far, but not so bad that I would make this a single course unless you are still uncomfortable with python programming or math. Would note that there's a significant amount of non-Pandas programming for this one, so would keep that in mind as you assess your own experience. I personally found this course quite a great deal of fun.

Workload: medium/heavy

SIADS 642 - Deep Learning

Almost no programming required. Four written assignments about deep learning. Your mileage may vary on whether you find this beneficial; I find that most deep learning code is relatively lightweight, so I don't really see much of an issue with the course being more about the structure. But I'm also more comfortable with coding on my own. I spent a lot of time reading a lot of articles and thinking about stuff. You'll find a lot of contradictory statements about deep learning on the internet, btw, so I do think it is worthwhile to reflect a bit.

Workload: light/medium

SIADS 643 - Machine Learning Pipelines

The "about time I should learn to use Git" course. If you already have a regular dev style background this course should take you, like, less than an hour or two to get through. I'm not even a dev and I finished this in less than 16 hours (this includes the time I slept and went to work). Videos are short and to the point, which I find very nice. I actually learned quite a bit in this course, but it is also not a long course. I think it could use a bit more material.

Workload: light

SIADS 652 - Network Analysis

Timely and convenient! Nothing like a global pandemic to let the epidemic networkx models sink in. This is a combo quiz/written answer/programming course that leans heavier on programming assignments than other similarly structured courses, but because of the quiz/writing it takes a bit longer to get through than a pure programming course (from my point of view) but less time than a full written course. I've used basic networkX before but this covers details about a lot of the various models implemented within networkX, which is nice because networkX's documentation itself is a bit on the sparse side. Programming assignments are one very long jupyter notebook, so be warned: like NLP, small bugs can potentially be a huge timesink (there was one for the random seed, which is nigh impossible to debug on your own) since there are not enough asserts in the homework. Also, many quiz questions can be answered using networkX and I strongly suggest you do so.
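
To show why the random seed matters so much in epidemic-style simulations, here is a minimal SIR-like outbreak on a small contact network, written in plain Python. This is an illustration only and not networkX's API or the course's model: the function `simulate_sir`, the one-round infectious period, and the toy network are all assumptions.

```python
import random


def simulate_sir(adjacency, patient_zero, p_infect, seed):
    """Run a simple discrete-time outbreak and return everyone who got infected.

    Each infected node gets one round to infect each susceptible neighbor
    with probability p_infect, then recovers for good.
    """
    rng = random.Random(seed)  # private seeded generator, no global state
    infected = {patient_zero}
    recovered = set()
    while infected:
        newly_infected = set()
        # Sort so the order of random draws never depends on set ordering.
        for node in sorted(infected):
            for neighbor in sorted(adjacency[node]):
                susceptible = (
                    neighbor not in infected
                    and neighbor not in recovered
                    and neighbor not in newly_infected
                )
                if susceptible and rng.random() < p_infect:
                    newly_infected.add(neighbor)
        recovered |= infected
        infected = newly_infected
    return recovered


# Hypothetical contact network: a path 0-1-2-3 plus an isolated pair 4-5.
contacts = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}

run_a = simulate_sir(contacts, patient_zero=0, p_infect=0.5, seed=42)
run_b = simulate_sir(contacts, patient_zero=0, p_infect=0.5, seed=42)
everyone_reachable = simulate_sir(contacts, patient_zero=0, p_infect=1.0, seed=0)
print(run_a == run_b, sorted(everyone_reachable))
```

Two runs with the same seed give identical outbreaks, while a different seed (or a different order of random draws) can give a different answer from otherwise correct code. That is roughly why seed-related mismatches against an autograder are so hard to track down.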

Workload: medium

SIADS 655 - Natural Language Processing

NLP! You might have gotten a bit of a taste for it after doing n-gram stuff in Data Mining 2 or the little bit you get in Unsupervised Learning. Well, now you get more! You get to play with the spacy library, which is pretty cool. The only downside to this class however is that the homeworks are a big fat NOPE as they stand. I would say charitably that the homeworks are, in fact, a test of human NLP ability rather than you discovering the joys of ML NLP. On the other hand, IF you manage to get through them, they are often quite interesting. You just may be defeated by the debugging if you have the misfortune of making a mistake somewhere as you go through them. I really like the NLP material though, lots of good stuff.

Workload: light (medium/heavy on debugging if you make a mistake somewhere)

SIADS 680 - Learning Analytics

Hm. Python wise, this might be a little more intensive than Social Media Analytics and definitely a lot less than Search and Recommender Systems. The final week, however, has a very heavy Dash dashboard component to it and if you've never worked with Dash or Flask before, it can be a bit of an uphill climb on short notice. Weekly readings and case studies with written reflections (quite a heavy chunk of this), and programming assignments every week. I would say that the subject itself is not directly interesting to me, so I was a bit unenthused about taking this elective. Assignments were fairly interesting, though.

SIADS 682 - Social Media Analytics

This is a pretty good application course; outside of week 1, the rest of the weeks touch on things you've learned previously, but applied to social media data. So the course itself really has few, if any, lectures; you'll do more paper readings/reflections and coding. But if you don't really remember anything from 522 (Altair), 652 (using networkX), 632 (using statsmodels), or even 655 (gensim), you should uhhh try to remember again. There's a lot of autonomy built into the course (e.g., I feel the HWs and suchlike sort of implicitly assume you know what you are doing and that you should have few, if any, difficulties; the autograder is generally there to move you in the right direction), which I suppose will be a hit or miss depending on the audience. Probably a miss for those who prefer a lot more hand holding. To be fair, the coding hws could definitely have used another pass to get rid of the copy paste errors. If there was only 1 beef to be had, it would be making me try to remember how to do viz with not one, but three libraries (mpl, altair, and seaborn). I guess sns is more like mpl 1.5 so 2.5 libraries.

Workload: light

SIADS 685 - Search and Recommender Systems

Watch out! If you've found 632 time intensive, I would seriously consider how you plan this course around other courses. 685 is approximately 1.3x more time and work than 632. You will be expected to write and test much larger code pieces, so if you've found 632 manageable without relying on the autograder to test your answers for you then you will find 685 manageable. Basically take all of week 1 in 632 and pretend it was a single assignment to submit at once (while checking your work all at once), and you'll have some idea of where the pain points will be in 685. Really excellent course, though, slides are very clear and it's such an interesting topic. Like 632, a bit lumpy on the HW: week 1 and 2 are the heaviest, and then week 3 and 4 relatively low effort. Would not pair this with courses that also have heavy first and second week assignments though, unless you have a lot of free time.

Workload: heavy (first 2 weeks), light (last 2 weeks)

SIADS 694/695 - Milestone 2

Where do I begin. I'm not sure where to begin on this one. So let's start with the basics: another oral exam. 6 questions, 2 of which will be the topic of the oral exam (something from 1-3 and something from 4-6; in my cohort I don't think I met anyone who got question 3 or 4, though). May or may not be an oral exam with professors listening to you; they rolled out an async exam when I took it. ML project of some sort in groups of 1-2: must experiment with at least 3 supervised learning approaches and 2 unsupervised approaches. Similar to milestone 1 in that you submit a final report, 10-12 pages.

Things to be aware of, and maybe this will change in later iterations, but: there is a week 6 and 7 ungraded checkin that no one reminds you about (not even in Slack) and you'll only see if you click into those weeks in Coursera; zero rubric made available for the oral exam grading, so you'll be in the dark there; extremely late grading, so you won't know how your project proposal/final proposal was graded until close to the end of the course; a project outline/grading schema that provides zero rubrics for projects that don't rely on the data sets provided by the course; somewhat bad/inconsistent office hours (you're honestly better off DMing the staff). So with that out of the way: get your project and partners together early and try to get your questions out of the way as early as you can because you will be put in a very tight spot otherwise.

No, I did not have a meltdown in this course, but I felt quite like I did when I took 502.

SIADS 697/698 - Capstone

Similarish to the milestones, except now you have complete freedom to do what you want. Team of 3 (or 4 - we did a team of 4). Difficulty is largely based on how ambitious your project is; you know the drill from the milestones. Much better staff support this time around, but to be fair the final 2 months I was literally mentally checked out of the program - I was burning out from both ends from coding at work and finishing the deadlines for things.

Would suggest picking one of the milestone projects from among your teammates' and scaling it up significantly, so you're not starting completely from zero, but this is largely at your discretion.
