kalealex / data227-sp23

This repository serves as the primary course website for DATA 22700 at UChicago in Spring quarter 2023. Students should refer to this page for the course syllabus, schedule, and assigned coursework.

DATA 22700

This course, the data visualization offering for students in the Data Science minor, helps students build core competencies for communicating with data, including generating visualizations programmatically, writing about design and analysis choices, and learning principles and procedures for rigorous work in data science. DATA 22700 introduces students to the basics of visualization design, including theoretical frameworks for reasoning about chart construction, perceptual principles, and considerations for use of color, mapping, making data interactive, and conveying uncertainty. DATA 22700 also requires students to engage various other skill sets important in data science such as technical reading and writing, data wrangling, statistical modeling, storytelling, and producing shareable, reproducible analysis notebooks.

Completion of or placement out of DATA 11800, CMSC 12100, CMSC 15100, or CMSC 16100 is a prerequisite for taking this course.

Students are expected to enter the course with a basic knowledge of programming in Python (including working with dataframes) and mathematical foundations of statistical modeling. Students who are uncomfortable with these topics may find DATA 22700 especially challenging but are nonetheless encouraged to apply themselves and grow. Both creating graphics and working with data involve many difficulties that must be confronted with patience and perseverance, but both skills will benefit students tremendously in a wide variety of endeavors.

Course objectives

Upon completion of the course, students should be able to:

  1. Generate visualizations programmatically
  2. Apply principles of perception and statistics to visualization design
  3. Avoid creating ill-formed or ineffective visualizations
  4. Recognize ill-formed, ineffective, misleading, or deceptive graphics and suggest ways to redesign them
  5. Use computational notebooks to write cogent, reproducible analyses

Communication

The primary method of communication between students and instructional staff will be email.

Instructor: Alex Kale - kalea@uchicago.edu

TA: Andrew McNutt - mcnutt@uchicago.edu

TA: Pratham Gandhi - pratham@uchicago.edu

We will not be relying on EdTech systems like EdStem. We will use Canvas only to link to course content like this GitHub repo.

Class sessions

We will hold class on Tuesdays and Thursdays, 12:30 PM - 1:50 PM, in Ryerson Physics Lab 276.

Office hours

Students seeking help with concepts or coursework should plan to attend office hours. We will hold the following office hours each week unless posted otherwise.

| Day of week | Time of day | Person hosting |
| --- | --- | --- |
| Monday | 10:00 AM - 11:30 AM | Alex |
| Monday | 10:30 AM - 11:30 AM | Pratham |
| Monday | 3:00 PM - 4:30 PM | Andrew |
| Wednesday | 10:30 AM - 11:30 AM | Pratham |
| Wednesday | 3:00 PM - 4:00 PM | Andrew |
| Wednesday | 10:30 AM - 11:30 AM | Pratham |
| Thursday | 10:00 AM - 11:30 AM | Alex |
| Friday | 10:30 AM - 11:30 AM | Pratham |

Alex will hold office hours in John Crerar Library room 263.

Andrew will hold office hours via Zoom.

Pratham will hold office hours via Zoom.

Due to conference travel, Alex will be unavailable April 23 - April 29, and Andrew will be unavailable April 20 - May 7 and on May 10. We will schedule extra office hours around these times to make sure students have access to help.

Materials, turn-in, and gradebook

All course materials can be found in the course GitHub repo (the page you are currently looking at).

Student work will be turned in via Gradescope, where students will be able to view grades and feedback on coursework.

Schedule

Week 1: Introduction

March 21 - Value of data visualization

March 23 - Grammar of graphics (Exercise 1)

Week 2: Working with data and computational notebooks

March 28 - Data and how to encode it

March 30 - Computational notebooks (Exercise 2)

Students will need to set up VSCode or Google Colab. Colab is recommended for students who are unfamiliar with their file system or who don't want to deal with Python installation.

Week 3: Principles for visualization design

April 3 - Assignment 1 due!

April 4 - Design process and critique (Exercise 3)

April 6 - Perception

Week 4: Color and cartography

April 11 - Color

April 13 - Maps (Exercise 4)

Week 5: Data interaction

April 18 - Interaction (Exercise 5)

April 20 - Animation

April 21 - Assignment 2 due!

Week 6: Wildcard (Alex and Andrew away at CHI)

April 25 - Class canceled

April 27 - Project check-in with Pratham (Exercise 6)

Week 7: Rhetorical visualization

May 2 - Deception (Exercise 7)

May 4 - Storytelling

Week 8: Uncertainty

May 9 - Uncertainty visualization

May 11 - Statistics review (Exercise 8)

May 12 - Assignment 3 due!

Week 9: Visualization for model interpretability

May 16 - Visualizations as model checks

May 18 - Vis for ML (Exercise 9)

May 19 - Project due!

Coursework

Deliverables for this course include 3 assignments, 9 exercises, and 1 project.

Assignments

Assignments are summative assessments of core learning objectives in the course. Each assignment involves analyzing and visualizing a specific dataset provided by the instructor. Students are expected to create original work; collaboration or copying from any source is not allowed. Assignments are evaluated based on both the quality of the visualizations produced and the quality of the write-ups explaining the analysis and design rationale.

Exercises

Exercises are opportunities for skill-building, practice, and collaboration. Exercises begin during class time, but students may need to spend time completing them at home. Exercises involve a few specific tasks, including reading technical specifications, writing code, and documenting work. Exercises are evaluated only on completeness.

Project

The project serves the purpose of a final in DATA 22700. The project involves choosing and analyzing a dataset and producing a written report about the analysis. Students are expected to create original work; collaboration or copying from any source is not allowed. The project is evaluated on the following criteria:

  • Choice of dataset
  • Quality of analysis
  • Quality of visualizations
  • Quality of write-up

Evaluation

We use a form of grading known as specifications grading in this course. The goal of specifications grading is to help students focus on their mastery of the material and identify areas for improvement as the quarter progresses. Shorthand: focus on skills, not on scores.

Final grades will be determined based on assignments, exercises, and the project.

Assignments and the project

Assignments and the project will be evaluated using an S/N/U scale:

  • Satisfactory (S): The student demonstrates sufficient mastery of the material.
  • Needs Improvement (N): The student has put in a good-faith effort to complete the work, but revealed a lack of mastery in the material that can be addressed via concrete feedback.
  • Ungradable (U): The student did not submit any work, or did not complete a sufficient portion of the work (e.g., completed less than half the work that was assigned).

When reviewing assignments, we evaluate both:

  • Quality of visualizations including but not limited to design choices such as encodings, transformations, and scales.
  • Quality of write-ups including but not limited to clear communication, logical arguments, and cogent rationales for design and analysis choices.

When reviewing the project, we evaluate for:

  • Choice of dataset including but not limited to whether the dataset can answer the questions the student poses, support a narrative, and enable visualizations that demonstrate the skills learned throughout the course.
  • Quality of analysis including but not limited to whether analysis choices are statistically valid, whether the analysis is robust to arbitrary analysis choices, and whether analysis choices are documented, justified, and reproducible.
  • Quality of visualizations including but not limited to design choices such as encodings, transformations, and scales.
  • Quality of write-up including but not limited to the rhetorical cohesion of the write-up with the analysis, clear communication, logical arguments, and cogent rationales for design and analysis choices.

The specification for each assignment and the project includes a more precise description of what is required to earn an S, N, or U in each category.

There are a total of 6 S/N/U scores for assignments. Each of the 3 assignments receives two S/N/U scores, one for quality of visualizations and one for quality of write-up.

There are a total of 4 S/N/U scores for the project, assigned for choice of dataset, quality of analysis, quality of visualizations, and quality of write-up.

Exercises

Exercises provide a participation grade. Exercises will be graded only on completion and will receive a score of either Satisfactory (S) or Ungradable (U). Students will not have the option of earning a Needs Improvement (N) on exercises.

There are a total of 9 S/U scores for exercises, one per exercise.

Final grades

In total, students receive 19 S/N/U scores, with only 10 N scores possible.

Final grades are based on the following table. The number of Satisfactory scores determines a student's letter grade. The number of Needs Improvement scores, plus any additional Satisfactory scores beyond those needed for a given letter grade, determines the plus or minus within each letter grade.

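| Minimum S Required | Minimum N (or additional S) Required | Final Grade |
| --- | --- | --- |
| 17 | 2 | A |
| 17 | 0 | A- |
| 15 | 4 | B+ |
| 15 | 2 | B |
| 15 | 0 | B- |
| 13 | 4 | C+ |
| 13 | 2 | C |
| 13 | 0 | C- |
| 11 | 2 | D+ |
| 11 | 0 | D |
| 10 and below | NA | F |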

Consider some examples:

  • A student with 18 S, 1 N, and 0 U would get an A
  • A student with 17 S, 1 N, and 1 U would get an A-
  • A student with 16 S, 3 N, and 0 U would get a B+
  • A student with 15 S, 2 N, and 2 U would get a B
  • A student with 13 S, 1 N, and 5 U would get a C-
  • A student with 12 S, 5 N, and 2 U would get a D+
  • A student with 10 S, 8 N, and 1 U would get an F
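
For students who want to check their own standing, the lookup described above can be sketched in a few lines of Python. This is only an illustration, not an official grade calculator: the names `GRADE_TABLE` and `final_grade` are hypothetical, and the sketch assumes that "additional S" is counted relative to the minimum S for the letter grade being considered, which matches the examples above (e.g., 16 S and 3 N count as 1 additional S plus 3 N for the B range).

```python
# Illustrative sketch only: GRADE_TABLE and final_grade are hypothetical names,
# not part of the course materials.

# Rows of the final-grade table, ordered from best to worst:
# (minimum S, minimum N plus additional S, letter grade)
GRADE_TABLE = [
    (17, 2, "A"), (17, 0, "A-"),
    (15, 4, "B+"), (15, 2, "B"), (15, 0, "B-"),
    (13, 4, "C+"), (13, 2, "C"), (13, 0, "C-"),
    (11, 2, "D+"), (11, 0, "D"),
]

TOTAL_SCORES = 19  # 6 assignment scores + 4 project scores + 9 exercise scores
MAX_N_SCORES = 10  # only assignment and project scores can be N


def final_grade(s: int, n: int, u: int) -> str:
    """Return the letter grade for s Satisfactory, n Needs Improvement, and u Ungradable scores."""
    # Input checks are an assumption of this sketch, based on the score counts in the syllabus.
    if min(s, n, u) < 0 or s + n + u != TOTAL_SCORES:
        raise ValueError(f"expected {TOTAL_SCORES} non-negative scores in total")
    if n > MAX_N_SCORES:
        raise ValueError(f"at most {MAX_N_SCORES} N scores are possible")
    # Take the first (best) row whose two requirements are both met.
    for min_s, min_extra, letter in GRADE_TABLE:
        if s >= min_s and n + (s - min_s) >= min_extra:
            return letter
    return "F"  # 10 or fewer S scores


if __name__ == "__main__":
    # Score counts taken from the examples listed above.
    for counts in [(18, 1, 0), (16, 3, 0), (13, 1, 5), (10, 8, 1)]:
        print(counts, "->", final_grade(*counts))
```

Because the rows are checked from best to worst and every letter range ends with a row that requires 0 additional scores, a student who meets the minimum S for a range always receives a grade within that range.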

Late policy

Late submissions are not accepted in this class, except under specific circumstances (see below). Assignments and the project will have specific due dates, which students must adhere to. Exercises can be turned in at any time before the project due date. Please bear in mind that the grading scheme is set up to absorb a reasonable amount of sub-par work. Turning in something unpolished is much better than not turning in anything at all.

Exceptions to the no late work policy:

  • Late chips: Students get one late chip they may use during the quarter to turn in an assignment up to 48 hours after the deadline posted on Gradescope. To use a late chip, email the instructor your work within 48 hours of the posted deadline, and say that you are using your late chip. Students may only use one late chip in the quarter. Late chips may not be used more than 48 hours after the deadline. Late chips may only be used for assignments, not for exercises or the project.
  • Emergencies: If you have an emergency and feel it warrants an exception to the no late work policy, you should first be in contact with your College advisor, as the College should be aware of the emergency and ensure that any proper university or department policies are followed if needed (for example, an injury might require SDS accommodations). Once you have contacted the College, please contact us by email with a CC to your College advisor. Contacting us as early as is practical given the emergency makes the process of accommodating your situation work more smoothly for everyone. We care about your well-being and success in the class, and have put these policies in place to be fair and give students agency.

No other exceptions will be made.

Grade disputes

Except in very specific cases (described below), you cannot dispute the score assigned to you on a piece of work. The score you receive on a piece of work is meant to convey feedback on your level of mastery, and you should take it as an opportunity to understand the areas for improvement in your work. You are welcome to ask us for concrete advice about how to improve your work; we are always happy to have those kinds of conversations with students, including going over your code or writing. On the other hand, we will not entertain requests to change your score just because you feel your work deserved a higher score.

There is one exception to this: if a grader made a factual mistake in your grading. Please note that this only includes cases where a grader makes an erroneous statement about your code or writing in their feedback. It does not include cases where you simply disagree with whether something deserves to be flagged as incorrect.

For example, suppose you receive a piece of feedback that says “Poor choice of encoding for data type: Student used a part-to-whole representation for non-proportion data.” If the encoded data in question was actually proportion data, and the grader missed this fact (and erroneously gave you that feedback), you can ask us to review this decision. Please note that, even if the feedback is amended, it may not affect your actual S/N/U score, depending on how many other issues were identified in your work.

We ask that you keep these requests brief and to the point: no more than a few sentences identifying the exact statement that the grader made and the reasons you believe the statement was mistaken, including references to specific parts of your code or writing (e.g., “I said why these are proportion data in block 6 of the submitted notebook.”). Focus on laying out the facts, and nothing else. Regrade requests should be submitted through Gradescope.

Finally, it is also your responsibility to make these requests in a timely manner. Requests to review grading mistakes must be submitted no later than one week after a graded piece of work is returned to you. After that time, we will not consider any such requests, regardless of whether the request is reasonable and justified.

We will not accept any request to review grading after Thursday, May 25, because grades are due soon after; this may limit grade disputes for the project.

Software tools

Most work in this class will be conducted using computational notebooks and other text files. We will demonstrate how to work with computational notebooks in Google Colab, but students may choose to use a different text editor if they want. Similarly, we will cover a handful of Python libraries, but students may choose to use other libraries. We may also introduce free graphics editing software such as Figma. Students not using the tools we present are responsible for learning to use and troubleshoot those other software tools.

Code of conduct

Diversity and Inclusion: Students are expected to treat each other with respect and to give due consideration to each other's stances and positions. Discrimination of any kind will not be tolerated, whether on the basis of gender identity, sexuality, disability, generational status, socioeconomic status, ethnicity, race, religion, national origin, culture, or otherwise. Please see the UChicago Commitment on Diversity. Additionally, we are committed to providing equitable access to education at UChicago. Students who have been approved for academic accommodations through Student Disability Services (SDS) should follow the procedures established by SDS for using accommodations. Regardless of identity or status with SDS, students with concerns or questions about issues of diversity and inclusion should email the instructor.

Academic Integrity: We take academic honesty very seriously in this class. This means students must not collaborate or copy from outside sources on non-collaborative work such as assignments and the project. Collaboration is allowed on exercises only, but you must turn in your own copy of the work. Students are not permitted to use automated assistants such as ChatGPT or GitHub Copilot for coursework; use of these technologies in this class is considered an academic honesty violation. Additionally, students should be aware of the UChicago policy on Academic Honesty and Plagiarism. The gist of academic integrity is that you should always do and submit your own work.

Sexual Misconduct: Title IX prohibits discrimination on the basis of sex, including sexual assault, sexual abuse, sexual harassment, dating violence, domestic violence, and stalking. Sexual misconduct is completely unacceptable at UChicago (and anywhere else), including any interactions that occur related to this course. For related resources, please see the UChicago website about Title IX and Sexual Misconduct. Students seeking help or guidance related to an incident of sexual misconduct will be supported. Students should be aware that, in certain situations, the University may have an institutional obligation to respond to a report of sexual misconduct and that, as a faculty member, your instructor is required by Title IX and the University of Chicago to report incidents of sexual misconduct, even if students request to keep the information confidential. If you would like to speak to someone confidentially about an incident of sexual misconduct, please see the University's Confidential Resources.

Student Health and Wellness: Students' mental and physical health are of primary importance both for reasons of common human dignity and to create a suitable learning environment at UChicago. If you or someone you know needs or might benefit from mental health services, please consider reaching out to UChicago Student Wellness, whose services do not come at any additional cost to students.

If you are sick, please do not come to class or in-person office hours. If you need to miss class because of an illness, please email your instructor. Students who have been exposed to or who are experiencing symptoms of COVID-19 should contact UChicago Student Wellness to be tested. If you were potentially exposed to COVID-19 or your COVID-19 test results come back positive, please reach out immediately to C19HealthReport@uchicago.edu. Other public health concerns should be directed to UCAIR. If there is an emergency, please call 773-702-8181 or dial 123 on any campus phone, or call 911 for emergency response.

Attendance and Participation: DATA 22700 is 100% in-person. If students need to miss class for some reason, they should ask their peers to share notes from the missed class. We will not record lectures; however, we are happy to discuss what was missed during office hours. Please do not email the instructional staff with requests for a summary of missed lectures; we will tell you to ask a peer for notes and/or visit us at office hours. Students who miss an exercise are still expected to turn that exercise in for credit.
