The Art and Science of Empirical Computer Science

⁉️ This is the course website for the Fall 2022 edition of my course, which has concluded. Do you really mean to be here? Or do you really want to visit the Fall 2023 edition?

Logistics

Semester: Fall 2022
Instructor: Jimmy Lin
Time & Location: Mondays 12:30-02:50, DC 2568

Course Description

Graduate students in computer science aspire to "do computer science" (research), but what exactly does that mean? It involves, among a multitude of activities, reading papers, learning the "state of the art", advancing knowledge, writing papers, and (hopefully) getting them published. Graduate students learn how to do these things under the tutelage of professors, but rarely is there explicit or deliberate instruction on these myriad activities. With a focus on empirical computer science, this course covers elements that comprise the research enterprise, synthesizing both "art" (personal experiences I have accumulated over the years) and "science" (insights derived from quantitative analyses). The hope is that knowledge and actionable advice from this course will help graduate students better understand research, leading to more productive and fulfilling careers.

Material for this course will draw from "The Science of Science" by Wang and Barabási, academic papers, as well as other sources on the web.

Scope

Context is important. Most of the questions and issues we grapple with in this course have no simple answer. Nearly always, it depends on context. As such, it is important to properly scope the coverage of this course.

Wang and Barabási attempt to paint broad strokes with their work, encompassing all of science. However, some of their findings and recommendations may seem at odds with realities in computer science. Much of the "art" (e.g., advice, best practices, etc.) covered in this course is drawn from my personal experience, which will of course be colored by my own background. I am a computer scientist (actually, also a formal linguist) by training, and I work presently at the intersection of natural language processing (NLP) and information retrieval (IR), although over the years I have dabbled in other sub-disciplines of computer science as well.

For lack of a better term, I characterize the focus of this course as "empirical computer science", but it really is a shorthand for "stuff that I have worked on" and "stuff that I am familiar with". NLP and IR can be characterized as "applied machine learning", so perhaps that's a more accurate scope. The contents of this course will certainly be relevant to graduate students wishing to pursue these topics, and I suspect for related sub-disciplines in computer science such as data mining, or even perhaps computer vision (although I don't work in those fields). However, I am quite certain that portions of this course will not apply to, for example, theoretical machine learning and complexity. All findings, advice, recommendations, etc. need to be properly contextualized.

Syllabus

Week	Date	Type	Description
1	9/12	-	Introduction [Slides]
2	9/19	"science"	The Science of Career [Slides]
3	9/26	"science"	The Science of Collaboration [Slides]
4	10/3	"science"	The Science of Impact [Slides]
5	10/17	-	Presentation of Visualization Projects
6	10/24	"science"	The Science of Impact (Still) [Slides]
7	10/31	"art"	Research as a Social Process [Slides]
8	11/7	"art"	Working With Your Advisor [Slides]
9	11/14	"art"	On Writing Papers [Slides]
10	11/21	"art"	Responsible Research [Slides]
11	11/28	"science"	Paper Presentations (I)
12	12/5	"science"	Paper Presentations (II)

Grades

Weight	Component
15%	Debate participation
15%	Paper presentation
20%	Visualization project
40%	Final project
10%	Class participation

Assignments

In addition to weekly preparation (readings and other material), the course will have the following assignments:

Preparation for and participation in a debate. The debates will be scattered throughout the semester, with each debate topic complementing the topic of that week.
Presentation of a paper on meta-research (Weeks 11 and 12).
Visualization project due in mid-October.
Final project due at the end of the semester.

Detailed Schedule

Week 1: Introduction

Slides: [PDF]

For more details on normative vs. positive approaches, Wikipedia provides a good starting point: Positivism and Normativity.

Week 2: The Science of Career

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási, Part 1: The Science of Career (pages 5-80).

Slides: [PDF]

We'll be having our debate on Topic 1: How should we evaluate excellence? Quality only or quality and quantity?

Position A: Researchers should be evaluated solely on the quality of their publications. Quantity is irrelevant and we shouldn't even bother counting.
Position B: Researchers should be evaluated on both the quality and quantity of their publications. High-quality publications are of course important, but quantity is also an important component of excellence.

Week 3: The Science of Collaboration

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási, Part 2: The Science of Collaboration (pages 81-158).

Slides: [PDF]

We'll be having our debate on Topic 2: Should you collaborate or not?

Position A: Early-stage researchers should actively seek out collaborations beyond their research group. Participation in multiple research projects across many different groups builds breadth.
Position B: Early-stage researchers should not actively seek out collaborations beyond their research group. Focusing on depth is more important than breadth.

Week 4: The Science of Impact

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási, Part 3: The Science of Impact (pages 159-219).

Slides: [PDF]

We'll be having our debate on Topic 3: How should you approach open-sourcing computational artifacts associated with your work?

Position A: Early-stage researchers should do the minimum in open-sourcing computational artifacts that arise from their work. Doing anything more than the community norm is a waste of time and effort that could be better spent writing more papers.
Position B: Early-stage researchers should actively promote the adoption of computational artifacts that arise from their work, for example, contributing to popular open-source libraries. Even if this requires a lot of time (e.g., refactoring code into a production-ready state), such efforts are worthwhile.

Week 5: Presentation of Visualization Projects

Presentation of visualization projects!

Week 6: The Science of Impact (Still)

Slides: [PDF]

We'll be having our debate on Topic 4: Is social media a waste of time?

Position A: Early-stage researchers should actively incorporate social media use as a component of their career development. This means appropriate use of sites like Twitter, Facebook, and LinkedIn to build professional reputation, engage with the community, hear about recent work by others, etc.
Position B: Early-stage researchers should stay off social media. It's a complete waste of time.

Links to the case studies of impact that we discussed in class:

Week 7: Research as a Social Process

Slides: [PDF]

Links to content discussed in class:

Supplemental readings:

Becerra et al. Maximizing the Conference Experience: Tips to Effectively Navigate Academic Conferences Early in Professional Careers. Behavior Analysis in Practice, 13(3):479-491, 2020.
Leininger et al. Ten Simple Rules for Attending Your First Conference. PLoS Computational Biology, 17(7):e1009133, 2021.

Week 8: Working With Your Advisor

Slides: [PDF]

Links to content discussed in class:

Week 9: On Writing Papers

Slides: [PDF]

Papers used in the abstract analysis exercise:

Deng et al. ImageNet: A Large-Scale Hierarchical Image Database. CVPR 2009. (45k citations)
Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012. (119k citations)
Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. (53k citations)
Radford et al. Improving Language Understanding by Generative Pre-Training. (4k citations)
Peters et al. Deep Contextualized Word Representations. NAACL 2018. (11k citations)

Links to content discussed in class:

Writing Is Thinking
Mensh and Kording. Ten simple rules for structuring papers. PLoS Computational Biology, 13(9):e1005619, 2017.
Baquero. Picking Publication Targets. CACM, 65(3):10-11, 2022.
My writing pet peeves

Week 10: Responsible Research

Slides: [PDF]

Links to content discussed in class:

Distributive Justice: entry from Stanford Encyclopedia of Philosophy.
Dressel et al. The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), 2018.
Friedler et al. The (Im)possibility of fairness: different value systems require different mechanisms for fair decision making. CACM, 64(4):136-143, 2021.
Ghassemi et al. The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health, 3(11):e745-e750, 2021.
