The Art and Science of Empirical Computer Science

⁉️ This is the course website for the Fall 2023 edition of my course. I previously taught the course in Fall 2022. I will populate this website with content throughout the semester, but if you'd like to look ahead to see what's in store, you're welcome to consult the website from last year; content will be similar, but not identical.

Logistics

Semester: Fall 2023
Instructor: Jimmy Lin
Time & Location: Mondays 12:30pm-3:20pm, DC 2585

Course Description

Graduate students in computer science aspire to "do computer science" (research), but what exactly does that mean? It involves, among a multitude of activities, reading papers, learning the "state of the art", advancing knowledge, writing papers, and (hopefully) getting them published. Graduate students learn how to do these things under the tutelage of professors, but rarely is there explicit or deliberate instruction on these myriad activities. With a focus on empirical computer science, this course covers elements that comprise the research enterprise, synthesizing both "art" (personal experiences I have accumulated over the years) and "science" (insights derived from quantitative analyses). The hope is that knowledge and actionable advice from this course will help graduate students better understand research, leading to more productive and fulfilling careers.

Material for this course will draw from "The Science of Science" by Wang and Barabási, academic papers, as well as other sources on the web.

Scope

Context is important. Most of the questions and issues we grapple with in this course have no simple answer. Nearly always, it depends on context. As such, it is important to properly scope the coverage of this course.

Wang and Barabási attempt to paint broad strokes with their work, encompassing all of science. However, some of their findings and recommendations may seem at odds with realities in computer science. Much of the "art" (e.g., advice and best practices) covered in this course is drawn from my personal experience, which will of course be colored by my own background. I am a computer scientist (actually, also a formal linguist) by training, and I currently work at the intersection of natural language processing (NLP) and information retrieval (IR), although over the years I have dabbled in other sub-disciplines of computer science as well.

For lack of a better term, I characterize the focus of this course as "empirical computer science", but it really is a shorthand for "stuff that I have worked on" and "stuff that I am familiar with". NLP and IR can be characterized as "applied machine learning", so perhaps that's a more accurate scope. The contents of this course will certainly be relevant to graduate students wishing to pursue these topics, and I suspect for related sub-disciplines in computer science such as data mining, or even perhaps computer vision (although I don't work in those fields). However, I am quite certain that portions of this course will not apply to, for example, theoretical machine learning and complexity. All findings, advice, recommendations, etc. need to be properly contextualized.

Syllabus

Week	Date	Type	Description	Debate	Slides
1	9/11	-	Introduction		PDF
2	9/18	"science"	The Science of Career	Topic 1	PDF
3	9/25	"science"	The Science of Collaboration (I)	Topic 2	PDF
4	10/2	"science"	The Science of Collaboration (II)	Topic 3	PDF
5	10/16	-	Presentation of Visualization Projects
6	10/23	"science"	The Science of Impact	Topic 4	PDF
7	10/30	"art"	Research as a Social Process		PDF
8	11/6	"art"	On Working With Your Advisor		PDF
9	11/13	"art"	On Writing Papers		PDF
10	11/20	"art"	On Responsible Research		PDF
11	11/27	"science"	Paper Presentations (I)
12	12/4	"science"	Paper Presentations (II)

Assignments and Grades

Weight	Component	Deadlines
15%	Debate participation	Weeks 2, 3, 4, 6
20%	Visualization project	Week 5
15%	Paper presentation	Weeks 11, 12
40%	Final project	End of semester
10%	Class participation

It is expected that you come to class prepared, having completed the assigned readings and ready to engage in class discussions.

Detailed Schedule

Week 1: Introduction

Slides: [PDF]

For more details on normative vs. positive approaches, Wikipedia provides a good starting point: Positivism and Normativity.

Week 2: The Science of Career

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:

Introduction
Part 1: The Science of Career — Chapters 1-7 (inclusive)

Slides: [PDF]

We'll be having our debate on Topic 1: How should we evaluate excellence? Quality only or quality and quantity?

Position A: Researchers should be evaluated solely on the quality of their publications. Quantity is irrelevant and we shouldn't even bother counting.
Position B: Researchers should be evaluated on both the quality and quantity of their publications. High-quality publications are of course important, but quantity is also an important component of excellence.

Week 3: The Science of Collaboration (I)

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:

Part 2: The Science of Collaboration — Chapters 8-11 (inclusive)

Slides: [PDF]

We'll be having our debate on Topic 2: Should you collaborate or not?

Position A: Early-stage researchers should actively seek out collaborations beyond their research group. Participation in multiple research projects across many different groups builds breadth.
Position B: Early-stage researchers should not actively seek out collaborations beyond their research group. Focusing on depth is more important than breadth.

Paper discussed in class: A causal test of the strength of weak ties

Week 4: The Science of Collaboration (II)

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:

Part 2: The Science of Collaboration — Chapters 12-14 (inclusive)

Slides: [PDF]

We'll be having our debate on Topic 3: How should you approach open-sourcing computational artifacts associated with your work?

Position A: Early-stage researchers should do the minimum in open-sourcing computational artifacts that arise from their work. Doing anything more than the community norm is a waste of time and effort that could be better spent writing more papers.
Position B: Early-stage researchers should actively promote the adoption of computational artifacts that arise from their work, for example, contributing to popular open-source libraries. Even if this requires a lot of time (e.g., refactoring code into a production-ready state), such efforts are worthwhile.

Interesting tweet thread:

Who should be the last/senior author on a paper? How do you decide? What does being last entail? I get these questions a lot and it’s confusing because the last author is often a senior person, running a group & raising money. Do those things determine last authorship? No. (1/7)
— Michael Black (@Michael_J_Black) November 1, 2022

Week 5: Presentation of Visualization Projects

Presentation of visualization projects!

Week 6: The Science of Impact

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:

Part 3: The Science of Impact — Chapters 15-20 (inclusive)

Slides: [PDF]

We'll be having our debate on Topic 4: Is social media a waste of time?

Position A: Early-stage researchers should actively incorporate social media use as a component of their career development. This means appropriate use of sites like Twitter, Facebook, and LinkedIn to build professional reputation, engage with the community, hear about recent work by others, etc.
Position B: Early-stage researchers should stay off social media. It's a complete waste of time.

Week 7: Research as a Social Process

Slides: [PDF]

Links to content discussed in class:

Supplemental readings:

Becerra et al. Maximizing the Conference Experience: Tips to Effectively Navigate Academic Conferences Early in Professional Careers. Behavior Analysis in Practice, 13(3):479-491, 2020.
Leininger et al. Ten Simple Rules for Attending Your First Conference. PLoS Computational Biology, 17(7):e1009133, 2021.

Some review horror stories:

Just got a desk reject, post-rebuttals, for a paper being submitted to arxiv <30 min late for the anonymity deadline. I talk about how the ACL embargo policy hurts junior researchers and makes ACL venues less desirable for NLP work. I don’t talk about the pointless NOISE it adds.
— Naomi Saphra 🟣 (@nsaphra) September 4, 2023

I AM SO ANGRY. I won't submit to ACL venues again after they shafted a student after rebuttals with this idiotic policy. Since anonymity is gone, though, publicity time! Check out awesome work by @ZackAnkner on improving MLM training by scheduling masking: https://t.co/fQ4SVLgFs9 https://t.co/jSUJo4cswf
— Jonathan Frankle (@jefrankle) September 4, 2023

from #ICLR2023 pic.twitter.com/5QpGAv0xUs
— Rob Tang (@XiangruTang) November 7, 2022

Our Meta OT paper was rejected from @NeurIPS despite having WA/A/SA throughout the discussion period. How did it happen? The AC based the decision on a review that came a month late that we couldn't even respond to.

We've made the full discussion public:https://t.co/ISx9afxhyA pic.twitter.com/lBXqPWM6KL
— Brandon Amos (@brandondamos) November 7, 2022

Week 8: On Working With Your Advisor

Slides: [PDF]

Links to content discussed in class:

Interesting tweet threads related to class discussion:

How to work with your advisor(s)?

Working effectively with your advisor is the no doubt the key to success of your research! However, junior grad students often don't have a clear idea on how to do so.

Sharing some tips that I found useful. 👇
— Jia-Bin Huang (@jbhuang0604) July 11, 2022

How to come up with research ideas?

Excited about starting doing research but have no clue?🤷‍♂️🤷🏻‍♀️ Here are some simple methods that I found useful in identifying initial directions.

Check out the thread below 👇
— Jia-Bin Huang (@jbhuang0604) August 6, 2021

If founders don’t sleep, bad stuff happens, in 4 studies:
1) Lack of 💤makes you generate worse ideas
2) Lack of 💤 makes you think the bad ideas you develop are good
3) Getting 💤 boosts your mood, upping the mood of your startup
4) Lack of 💤 lowers your entrepreneurial ability pic.twitter.com/pqPOucy1rp
— Ethan Mollick (@emollick) November 7, 2022

Week 9: On Writing Papers

Slides: [PDF]

Papers used in the abstract analysis exercise:

Nature abstract template
Deng et al. ImageNet: A Large-Scale Hierarchical Image Database. CVPR 2009. (59k citations)
Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012. (122k citations)
Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. (83k citations)
Radford et al. Improving Language Understanding by Generative Pre-Training. (7k citations)
Peters et al. Deep Contextualized Word Representations. NAACL 2018. (14k citations)
Vaswani et al. Attention Is All You Need. NIPS 2017. (97k citations)

Links to content discussed in class:

Writing Is Thinking
Mensh and Kording. Ten simple rules for structuring papers. PLoS Computational Biology, 13(9):e1005619, 2017.
Baquero. Picking Publication Targets. CACM, 65(3):10-11, 2022.
My writing pet peeves

Week 10: On Responsible Research

Slides: [PDF]

Links to content discussed in class:

Distributive Justice: entry from Stanford Encyclopedia of Philosophy.
Dressel et al. The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), 2018.
Wen et al. Characteristics of publicly available skin cancer image datasets: a systematic review. The Lancet Digital Health, 4(1):E64-E74, 2022.
Friedler et al. The (Im)possibility of fairness: different value systems require different mechanisms for fair decision making. CACM, 64(4):136-143, 2021.
Ghassemi et al. The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health, 3(11):E745-E750, 2021.

Week 11: Paper Presentations (I)

🚧 Content to be added!

Week 12: Paper Presentations (II)

🚧 Content to be added!
