The Art and Science of Empirical Computer Science
Logistics
- Semester: Fall 2023
- Instructor: Jimmy Lin
- Time & Location: Mondays 12:30pm-3:20pm, DC 2585
Course Description
Graduate students in computer science aspire to "do computer science" (research), but what exactly does that mean? Among a multitude of activities, it involves reading papers, learning the "state of the art", advancing knowledge, writing papers, and (hopefully) getting them published. Graduate students learn how to do these things under the tutelage of professors, but there is rarely explicit or deliberate instruction on these myriad activities. With a focus on empirical computer science, this course covers the elements that comprise the research enterprise, synthesizing both "art" (personal experiences I have accumulated over the years) and "science" (insights derived from quantitative analyses). The hope is that the knowledge and actionable advice from this course will help graduate students better understand research, leading to more productive and fulfilling careers.
Material for this course will draw from "The Science of Science" by Wang and Barabási, academic papers, and other sources on the web.
Scope
Context is important. Most of the questions and issues we grapple with in this course have no simple answer. Nearly always, it depends on context. As such, it is important to properly scope the coverage of this course.
Wang and Barabási attempt to paint in broad strokes with their work, encompassing all of science. However, some of their findings and recommendations may seem at odds with the realities of computer science. Much of the "art" (e.g., advice, best practices, etc.) covered in this course is drawn from my personal experience, which is of course colored by my own background. I am a computer scientist (actually, also a formal linguist) by training, and I presently work at the intersection of natural language processing (NLP) and information retrieval (IR), although over the years I have dabbled in other sub-disciplines of computer science as well.
For lack of a better term, I characterize the focus of this course as "empirical computer science", but it is really shorthand for "stuff that I have worked on" and "stuff that I am familiar with". NLP and IR can be characterized as "applied machine learning", so perhaps that is a more accurate scope. The contents of this course will certainly be relevant to graduate students wishing to pursue these topics, and I suspect also to related sub-disciplines of computer science such as data mining, or perhaps even computer vision (although I don't work in those fields). However, I am quite certain that portions of this course will not apply to, for example, theoretical machine learning and complexity theory. All findings, advice, recommendations, etc. need to be properly contextualized.
Syllabus
Week | Date | Type | Description | Debate | Slides |
---|---|---|---|---|---|
1 | 9/11 | - | Introduction | | |
2 | 9/18 | "science" | The Science of Career | Topic 1 | |
3 | 9/25 | "science" | The Science of Collaboration (I) | Topic 2 | |
4 | 10/2 | "science" | The Science of Collaboration (II) | Topic 3 | |
5 | 10/16 | - | Presentation of Visualization Projects | | |
6 | 10/23 | "science" | The Science of Impact | Topic 4 | |
7 | 10/30 | "art" | Research as a Social Process | | |
8 | 11/6 | "art" | On Working With Your Advisor | | |
9 | 11/13 | "art" | On Writing Papers | | |
10 | 11/20 | "art" | On Responsible Research | | |
11 | 11/27 | "science" | Paper Presentations (I) | | |
12 | 12/4 | "science" | Paper Presentations (II) | | |
Assignments and Grades
Weight | Component | Deadlines |
---|---|---|
15% | Debate participation | Weeks 2, 3, 4, 6 |
20% | Visualization project | Week 5 |
15% | Paper presentation | Weeks 11, 12 |
40% | Final project | End of semester |
10% | Class participation | - |
It is expected that you come to class prepared, having completed the assigned readings and ready to engage in class discussions.
Detailed Schedule
Week 1: Introduction
Slides: [PDF]
For more details on normative vs. positive approaches, Wikipedia provides a good starting point: Positivism and Normativity.
Week 2: The Science of Career
Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:
- Introduction
- Part 1: The Science of Career — Chapters 1-7 (inclusive)
Slides: [PDF]
We'll be having our debate on Topic 1: How should we evaluate excellence? Quality only or quality and quantity?
- Position A: Researchers should be evaluated solely on the quality of their publications. Quantity is irrelevant and we shouldn't even bother counting.
- Position B: Researchers should be evaluated on both the quality and quantity of their publications. High-quality publications are of course important, but quantity is also an important component of excellence.
Week 3: The Science of Collaboration (I)
Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:
- Part 2: The Science of Collaboration — Chapters 8-11 (inclusive)
Slides: [PDF]
We'll be having our debate on Topic 2: Should you collaborate or not?
- Position A: Early-stage researchers should actively seek out collaborations beyond their research group. Participation in multiple research projects across many different groups builds breadth.
- Position B: Early-stage researchers should not actively seek out collaborations beyond their research group. Focusing on depth is more important than breadth.
Paper discussed in class: A causal test of the strength of weak ties
Week 4: The Science of Collaboration (II)
Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:
- Part 2: The Science of Collaboration — Chapters 12-14 (inclusive)
Slides: [PDF]
We'll be having our debate on Topic 3: How should you approach open-sourcing computational artifacts associated with your work?
- Position A: Early-stage researchers should do the minimum in open-sourcing computational artifacts that arise from their work. Doing anything beyond the community norm is a waste of time and effort that could be better spent writing more papers.
- Position B: Early-stage researchers should actively promote the adoption of computational artifacts that arise from their work, for example, contributing to popular open-source libraries. Even if this requires a lot of time (e.g., refactoring code into a production-ready state), such efforts are worthwhile.
Interesting tweet thread:
Who should be the last/senior author on a paper? How do you decide? What does being last entail? I get these questions a lot and it’s confusing because the last author is often a senior person, running a group & raising money. Do those things determine last authorship? No. (1/7)
— Michael Black (@Michael_J_Black) November 1, 2022
Week 5: Presentation of Visualization Projects
Presentation of visualization projects!
Week 6: The Science of Impact
Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási:
- Part 3: The Science of Impact — Chapters 15-20 (inclusive)
Slides: [PDF]
We'll be having our debate on Topic 4: Is social media a waste of time?
- Position A: Early-stage researchers should actively incorporate social media use as a component of their career development. This means appropriate use of sites like Twitter, Facebook, and LinkedIn to build professional reputation, engage with the community, hear about recent work by others, etc.
- Position B: Early-stage researchers should stay off social media. It's a complete waste of time.
Week 7: Research as a Social Process
Slides: [PDF]
Links to content discussed in class:
Supplemental readings:
- Becerra et al. Maximizing the Conference Experience: Tips to Effectively Navigate Academic Conferences Early in Professional Careers. Behavior Analysis in Practice, 13(3):479-491, 2020.
- Leininger et al. Ten Simple Rules for Attending Your First Conference. PLoS Computational Biology, 17(7):e1009133, 2021.
Some review horror stories:
Just got a desk reject, post-rebuttals, for a paper being submitted to arxiv <30 min late for the anonymity deadline. I talk about how the ACL embargo policy hurts junior researchers and makes ACL venues less desirable for NLP work. I don’t talk about the pointless NOISE it adds.
— Naomi Saphra 🟣 (@nsaphra) September 4, 2023
I AM SO ANGRY. I won't submit to ACL venues again after they shafted a student after rebuttals with this idiotic policy. Since anonymity is gone, though, publicity time! Check out awesome work by @ZackAnkner on improving MLM training by scheduling masking: https://t.co/fQ4SVLgFs9 https://t.co/jSUJo4cswf
— Jonathan Frankle (@jefrankle) September 4, 2023
from #ICLR2023 pic.twitter.com/5QpGAv0xUs
— Rob Tang (@XiangruTang) November 7, 2022
Our Meta OT paper was rejected from @NeurIPS despite having WA/A/SA throughout the discussion period. How did it happen? The AC based the decision on a review that came a month late that we couldn't even respond to.
— Brandon Amos (@brandondamos) November 7, 2022
We've made the full discussion public:https://t.co/ISx9afxhyA pic.twitter.com/lBXqPWM6KL
Week 8: On Working With Your Advisor
Slides: [PDF]
Links to content discussed in class:
Interesting tweet threads related to class discussion:
How to work with your advisor(s)?
— Jia-Bin Huang (@jbhuang0604) July 11, 2022
Working effectively with your advisor is the no doubt the key to success of your research! However, junior grad students often don't have a clear idea on how to do so.
Sharing some tips that I found useful. 👇
How to come up with research ideas?
— Jia-Bin Huang (@jbhuang0604) August 6, 2021
Excited about starting doing research but have no clue?🤷♂️🤷🏻♀️ Here are some simple methods that I found useful in identifying initial directions.
Check out the thread below 👇
If founders don’t sleep, bad stuff happens, in 4 studies:
— Ethan Mollick (@emollick) November 7, 2022
1) Lack of 💤makes you generate worse ideas
2) Lack of 💤 makes you think the bad ideas you develop are good
3) Getting 💤 boosts your mood, upping the mood of your startup
4) Lack of 💤 lowers your entrepreneurial ability pic.twitter.com/pqPOucy1rp
Week 9: On Writing Papers
Slides: [PDF]
Papers used in the abstract analysis exercise:
- Nature abstract template
- Deng et al. ImageNet: A Large-Scale Hierarchical Image Database. CVPR 2009. (59k citations)
- Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012. (122k citations)
- Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. (83k citations)
- Radford et al. Improving Language Understanding by Generative Pre-Training. (7k citations)
- Peters et al. Deep Contextualized Word Representations. NAACL 2018. (14k citations)
- Vaswani et al. Attention Is All You Need. NIPS 2017. (97k citations)
Links to content discussed in class:
- Writing Is Thinking
- Mensh and Kording. Ten simple rules for structuring papers. PLoS Computational Biology, 13(9):e1005619, 2017.
- Baquero. Picking Publication Targets. CACM, 65(3):10-11, 2022.
- My writing pet peeves
Week 10: On Responsible Research
Slides: [PDF]
Links to content discussed in class:
- Distributive Justice: entry from Stanford Encyclopedia of Philosophy.
- Dressel et al. The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), 2018.
- Wen et al. Characteristics of publicly available skin cancer image datasets: a systematic review. The Lancet Digital Health, 4(1):E64-E74, 2022.
- Friedler et al. The (Im)possibility of fairness: different value systems require different mechanisms for fair decision making. CACM, 64(4):136-143, 2021.
- Ghassemi et al. The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health, 3(11):E745-E750, 2021.
Week 11: Paper Presentations (I)
🚧 Content to be added!
Week 12: Paper Presentations (II)
🚧 Content to be added!