lintool / art-science-empirical-cs-2022f

The Art and Science of Empirical Computer Science (Fall 2022)

Home Page: https://github.com/lintool/art-science-empirical-cs-2022f

The Art and Science of Empirical Computer Science

⁉️ This is the course website for the Fall 2022 edition of my course, which has concluded. Do you really mean to be here? Or do you really want to visit the Fall 2023 edition?

Logistics

  • Semester: Fall 2022
  • Instructor: Jimmy Lin
  • Time & Location: Mondays 12:30-2:50 PM, DC 2568

Course Description

Graduate students in computer science aspire to "do computer science" (research), but what exactly does that mean? It involves, among a multitude of activities, reading papers, learning the "state of the art", advancing knowledge, writing papers, and (hopefully) getting them published. Graduate students learn how to do these things under the tutelage of professors, but rarely is there explicit or deliberate instruction on these myriad activities. With a focus on empirical computer science, this course covers elements that comprise the research enterprise, synthesizing both "art" (personal experiences I have accumulated over the years) and "science" (insights derived from quantitative analyses). The hope is that knowledge and actionable advice from this course will help graduate students better understand research, leading to more productive and fulfilling careers.

Material for this course will draw from "The Science of Science" by Wang and Barabási, academic papers, as well as other sources on the web.

Scope

Context is important. Most of the questions and issues we grapple with in this course have no simple answer. Nearly always, it depends on context. As such, it is important to properly scope the coverage of this course.

Wang and Barabási attempt to paint broad strokes with their work, encompassing all of science. However, some of their findings and recommendations may seem at odds with realities in computer science. Much of the "art" (e.g., advice and best practices) covered in this course is drawn from my personal experience, which will of course be colored by my own background. I am a computer scientist (actually, also a formal linguist) by training, and I currently work at the intersection of natural language processing (NLP) and information retrieval (IR), although over the years I have dabbled in other sub-disciplines of computer science as well.

For lack of a better term, I characterize the focus of this course as "empirical computer science", but it really is shorthand for "stuff that I have worked on" and "stuff that I am familiar with". NLP and IR can be characterized as "applied machine learning", so perhaps that's a more accurate scope. The contents of this course will certainly be relevant to graduate students wishing to pursue these topics and, I suspect, to those in related sub-disciplines of computer science such as data mining, or perhaps even computer vision (although I don't work in those fields). However, I am quite certain that portions of this course will not apply to, for example, theoretical machine learning and complexity. All findings, advice, and recommendations need to be properly contextualized.

Syllabus

Week | Date | Type | Description
1 | 9/12 | - | Introduction [Slides]
2 | 9/19 | "science" | The Science of Career [Slides]
3 | 9/26 | "science" | The Science of Collaboration [Slides]
4 | 10/3 | "science" | The Science of Impact [Slides]
5 | 10/17 | - | Presentation of Visualization Projects
6 | 10/24 | "science" | The Science of Impact (Still) [Slides]
7 | 10/31 | "art" | Research as a Social Process [Slides]
8 | 11/7 | "art" | Working With Your Advisor [Slides]
9 | 11/14 | "art" | On Writing Papers [Slides]
10 | 11/21 | "art" | Responsible Research [Slides]
11 | 11/28 | "science" | Paper Presentations (I)
12 | 12/5 | "science" | Paper Presentations (II)

Grades

Weight | Component
15% | Debate participation
15% | Paper presentation
20% | Visualization project
40% | Final project
10% | Class participation

Assignments

In addition to weekly preparation (readings and other material), the course will have the following assignments:

Detailed Schedule

Week 1: Introduction

Slides: [PDF]

For more details on normative vs. positive approaches, Wikipedia provides a good starting point: Positivism and Normativity.

Week 2: The Science of Career

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási, Part 1: The Science of Career (pages 5-80).

Slides: [PDF]

We'll be having our debate on Topic 1: How should we evaluate excellence? Quality only or quality and quantity?

  • Position A: Researchers should be evaluated solely on the quality of their publications. Quantity is irrelevant and we shouldn't even bother counting.
  • Position B: Researchers should be evaluated on both the quality and quantity of their publications. High-quality publications are of course important, but quantity is also an important component of excellence.

Week 3: The Science of Collaboration

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási, Part 2: The Science of Collaboration (pages 81-158).

Slides: [PDF]

We'll be having our debate on Topic 2: Should you collaborate or not?

  • Position A: Early-stage researchers should actively seek out collaborations beyond their research group. Participation in multiple research projects across many different groups builds breadth.
  • Position B: Early-stage researchers should not actively seek out collaborations beyond their research group. Focusing on depth is more important than breadth.

Week 4: The Science of Impact

Readings (to be completed prior to the class session): "The Science of Science" by Wang and Barabási, Part 3: The Science of Impact (pages 159-219).

Slides: [PDF]

We'll be having our debate on Topic 3: How should you approach open-sourcing computational artifacts associated with your work?

  • Position A: Early-stage researchers should do the minimum when open-sourcing computational artifacts that arise from their work. Doing anything more than the community norm is a waste of time and effort that could be better spent writing more papers.
  • Position B: Early-stage researchers should actively promote the adoption of computational artifacts that arise from their work, for example, contributing to popular open-source libraries. Even if this requires a lot of time (e.g., refactoring code into a production-ready state), such efforts are worthwhile.

Week 5: Presentation of Visualization Projects

Presentation of visualization projects!

Week 6: The Science of Impact (Still)

Slides: [PDF]

We'll be having our debate on Topic 4: Is social media a waste of time?

  • Position A: Early-stage researchers should actively incorporate social media use as a component of their career development. This means appropriate use of sites like Twitter, Facebook, and LinkedIn to build professional reputation, engage with the community, hear about recent work by others, etc.
  • Position B: Early-stage researchers should stay off social media. It's a complete waste of time.

Links to the case studies of impact that we discussed in class:

Week 7: Research as a Social Process

Slides: [PDF]

Links to content discussed in class:

Supplemental readings:

Week 8: Working With Your Advisor

Slides: [PDF]

Links to content discussed in class:

Week 9: On Writing Papers

Slides: [PDF]

Papers used in the abstract analysis exercise:

Links to content discussed in class:

Week 10: Responsible Research

Slides: [PDF]

Links to content discussed in class:

Week 11: Paper Presentations (I)

Week 12: Paper Presentations (II)
