linnabrown / Comp790-166-Comp-Bio

Computational Biology- Spring 2021

Comp790-166: Computational Biology

Details

Instructor: Natalie Stanley

Email: natalies@cs.unc.edu

Time: Tuesday/Thursday 9:30am-10:45am, Spring Semester 2021

Office Hours: Thursday 11am-noon on our class Zoom link, and by appointment.

Location: Update Jan 18: If you have not received my welcome email, please email me for the Zoom link.

Description

Modern high-throughput assays allow us to efficiently profile a variety of biological processes at a systems level across a set of patient samples. As a result, these technologies generate an abundance of detailed information that needs to be extracted, analyzed, and interpreted. In this course we will discuss the methodology used to analyze (process, engineer features from, combine, etc.) data generated by some of the most cutting-edge technologies, such as proteomics, single-cell assays, and imaging in biomedicine. We will further discuss how numerical linear algebra techniques and modern machine learning approaches can be applied to effectively extract information from these assays for an improved understanding of human health and disease. While computational biology is a very broad field, we will focus here on the analysis of data generated by single-cell technologies (e.g. mass cytometry), multiomics/multi-modality analysis, systems immunology, and benchmarking. For each class of algorithms introduced for some task on biological data, we will also go over the necessary theory and mathematical intuition.

Prerequisites

Strong programming skills. Comfortable with linear algebra and basic probability. Please do not worry if you don't have any background in biology. Any relevant concepts will be introduced. Please feel free to talk to me about any of these prerequisites.

Course Structure

This course will be mostly lecture-based, with two homework assignments and a course project. I will provide ideas for several publicly available biological datasets and open problems for you to work on for these projects. Overall, the project is intended to give you an opportunity to implement/apply methodology from the papers that we will discuss together. The final project writeup will also give you practice writing up results and communicating ideas. You are welcome to work in teams for this project.

Most of the lectures will be based around several papers. To benefit your own understanding, I will provide a set of questions that should be answered for one of the papers discussed in each lecture.

Schedule

Note that this is preliminary. Some topics may take (on average) 1 day longer than planned. I reserve the right to correct typos in the notes up to 1 hour before our class meeting.

| Date | Topic | Reading | Notes | Code |
| January 19, 2021 | Intro, bioinformatics vs comp bio, challenges and modality-specific advancements | [Systems Immunology, Just Getting Started] | Lecture 1 Notes | |
| January 21, 2021 | Linear Algebra Review, Low Rank Approximations, Building graphs from data, Graph Laplacian | [SLMP, pages 10-22], [Data Matrices + Low Rank], [Random Projection Trees], [LargeVis]; no reading summary | Lecture 2 Notes | [LargeVis], [graph tools for python] |
| January 26, 2021 | Graph Partitioning | [Module Detection Benchmarking in Biological Data], [BigClam]; for fun: [Stochastic Block Model + Single Cell] | Lecture 3 Notes | [SNAP], [Louvain], [Leiden], [graph-tool (SBM)] |
| January 28, 2021 | Graph partitioning (overflow slides), Graph Embeddings, Graph Signal Processing | [Node2Vec], [Representation Learning on Graphs], [Review: Graph Embedding in Comp Bio]; for fun: [Review on GSP], [Low Pass Filtering on Graphs], [Vicus], [Mashup] | Lecture 4 Notes | [node2vec] |
| February 2, 2021 | Single Cell Day 1: Intro to single-cell profiling, mass cytometry bioinformatics | [Single-Cells, Many Features], [Spade] | Lecture 5 Notes | [FCS file tutorial], [Spade] |
| February 4, 2021: [HW 1 Assigned] | Single Cell Day 2: Graph-based automated gating, imputation in single-cell data, branch-point preserving visualization | [phenograph], [PHATE], [MAGIC] | Lecture 6 Notes | [phenograph], [FastPG], [MAGIC], [Phate] |
| February 9, 2021 | Single Cell Day 3: Feature Engineering from single-cell data and linking to external variables | Citrus, [MELD] | | |
| February 11, 2021 | Single Cell Day 4: Differential Analysis of Cell-Populations | Diffcyt, Cydar | | |
| February 16, 2021 | Wellness day, no class | | | |
| February 18, 2021: Homework 1 Due | Single Cell Day 5: Graph-based matching of single-cell data | Conos, LIGER | | |
| February 23, 2021 | Single Cell Day 7: Guest lecture by Maria Brbic (Stanford CS): Semi-Supervised Automated Cell-Population Discovery | MARS | | |
| February 25, 2021 | Single Cell Day 6: Deep Learning for Single Cell Tasks | SAUCIE, CellCNN | | |
| March 2, 2021 | Single Cell Day 7: Trajectory Inference | | | |
| March 4, 2021 | Single Cell Day 8: Benchmarking in Trajectory Inference | | | |
| March 9, 2021: Project Proposals Due | Presentations of Project Proposals Day 1 | | | |
| March 11, 2021 | Wellness day, no class | | | |
| March 16, 2021 | Project Proposal Presentation Day 2 | | | |
| March 18, 2021 | Single Cell Day 9: Benchmarking in Single-Cell Analysis | Aghaeepour et al | | |
| March 23, 2021 | Single Cell Day 10: Imaging Proteomics + Spatial Regularization: computational challenges in combining tissue images and protein expression | | | |
| March 25, 2021 | Multiomics Day 1: Constructing a joint embedding of samples according to multiple modalities, subspace merging | SNF, grassmann embed | | |
| March 30, 2021 | Multiomics Day 2: MOFA-1 and MOFA-2: Multiomics Factor Analysis | MOFA-1, MOFA-2 | | |
| April 2, 2021: HW 2 Assigned | Multiomics Day 3: Uncovering Relationships Between Modalities | mmvec | | |
| April 6, 2021 | Multiomics Day 4: Stacked Generalization and CCA in multiomics studies | Ghaemi | | |
| April 8, 2021 | Multiomics Day 5: Benchmarking in multiomics studies | | | |
| April 13, 2021 | Incorporating Prior Biological Knowledge into Analysis | | | |
| April 15, 2021: Homework 2 Due | Systems Immunology Topic: TCR/BCR (T and B cell receptor repertoire analysis) | | | |
| April 20, 2021 | Partial Correlation, Thresholding, etc. for Identifying Meaningful Interactions | | | |
| April 22, 2021 | Enrichment Analysis, writing for an interdisciplinary audience | | | |
| April 27, 2021 | Project Presentations Day 1 | | | |
| April 29, 2021 | Project Presentations Day 2 | | | |
| Final Exam Day | Project papers due | | | |

Homework, Project, Reading, Grading, Etc

Homework

There will be two homework assignments to practice implementing particular concepts. Often, things can become a bit easier to understand and use when they are implemented by you. I will be happy to read/run code written in Python, R, Julia, or Matlab. Please submit your homework writeup as a PDF.

Background Resources

Most of what we discuss in class will come from papers. However, I suggest the following textbooks as background references. Conveniently, they are also available for free.

  • [PRML] Pattern Recognition and Machine Learning -- Chris Bishop [Link]

  • [SLMP] Spectral Learning on Matrices and Tensors -- Majid Janzamin et al. [Link]

  • The Matrix Cookbook [Link]

  • [PML] Probabilistic Machine Learning: An Introduction. -- Kevin Murphy [Link]

Readings

For each class, I will update the papers that we will go over in the table above. You will only be required to write a summary of one of the potentially multiple papers assigned for that day.

Reading Questions

Please choose one paper per week on the weeks when reading summaries are due, and turn in your summary before our class meeting at 9:30 am to natalies+comp790@cs.unc.edu.

  1. Please explain in two sentences or fewer what problem is being solved.

  2. What were the main contributions of the authors in this work? (You can answer in a few bullet points).

  3. Please describe 1-2 computational experiments that the authors implemented to test their method.

  4. Were the authors the first to attempt this particular problem? If not, did they compare their results to other baselines? Do you think that their evaluation was objective?

  5. Do you think that the authors provided enough evidence for why their developed method is an important contribution? If yes, please describe their reasoning here. If you do not think they adequately justified why they worked on this particular problem, please describe your thoughts on that here.

  6. What is one follow-up idea or extension from this work?

Final Project

I will provide you with several examples of publicly available biological datasets and problems (https://github.com/stanleyn/Comp790-166-Comp-Bio/blob/main/Datasets.md). Halfway through the semester, you will submit your project proposal and present your idea to the class. The proposal will be a short document describing 1) the problem, 2) background on other people's attempts to solve this problem, 3) your idea for a solution, and 4) the data you will use to test your method. At the end of the semester, you will write a short paper explaining your method and results, and present your results.

Grading

Grading will be based on the following:

  1. Reading Questions: 20% over the entire semester
  2. Homework 1: 20%
  3. Homework 2: 20%
  4. Project Proposal: 10%
  5. Project final writeup: 30%
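
As an illustration of how the weights above combine, here is a minimal Python sketch. The weights come from the list above; the function name `weighted_grade`, the 0-100 scale for each component, and the example scores are assumptions made for illustration, not part of the course policy.

```python
# Weights copied from the grading list above (they sum to 100%).
WEIGHTS = {
    "reading_questions": 0.20,
    "homework_1": 0.20,
    "homework_2": 0.20,
    "project_proposal": 0.10,
    "project_writeup": 0.30,
}


def weighted_grade(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "reading_questions": 90,
    "homework_1": 80,
    "homework_2": 85,
    "project_proposal": 100,
    "project_writeup": 95,
}

print(round(weighted_grade(example_scores), 2))
```

The final writeup carries the largest single weight, so in this sketch it moves the total more than any other component.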

Accessibility Statement

The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities. Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email ars@unc.edu.

(source: https://ars.unc.edu/faculty-staff/syllabus-statement)

Diversity Statement

I value the perspectives of individuals from all backgrounds reflecting the diversity of our students. I broadly define diversity to include race, gender identity, national origin, ethnicity, religion, social class, age, sexual orientation, political background, and physical and learning ability. I strive to make this classroom an inclusive space for all students. Please let me know if there is anything I can do to improve; I appreciate suggestions.
