dtemir / rosetta-commons-internship

Summer 2021 Internship at RosettaCommons (Stanford Lab)


Summer 2021 RosettaCommons Internship (Stanford Lab)

Preparation

Prior to the bootcamp, we were assigned homework to set up PyRosetta and learn Python.

In homework #1, we installed PyRosetta in our Google Colab environments, which required some tweaking and a separate .whl file from the developer. In its continuation, we checked that everything worked.

We then did homework #2 on Python for-loops, homework #3 on if-else statements, and homework #4 on functions.

Google Colab

To make everything work in Google Colab, I kept a PyRosetta directory containing the .whl file in MyDrive, my Google Drive home directory. I also kept the inputs, Media, and Sessions directories in a directory named temp_pyrbc_202103_notebooks.
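As a rough illustration of that setup, the sketch below looks for a wheel file in a directory and builds the matching pip command. The helper names `find_wheel` and `pip_install_command`, and the Colab mount path shown in the comments, are assumptions made for this example and are not taken from the notebooks.

```python
import sys
from pathlib import Path


def find_wheel(directory):
    """Return the last .whl file (sorted by name) found in `directory`."""
    wheels = sorted(Path(directory).glob("*.whl"))
    if not wheels:
        raise FileNotFoundError(f"No .whl file found in {directory}")
    return wheels[-1]


def pip_install_command(wheel_path):
    """Build the pip command that installs a local wheel file."""
    return [sys.executable, "-m", "pip", "install", str(wheel_path)]


# In Colab, Google Drive would be mounted first (assumed layout):
#   from google.colab import drive
#   drive.mount("/content/drive")
#   wheel = find_wheel("/content/drive/MyDrive/PyRosetta")
#   subprocess.run(pip_install_command(wheel), check=True)
```

Keeping the lookup separate from the install step makes it easy to confirm that the wheel is visible in Drive before pip runs.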

Bootcamp (June 7 - June 12)

We spent the first week of our internship at the University of North Carolina at Chapel Hill, where we learned PyRosetta and other useful skills that would come in handy during the rest of our internships.

Group Photo at UNC Chapel Hill

During the week, we worked with PyRosetta set up in Google Colab. Our teacher was Andrew Leaver-Fay (amazing guy, knows everything about Rosetta and the field).

We had around 15 sessions, starting with:

  • Pose class basics
  • PyMOL visualizer
  • Rosetta score functions, typically called as scorefxn(pose)

Getting into more advanced things:

  • Movers to mutate residues
  • Basic folding processes with algorithms like Centroid Folding
  • XML scripts
  • and many other things that you can find in the bootcamp directory.

An example of what a protein fold looks like:

an example of a protein fold with PyMol

I really enjoyed the bootcamp because I got to learn about the exciting field of protein design and prediction, and I also got to meet amazing people from all over the US! You can learn more about my experience on my website.

Research (June 13 - August 8)

During the rest of my internship, I worked at Stanford University in Po-Ssu Huang's lab.

My research project is called Building Hydrogen Bonding Networks with Protein Sequence Design Model, in which I take the machine learning model developed at the lab a step further. Their model, called the Protein Sequence Design Model, is part of an algorithm that addresses the inverse protein folding problem: given a fixed protein backbone, it samples long sequences of amino acids from the model's predicted distribution.

Based on the local environment, the model produces a distribution of possible residue types and rotamer angles (step 1). The algorithm then designs the protein through iterative sampling from the predicted distribution defined by the local chemical environment (step 2). The sampled residue type and rotamer angle values are then optimized using simulated annealing (step 3).
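To make the three steps more concrete, here is a toy sketch of sampling from a per-position distribution and refining the result with simulated annealing. It is not the lab's model or code: `toy_distribution` is a hypothetical hand-written rule standing in for the neural network, the energy is simply the negative log-probability under that rule, and rotamer angles are left out entirely.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVW")  # simplified grouping, for illustration only


def toy_distribution(sequence, position):
    """Step 1 (toy): residue-type probabilities given the local environment.

    Hypothetical rule: favor a polar residue after a hydrophobic neighbor,
    and a hydrophobic residue otherwise.
    """
    left = sequence[position - 1] if position > 0 else None
    favor_polar = left in HYDROPHOBIC
    weights = [3.0 if (aa not in HYDROPHOBIC) == favor_polar else 1.0
               for aa in AMINO_ACIDS]
    total = sum(weights)
    return {aa: w / total for aa, w in zip(AMINO_ACIDS, weights)}


def energy(sequence):
    """Negative log-probability of the sequence under the toy model."""
    return -sum(math.log(toy_distribution(sequence, i)[aa])
                for i, aa in enumerate(sequence))


def design(length, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Steps 2 and 3 (toy): iterative sampling with simulated annealing."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = energy(sequence)
    best_sequence, best = list(sequence), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Step 2: sample a new residue for one position from the distribution.
        position = rng.randrange(length)
        dist = toy_distribution(sequence, position)
        proposal = rng.choices(list(dist), weights=list(dist.values()))[0]
        previous = sequence[position]
        sequence[position] = proposal
        candidate = energy(sequence)
        delta = candidate - current
        # Step 3: Metropolis acceptance, stricter as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if current < best:
                best_sequence, best = list(sequence), current
        else:
            sequence[position] = previous  # reject and restore
    return "".join(best_sequence), best


if __name__ == "__main__":
    designed, score = design(length=12)
    print(designed, round(score, 3))
```

In the real algorithm the distribution comes from a trained network conditioned on the backbone, so only the control flow here (sample, score, accept or reject while cooling) should be read as representative.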

Protein Sequence Design Algorithm Process Description

Presentation (August 9 - August 13)
