Summer 2021 RosettaCommons Internship (Stanford Lab)

Preparation

Prior to the bootcamp, we were assigned homework to set up PyRosetta and learn Python.

In homework #1, we installed PyRosetta in our Google Colab environments. This required some internal tweaking and a separate .whl file from the developers. In the follow-up, we checked that everything worked.

We then did homework #2 on Python for-loops, homework #3 on if-else statements, and homework #4 on functions.

Google Colab

To make everything work in Google Colab, I placed a PyRosetta directory containing the .whl file in my Google Drive home directory, MyDrive. I also kept the inputs, Media, and Sessions directories in a directory named temp_pyrbc_202103_notebooks.
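The Drive layout above can be turned into a small setup cell. This is a minimal sketch, assuming the wheel sits directly in MyDrive/PyRosetta and that Drive is mounted at the usual Colab path /content/drive; the helper names `find_wheel` and `install_pyrosetta_from_drive` are hypothetical, and the sketch does not reproduce the internal tweaking mentioned in homework #1.

```python
import subprocess
import sys
from pathlib import Path

# Assumed layout (hypothetical paths): MyDrive/PyRosetta/ holds the PyRosetta .whl file.
DRIVE_MOUNT = Path("/content/drive")
PYROSETTA_DIR = DRIVE_MOUNT / "MyDrive" / "PyRosetta"


def find_wheel(directory: Path) -> Path:
    """Return the alphabetically first .whl file in `directory`."""
    wheels = sorted(directory.glob("*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no .whl file found in {directory}")
    return wheels[0]


def install_pyrosetta_from_drive() -> None:
    """Mount Google Drive, pip-install the PyRosetta wheel, then check the import."""
    from google.colab import drive  # only importable inside a Colab runtime

    drive.mount(str(DRIVE_MOUNT))
    wheel = find_wheel(PYROSETTA_DIR)
    # Install into the same interpreter the notebook is running on.
    subprocess.check_call([sys.executable, "-m", "pip", "install", str(wheel)])

    import pyrosetta  # sanity check: the import and init should now succeed

    pyrosetta.init()
```

In a Colab cell you would call `install_pyrosetta_from_drive()` once per runtime. Keeping the wheel lookup in a separate function lets it be checked outside Colab, where `google.colab` is not available.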

Bootcamp (June 7 - June 12)

We spent the first week of our internship at the University of North Carolina at Chapel Hill, where we learned PyRosetta and other skills that would come in handy during the rest of our internships.

During the week, we worked with PyRosetta set up in Google Colab. Our teacher was Andrew Leaver-Fay (an amazing guy who knows everything about Rosetta and the field).

We had around 15 sessions, starting with:

pose class basics
PyMOL visualizer
Rosetta score functions, e.g. calling scorefxn(pose) to score a pose

Getting into more advanced things:

Movers to mutate residues
Basic folding processes with algorithms like Centroid Folding
XML scripts
and many other things that you can find in the bootcamp directory.
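To make the pose, score function, and mover topics concrete, here is a minimal sketch (not taken from the bootcamp notebooks) that builds a Pose from a sequence, scores it with PyRosetta's default full-atom score function, applies a MutateResidue mover, and scores again. It assumes a working PyRosetta install; `THREE_LETTER`, `parse_mutation`, and `mutate_and_score` are hypothetical helper names, and the "K9A"-style mutation label is an illustrative convention.

```python
from typing import Tuple

# One-letter to three-letter residue names (MutateResidue expects the latter).
THREE_LETTER = {
    "A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
    "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
    "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
    "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR",
}


def parse_mutation(mutation: str) -> Tuple[str, int, str]:
    """Split a hypothetical label like 'K9A' into ('K', 9, 'A')."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    if wild_type not in THREE_LETTER or mutant not in THREE_LETTER:
        raise ValueError(f"unknown amino acid in {mutation!r}")
    return wild_type, position, mutant


def mutate_and_score(sequence: str, mutation: str) -> Tuple[float, float, str]:
    """Score a pose built from `sequence`, apply one point mutation, score again."""
    # Imported lazily so the helpers above can be used without PyRosetta.
    import pyrosetta
    from pyrosetta.rosetta.protocols.simple_moves import MutateResidue

    pyrosetta.init("-mute all")
    pose = pyrosetta.pose_from_sequence(sequence)  # extended full-atom chain
    scorefxn = pyrosetta.get_fa_scorefxn()         # default full-atom score function
    before = scorefxn(pose)

    wild_type, position, mutant = parse_mutation(mutation)
    # Pose residues are numbered from 1.
    if pose.residue(position).name1() != wild_type:
        raise ValueError(f"residue {position} is not {wild_type}")
    MutateResidue(position, THREE_LETTER[mutant]).apply(pose)  # no repacking here
    after = scorefxn(pose)
    return before, after, pose.sequence()
```

For example, `mutate_and_score("ACDEFGHIKL", "K9A")` replaces the lysine at position 9 with alanine and returns the scores before and after. The mover only swaps the residue type; a real protocol would usually repack or minimize nearby side chains afterwards.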

(Figure: an example of what a protein fold looks like.)

I really enjoyed the bootcamp because I got to learn about the exciting field of protein design and prediction, and I also got to meet amazing people from all over the US! You can learn more about my experience on my website.

Research (June 13 - August 8)

During the rest of my internship, I worked in Possu Huang's lab at Stanford University.

My research project is called Building Hydrogen Bonding Networks with Protein Sequence Design Model, and in it I take the machine learning model developed at the lab a step further. The lab's ML model, called the Protein Sequence Design Model, is part of an algorithm for the inverse protein folding problem: given a fixed protein backbone, the algorithm samples amino acid sequences from the model's predicted distribution.

Based on the local chemical environment of each position, the model predicts a distribution of possible residue types and rotamer angles (step 1). The algorithm then designs the protein by iteratively sampling from this predicted distribution (step 2). Finally, the sampled residue types and rotamer angles are optimized using simulated annealing (step 3).
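The three steps above can be illustrated with a toy sketch. It is not the lab's model or code: `predict` is a hypothetical stand-in for the neural network and returns a distribution over residue types only (rotamer angles are ignored), the energy is simply the negative log-likelihood under that distribution, and steps 2 and 3 are folded into one loop in which proposals are drawn from the predicted distribution and accepted with the Metropolis criterion under a geometric cooling schedule.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical stand-in for the model (step 1): given the current sequence and a
# position, return a probability for every residue type at that position.
Predictor = Callable[[List[str], int], Dict[str, float]]


def neg_log_likelihood(seq: List[str], predict: Predictor) -> float:
    """Toy energy: -sum_i log p(seq[i] | environment of i)."""
    return -sum(math.log(predict(seq, i)[aa]) for i, aa in enumerate(seq))


def design(predict: Predictor, length: int, n_steps: int = 2000,
           t_start: float = 1.0, t_end: float = 0.01,
           seed: int = 0) -> Tuple[str, float]:
    """Iteratively resample positions (step 2) under simulated annealing (step 3)."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    energy = neg_log_likelihood(seq, predict)
    best_seq, best_energy = list(seq), energy

    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        i = rng.randrange(length)
        dist = predict(seq, i)  # re-predict using the current neighbors
        proposal = list(seq)
        proposal[i] = rng.choices(list(dist), weights=list(dist.values()))[0]
        new_energy = neg_log_likelihood(proposal, predict)

        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if new_energy <= energy or rng.random() < math.exp((energy - new_energy) / t):
            seq, energy = proposal, new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy

    return "".join(best_seq), best_energy
```

With a toy predictor that puts 0.62 of the probability on leucine at every position, `design(toy_predict, length=10)` converges to a poly-leucine sequence. Because the real model conditions each position on its neighbors, re-predicting after every accepted move is what makes the sampling iterative; a position-independent predictor, as in this example, is only a smoke test.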

Presentation (August 9 - August 13)

About

Languages