markovmodel / adaptivemd

A Python framework to run adaptive Markov state model (MSM) simulations on HPC resources

Tutorial part 1

franknoe opened this issue · comments

These are comments / requests for tutorial part I:
https://github.com/markovmodel/adaptivemd/blob/master/examples/tutorial/1_example_setup_project.ipynb

Overall I really like the API. Here are a few minor issues and comments:

General: The hardest part in learning an API (also this one) is to learn the terminology. To make this easier, it's important to always stick to the same terminology (mostly the case but not always), and to explain the terminology first. I suggest creating an overview slide like the one you sent me for the presentation (also for the docs) that defines all terminology used in the framework, and then stick to this terminology in the code naming conventions, such as task, engine, storage area, etc. You can link to that doc page from the tutorial notebooks.

from adaptivemd import LocalCluster, AllegroCluster
For a public package, it seems strange to have a class specific to the Machine of one group. How about the following: We encode the configuration of the resource in a config-file, and load that, e.g.
`resource = LocalCluster('BUILTIN/allegro.config')`
(Change the naming convention if you find something better.) allegro.config would in this case be part of the package, but you could also specify another file name here. That would also be very useful in case we have to change the configuration because we have a new SLURM version or something else changes on the cluster. This is general, and we don’t have to publicize the existence of the allegro.config file outside our group.
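To make the suggestion concrete, here is a minimal sketch of what such a config-driven resource could look like. The `Resource` class, the `[resource]` section, and all key names are purely illustrative assumptions, not the actual adaptivemd API; the config is inlined as a string only to keep the sketch self-contained.

```python
# Hypothetical sketch: encode cluster settings in a config file instead of a
# machine-specific class. All names here are illustrative, not adaptivemd's API.
from configparser import ConfigParser

# Stand-in for the contents of a packaged 'allegro.config' file.
ALLEGRO_CFG = """
[resource]
scheduler = slurm
queue = big
cores_per_node = 24
"""

class Resource:
    """Generic compute resource configured from an INI-style file."""
    def __init__(self, config_text):
        parser = ConfigParser()
        parser.read_string(config_text)
        self.scheduler = parser.get('resource', 'scheduler')
        self.queue = parser.get('resource', 'queue')
        self.cores_per_node = parser.getint('resource', 'cores_per_node')

resource = Resource(ALLEGRO_CFG)
print(resource.scheduler, resource.queue, resource.cores_per_node)
```

A new SLURM version would then only require editing the config file, not the package code.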

Cell 6: Does LocalCluster mean you are just running on the local machine where this notebook runs? What about LocalHost? For me, a local cluster is something like a group cluster.

resource.wrapper I get the idea, but it isn’t intuitive to me to call this a wrapper. Maybe you can explain the idea a bit more or find a different name. Maybe task_wrapper? Is the idea of this object to wrap the task?

Cell 16:

  • pdb_file = File('file://../files/alanine/alanine.pdb').named('initial_pdb').load()
    should this be ‘name’ or ‘named’? (the text before says ‘name’)
  • unit - since the term unit doesn’t seem to be used elsewhere in the framework, how about calling this “worker”?
  • staging - the sentence after that has a few grammar errors.
  • Also I’m not familiar with the term staging. If I understand correctly this is the long-time storage of the project, i.e. where you put all of your trajectory data. What about storage?

Cell 17: This is probably too short notice to change it, but for packages like OpenMM that possess an API it is strange (and introduces unnecessary sources of error) to go through a bash execution - you seem to have an API -> script -> API transition.
In my mind, the Engine should define what and how to run, and a ScriptedEngine could be a special case where the Engine API doesn’t directly map to a call, but to a script generation+execution. Maybe this is how things work already, my comment is just based on what I see in the tutorial.
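To illustrate the split being suggested here, a toy sketch of the class hierarchy could look like the following. `Engine`, `APIEngine`, `ScriptedEngine`, and `task()` are illustrative names for this comment, not adaptivemd's real classes.

```python
# Hypothetical sketch of the Engine / ScriptedEngine split suggested above.
# All class and method names are illustrative, not the actual adaptivemd API.
class Engine:
    """Defines *what* to run; subclasses define *how*."""
    def task(self, trajectory):
        raise NotImplementedError

class APIEngine(Engine):
    """Maps the run directly onto the MD package's Python API (e.g. OpenMM)."""
    def task(self, trajectory):
        return f"api-call({trajectory})"

class ScriptedEngine(Engine):
    """Special case: generates a run script and executes it."""
    def task(self, trajectory):
        return f"bash run.sh {trajectory}"

print(APIEngine().task('traj-00000001'))
print(ScriptedEngine().task('traj-00000001'))
```

The point is that script generation becomes one possible backend rather than the defining behavior of every engine.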

Cell 19: again name vs named?

Cell 20: not sure why you refer to stores. I guess this is related to storage, but the user probably doesn’t need to know about that.

Section header 3: small typo in 'initial'

Cell 21: The reference to the engine confused me at first, but actually you just take the pdb name from there. I’d change this to pdb_file as everywhere else.

Cells 21/22/23: It’s not really clear why the trajectory object is first added to the project, then you create a run-task with this trajectory in the engine, then you queue the task. Seems somehow redundant. What happens if we do not execute cell 21, and why are cells 22/23 not sufficient to establish all required information?
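To restate the three-step flow being questioned, here is a minimal stub sketch of what cells 21-23 appear to do. `Project`, `Engine`, `add`, `run`, and `queue` are toy stand-ins modeled on the tutorial, not adaptivemd's real implementation.

```python
# Toy stand-ins for the cell 21/22/23 flow; not adaptivemd's real classes.
class Project:
    def __init__(self):
        self.objects = []
        self.tasks = []
    def add(self, obj):          # cell 21: register the trajectory target
        self.objects.append(obj)
        return obj
    def queue(self, task):       # cell 23: submit the task for execution
        self.tasks.append(task)

class Engine:
    def run(self, trajectory):   # cell 22: wrap the trajectory in a run task
        return ('run', trajectory)

project = Project()
engine = Engine()
trajectory = project.add('trajectory-00000001')
project.queue(engine.run(trajectory))
print(project.tasks)
```

Written this way, the question stands out: the trajectory is named both in the `add` step and inside the run task, so it is unclear what information the explicit `add` contributes that the task itself does not already carry.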