harveywi / Ayla-Visual-Analytics

Ayla Visual Analytics

http://www.cse.ohio-state.edu/~harveywi/ayla

Introduction

Ayla is a free, open source visualization tool for researchers in biochemistry, molecular dynamics, and protein folding. It is cross-platform, running on Windows, Mac OS X, and Linux systems.

Ayla features a rich visual analytics environment in which many researchers can work together. A storyboard metaphor is used to organize interesting events in the data.

Binary Files

The Ayla website hosts the latest binaries (currently version 0.1) as well as a screencast demonstrating the functionality of the software.

Source Code

See the Ayla GitHub repository for source code and additional documentation.

Prerequisites

Install Java 6+

The software is written in Scala, and Oracle Java version 6 or later must be installed to run the binaries. Processing large datasets (on the order of hundreds of thousands of protein conformations) can be memory-intensive, so we recommend running the software on a machine with at least 8 GB of RAM. (As the software matures, it will become faster and leaner.)

Install Java 3D

The software uses Java 3D to render topological landscapes. Before you can use the software you will need to download and install Java 3D 1.5.1.

Firewall Configuration

Ayla uses port 9010 to communicate. Thus, you must configure your firewall to allow communication over this port.

Creating an Ayla Dataset

Before getting started with Ayla, you will need a collection of protein conformations in the form of Protein Data Bank files. The following steps will show you how to transform the collection of files into a format that you can work with using Ayla.

Directory Structure

To get started, create a new directory where you will keep all of your Ayla datasets (call it datasets_root_dir). Within datasets_root_dir, create a directory for your first Ayla dataset; let’s call it my_first_dataset. Within my_first_dataset, create two additional directories that Ayla requires, named collab_projects and scalar_functions. You should now have a directory structure that looks like this:


datasets_root_dir
  my_first_dataset
    collab_projects*
    scalar_functions*
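
The layout above can be created from a shell in one step. This is a minimal sketch, assuming a POSIX shell; AYLA_ROOT is a hypothetical variable name (it defaults to a throwaway temporary directory here) that you would point at the place where datasets_root_dir should live:

```shell
#!/bin/sh
# Sketch: create the skeleton of an Ayla dataset.
# AYLA_ROOT is a hypothetical variable; it defaults to a temporary directory
# so the sketch can be tried without touching real data.
AYLA_ROOT="${AYLA_ROOT:-$(mktemp -d)}"
DATASET="$AYLA_ROOT/datasets_root_dir/my_first_dataset"

# collab_projects and scalar_functions must be named exactly like this.
mkdir -p "$DATASET/collab_projects" "$DATASET/scalar_functions"

echo "Created dataset skeleton in $DATASET"
```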

Items marked with asterisks must be named exactly as above. Now, you need to put some additional data into the my_first_dataset directory to tell Ayla about your PDB files. Here are your choices:

PDB Files in a Zip Archive

If your PDB conformations are stored in a single zip archive, create a link/shortcut called conformations.zip inside the my_first_dataset directory that points to the archive. You must also create a text file in my_first_dataset called conformation_zip_paths.txt in which each line contains the path of one PDB file relative to the root of the zip archive. Thus, your directory structure should look like this:


datasets_root_dir
  my_first_dataset
    collab_projects*
    scalar_functions*
    conformations.zip*
    conformation_zip_paths.txt*

where values marked with an asterisk must be named exactly as shown.
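
The two files for the zip case can be produced along these lines. This is a sketch under stated assumptions rather than part of Ayla: it assumes a Unix-like shell with Info-ZIP unzip installed, it assumes that the conformations carry a .pdb extension, and link_zip_dataset is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: register a zip archive of PDB files with an Ayla dataset.
# link_zip_dataset is a hypothetical helper, not part of Ayla.
# Usage: link_zip_dataset /absolute/path/to/archive.zip /path/to/my_first_dataset
link_zip_dataset() {
    archive=$1
    dataset=$2

    # Link to the archive instead of copying it; the link name is fixed.
    ln -sf "$archive" "$dataset/conformations.zip"

    # unzip -Z1 prints one archive member per line (zipinfo mode).
    # Keep only entries ending in .pdb (assumed extension), so directory
    # entries and unrelated files are skipped.
    unzip -Z1 "$archive" | grep -i '\.pdb$' > "$dataset/conformation_zip_paths.txt"
}
```

Pass the archive as an absolute path; a relative path would leave a symbolic link that only resolves from one particular working directory.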

PDB Files in One or More Directories

If your PDB conformations are stored in one or more directories, then all you need to do is create a text file in my_first_dataset called conformation_filenames.txt in which each line contains the absolute path to a PDB file. Thus, your directory structure should look like this:


datasets_root_dir
  my_first_dataset
    collab_projects*
    scalar_functions*
    conformation_filenames.txt*

where values marked with an asterisk must be named exactly as shown.
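
One way to generate conformation_filenames.txt is with find. The sketch below assumes a POSIX shell and assumes that the conformations carry a .pdb extension; write_conformation_filenames is a hypothetical helper name, not an Ayla command.

```shell
#!/bin/sh
# Sketch: list every PDB file under one or more directories, one absolute
# path per line. write_conformation_filenames is a hypothetical helper.
# Usage: write_conformation_filenames /path/to/my_first_dataset pdb_dir [pdb_dir ...]
write_conformation_filenames() {
    dataset=$1
    shift
    for dir in "$@"; do
        # Resolve each directory to an absolute path so find emits absolute paths.
        abs=$(cd "$dir" && pwd)
        find "$abs" -type f -name '*.pdb'
    done | sort > "$dataset/conformation_filenames.txt"
}
```

The output is sorted so that the line order is reproducible. Any file that is matched to this list line by line has to follow the same order.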

Scalar Function Files

In order to use Ayla, you must have one or more scalar functions defined over the conformations (that is, each scalar function assigns a real number to each conformation). Some simple examples of scalar functions include potential energy, contact density, compactness, and radius of gyration. Concretely, this means that if you have 20,000 PDB files, then you must have a text file containing 20,000 lines where each line is a floating-point number. The correspondence between numbers and PDB files is established by the conformation_filenames.txt or conformation_zip_paths.txt file: the number on a given line belongs to the conformation listed on the same line of that file.

Assuming that you have stored the function values in a text file called my_function.txt, copy it into the scalar_functions directory. If you have additional scalar function text files, you can copy those into the scalar_functions directory as well.
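
Because each value is matched to a conformation by its position in the file, it is worth checking a scalar function file before copying it. The following sketch, assuming a POSIX shell with grep -E, compares line counts and confirms that every line is a number; check_scalar_function is a hypothetical helper name, not an Ayla command.

```shell
#!/bin/sh
# Sketch: check that a scalar function file matches the conformation list.
# check_scalar_function is a hypothetical helper, not part of Ayla.
# Usage: check_scalar_function conformation_list.txt my_function.txt
check_scalar_function() {
    list=$1
    values=$2

    # wc -l counts newline characters, so both files should end with a newline.
    n_conf=$(( $(wc -l < "$list") ))
    n_vals=$(( $(wc -l < "$values") ))
    if [ "$n_conf" -ne "$n_vals" ]; then
        echo "line count mismatch: $n_conf conformations, $n_vals values" >&2
        return 1
    fi

    # Every line must be a single floating-point number such as -12.5 or 3e-4.
    # grep -v prints (with line numbers) the lines that do NOT match.
    if grep -Evn '^[[:space:]]*[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?[[:space:]]*$' "$values" >&2; then
        echo "the lines printed above are not valid numbers" >&2
        return 1
    fi
}
```

Here conformation_list.txt stands for whichever of conformation_filenames.txt or conformation_zip_paths.txt your dataset uses.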

Dataset Preprocessing

The pointCloudMaker Utility

The first preprocessing step is to use the pointCloudMaker utility to convert your PDB files into a cloud of points in a high-dimensional Euclidean space. The idea is that two conformations that are structurally similar will map to points that are close together, and dissimilar conformations will map to points that are far apart. Usage of the pointCloudMaker utility is as follows:

pointCloudMaker.sh my_first_dataset

This program will generate a file in my_first_dataset called pcd_v3.dat which contains the point representations of your conformations.
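
Before running pointCloudMaker, it can help to confirm that the dataset directory is laid out as described under Creating an Ayla Dataset. The sketch below assumes a POSIX shell; check_dataset_layout is a hypothetical helper name, and the checks only mirror the file names given in this document.

```shell
#!/bin/sh
# Sketch: verify the dataset layout described earlier before preprocessing.
# check_dataset_layout is a hypothetical helper, not part of Ayla.
check_dataset_layout() {
    ds=$1
    status=0

    for sub in collab_projects scalar_functions; do
        if [ ! -d "$ds/$sub" ]; then
            echo "missing directory: $ds/$sub" >&2
            status=1
        fi
    done

    # Either the zip pair or the plain filename list must be present.
    # -e follows links, so a broken conformations.zip link is reported too.
    if [ -e "$ds/conformations.zip" ] && [ -f "$ds/conformation_zip_paths.txt" ]; then
        :
    elif [ -f "$ds/conformation_filenames.txt" ]; then
        :
    else
        echo "no conformation list found in $ds" >&2
        status=1
    fi

    return "$status"
}

# Intended use (commented out because it needs the Ayla binaries):
#   check_dataset_layout my_first_dataset && pointCloudMaker.sh my_first_dataset
#   test -s my_first_dataset/pcd_v3.dat && echo "point cloud written"
```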

The domainApproximator Utility

The second step in preprocessing is to (optionally) filter out conformations which are uninteresting (or equivalently, select a subset of conformations which are particularly interesting) and approximate their putative conformational manifold structure by assembling a proximity graph. The domainApproximator utility performs these tasks. Usage is as follows:


Usage:  domainApproximator [-w whitelist | -b blacklist] k dataset_dir outputFunctionName
The -w option will keep the conformations in the given whitelist text file and ignore all others.
The -b option will discard the conformations in the given blacklist text file and keep all others.
k is the number of nearest neighbors that are used when reconstructing the domain.
dataset_dir is the directory of the Ayla dataset.

The -w and -b options allow you to specify a whitelist (or blacklist) text file in which each line is the absolute path (or, for zip datasets, the path within the zip archive) of a whitelisted (or blacklisted) conformation.

The k parameter should be set to something small. Usually a value somewhere between 10 and 20 is sufficient.
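
A whitelist can be derived from a scalar function file, for example by keeping only the conformations whose value falls below a threshold. This is a sketch assuming a POSIX shell with paste and awk, and assuming that paths contain no tab characters; make_whitelist, the threshold, and the output function name in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Sketch: build a whitelist of conformations whose scalar function value is
# below a threshold. make_whitelist is a hypothetical helper, not part of Ayla.
# Usage: make_whitelist conformation_list.txt my_function.txt threshold > whitelist.txt
make_whitelist() {
    list=$1
    values=$2
    threshold=$3

    # paste joins line i of both files with a tab; awk prints the path
    # (first field) when the value (last field) is below the threshold.
    paste "$list" "$values" |
        awk -F '\t' -v t="$threshold" '($NF + 0) < (t + 0) { print $1 }'
}

# Intended use (commented out because it needs the Ayla binaries; the
# output function name "energy_low" is a made-up example):
#   make_whitelist my_first_dataset/conformation_filenames.txt \
#       my_first_dataset/scalar_functions/my_function.txt -100 > whitelist.txt
#   domainApproximator -w whitelist.txt 15 my_first_dataset energy_low
```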

Running the Ayla Server

Assuming that you have generated one or more datasets using the above instructions, you should be ready to (1) launch the Ayla server, then (2) launch one or more Ayla clients.

To run the Ayla server, use the aylaServer command. Usage is as follows:

aylaServer datasets_root_dir

Running the Ayla Client

To start the Ayla client, use the aylaClient.sh command.

Basic usage of the Ayla client is covered in the screencast available on the Ayla Visual Analytics website.

Troubleshooting

Connection Issues

If you have started the Ayla server but are unable to connect to it using the Ayla client, make sure that your firewall allows communication over port 9010.
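
A quick way to see whether the port is reachable from the client machine is to attempt a connection by hand. The sketch below assumes that Ayla communicates over TCP (the protocol is an assumption; this document only gives the port number) and that bash and the coreutils timeout command are available; port_open is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: test whether a TCP connection to the Ayla server port can be opened.
# Assumes TCP on port 9010 (the protocol is an assumption) and that bash and
# the coreutils `timeout` command are installed.
# port_open is a hypothetical helper, not part of Ayla.
port_open() {
    local host=$1
    local port=${2:-9010}
    # /dev/tcp/HOST/PORT is a bash feature: opening it attempts a TCP connect.
    # The inner bash receives host and port as $1 and $2.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (replace the host name with your server's):
#   port_open my-server.example.org && echo "port 9010 reachable"
```

A failure here while the server is running points to a firewall or routing problem rather than to the Ayla client itself.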

Java Issues

If you are trying to run the software using OpenJDK and are having problems, try switching to Oracle Java SE 6+. I recall having some weird Scala serialization problems on OpenJDK, but switching to Oracle’s VM seemed to do the trick.

Other Issues

For other issues please contact William Harvey (harveywi at cse dot ohio-state dot edu).
