Dirkle / csc570

CSC 570 Bioinformatics Fall 2011

Projects From Cal Poly Fall 2010 CSC 570 Bioinformatics

   * AGR/ - LaTeX Lecture notes on Advanced Genome Rearrangement
   * pyroprinting/ - A Python program for simulating the Pyroprinting process (see more below)

NOTES
   * Some contents in this repository may have different release licenses. Please carefully
      read each of the READMEs in the directories. Each README should also be reprinted here.


----------------------------------------------------------------------------------------------
Pyroprinting Sequence Generator README

Part 1.1

Last Updated:
12/9/2011

Authors:
Allen Dunlea, 
Ryan Hnarkis, 
Tyler Yero

Running Code:
The code takes the following arguments to run

[directory in] [directory out] [primer sequence] [length to snip (~97-104 bases)]
[generation mode: (K)eyword,(H)istogram, (U)niform, (E)xhaustive] [number of samples to generate]  [pattern (Ex. 4,1,1,1)]

All 7 arguments are required; if any is missing, the program prints an error message and does not run.
Beyond that there is currently little error checking on the arguments, so check them carefully before running the program
or you may get unexpected results.

The following is a description of each argument:
[directory in] : The directory containing the files of opterons to sample from.

[directory out] : The directory where the output will be stored (Files will be created like exhaustive.out, keyword.out, etc...)

[primer sequence] : Used to parse the files in the input directory. Each file is searched for the primer sequence and the
   characters that follow it are returned.

[length to snip] : The number of characters returned after the primer sequence is found (we usually used around 100).

[generation mode] : The mode to generate with. See the source files or the report for a description of each mode.

[number of samples to generate] : The number of samples that will be generated using random sampling. If (E)xhaustive has been
   selected as the generation mode, this number is ignored.

[pattern] : The pattern used to generate samples. It gives the number of copies of each opteron to put in a sample.
   For instance, with the pattern 3,4, 3 copies of one opteron are put in the sample followed by 4 copies of a different opteron.
   Note that the numbers must add up to 7 and the pattern must not contain spaces.
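The primer search and the pattern rules described above can be sketched in Python as follows. This is a minimal
illustration, not the project's own code: the function names snip_after_primer and parse_pattern are hypothetical,
and two behaviours are assumptions because this README does not specify them. The sketch returns None when the
primer is missing or too few characters follow it, and it rejects counts that are not positive.

```python
def snip_after_primer(sequence, primer, length):
    """Return the `length` characters that follow the first occurrence of `primer`.

    Returns None when the primer is absent or fewer than `length` characters
    follow it (an assumed policy; the README does not describe this case).
    """
    start = sequence.find(primer)
    if start == -1:
        return None
    begin = start + len(primer)
    snippet = sequence[begin:begin + length]
    return snippet if len(snippet) == length else None


def parse_pattern(text, total=7):
    """Parse a pattern such as "3,3,1" into a list of counts that sum to `total`."""
    if " " in text:
        raise ValueError("pattern must not contain spaces")
    counts = [int(part) for part in text.split(",")]
    if any(count <= 0 for count in counts):  # assumption: every count is positive
        raise ValueError("pattern counts must be positive")
    if sum(counts) != total:
        raise ValueError("pattern counts must add up to %d" % total)
    return counts
```

For example, parse_pattern("3,4") returns [3, 4], while parse_pattern("3,3") raises a ValueError because the
counts add up to 6 rather than 7.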
                                                                                    
Examples:
SampleOriginalFiles output GGAACCTGCGGTTGGATCAC 104 K 6 3,3,1
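To make the argument order concrete, the sketch below splits the example line into its 7 fields and applies the
checks this README describes. It is written in Python purely for illustration and is not the generator's own
argument handling; the names GeneratorArgs, VALID_MODES and parse_generator_args are hypothetical, and accepting
lower-case mode letters is an assumption.

```python
from collections import namedtuple

# Field names are illustrative; they follow the argument order given in this README.
GeneratorArgs = namedtuple(
    "GeneratorArgs",
    "dir_in dir_out primer snip_length mode num_samples pattern",
)

VALID_MODES = {"K": "keyword", "H": "histogram", "U": "uniform", "E": "exhaustive"}


def parse_generator_args(argv):
    """Validate the 7 positional arguments and convert the numeric ones."""
    if len(argv) != 7:
        raise ValueError("expected 7 arguments, got %d" % len(argv))
    dir_in, dir_out, primer, snip_length, mode, num_samples, pattern = argv
    mode = mode.upper()  # assumption: lower-case mode letters are tolerated
    if mode not in VALID_MODES:
        raise ValueError("unknown generation mode: %r" % mode)
    return GeneratorArgs(
        dir_in, dir_out, primer, int(snip_length), mode, int(num_samples),
        [int(n) for n in pattern.split(",")],
    )


example = "SampleOriginalFiles output GGAACCTGCGGTTGGATCAC 104 K 6 3,3,1".split()
args = parse_generator_args(example)
print(args.mode, args.snip_length, args.pattern)  # prints: K 104 [3, 3, 1]
```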




All code is released under the MIT license (unless otherwise specified):
Copyright (c) 2011 SuperGene Predictor

Permission is hereby granted, free of charge, to any person obtaining a copy 
of this software and associated documentation files (the "Software"), to deal 
in the Software without restriction, including without limitation the rights 
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 
copies of the Software, and to permit persons to whom the Software is 
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in 
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 
SOFTWARE.


----------------------------------------------------------------------------------------------
Pyroprinting Simulator README

Contributors:
   - Matt Carson
   - Dirk Cummings
   - Connor Lange

Project Programming Language(s):
   - Python

Project Structure:
The project is contained within the source file insilico.py, which handles everything
   needed to build a simulated pyroprint. Aside from PyMySQL, there are no external
   dependencies; all data structures used are standard Python types.

Source File List:
   - insilico.py
      * The main script. Parses the opterons, builds the table, and constructs the pyroprint.
         The resulting pyroprint is written to the file "pyroprint.out" in the current
         working directory. Format for the output is a CSV file.
      * A serialized copy of the unit averages table from previous runs is saved to the
         'pyroprintTrends.pkl' file using Python cPickle. If the file is present in the current
         directory, it is loaded before construction of the table starts, and its data is
         combined with the new results.

   - optedredger.py
      * A script to parse multiple .seq files for proper opteron sequences and output the
         sequences in the given ratios. Used for building .opts files for testing with known
         data. 
      * Currently works only on 23-5 sequences.
         
   - *.opts
      * Various Combinations of Opterons taken from
         ./Genome Sequences/plasemid ratio tests/12-15-10 plasmid ratios-ModGATC-2controls.xls
         and
         ./Genome Sequences/rDNA plasmid sequences/23-5/
         used for testing the output of insilico.py. Each file can be given to
         insilico.py as input to generate a simulated ("in silico") pyroprint.
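
The load, combine and save cycle described for 'pyroprintTrends.pkl' can be sketched as below. This is not the
code in insilico.py. The layout of the unit averages table is not documented here, so the sketch assumes a dict
mapping a key to a (mean, count) pair and combines tables as weighted running averages. The function names are
hypothetical, and pickle is used as the Python 3 counterpart of the cPickle module named above.

```python
import os
import pickle  # the README names cPickle (Python 2.7); pickle is the Python 3 equivalent

TRENDS_FILE = "pyroprintTrends.pkl"


def load_trends(path=TRENDS_FILE):
    """Return the saved table, or an empty one when no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as handle:
        return pickle.load(handle)


def merge_trends(saved, fresh):
    """Combine two {key: (mean, count)} tables into weighted running averages."""
    merged = dict(saved)
    for key, (mean, count) in fresh.items():
        if count == 0:
            continue  # nothing to add for this key
        old_mean, old_count = merged.get(key, (0.0, 0))
        total = old_count + count
        merged[key] = ((old_mean * old_count + mean * count) / total, total)
    return merged


def save_trends(table, path=TRENDS_FILE):
    """Write the table back to disk so that the next run can load it."""
    with open(path, "wb") as handle:
        pickle.dump(table, handle)
```

A run would then call load_trends(), merge the result with the table it has just built, and pass the merged
table to save_trends().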

Executables:
   - insilico.py <opteron.opt>
      * The current implementation takes exactly one parameter at run time, which
         must be a file of opterons in the format:

            # Comment
            <opteron string>
            # Comment
            <opteron string>
            ...

      * Developed for Python Version 2.7
      * Requires PyMySQL Version 0.5. Available at
         https://github.com/petehunt/PyMySQL
         
   - optedredger.py -r <X:Y> -s <sequence 1 name>:<sequence 2 name> [-d <sequence file directory>] [-o <outfile>]
      * Where
            -r : The ratio of sequence 1 to sequence 2 in the final result
            -s : A "ratio" of sequence names to be used.
                  Example: Dg03-5:Hu01-3
                  NOTE: '23-5 ' and '.seq' are automatically added
            -d : An optional parameter to change the default location to search 
                  for the sequence files.
                  The default location is: ./Genome Sequences/rDNA plasmid sequences/23-5
            -o : Optional parameter to change the default output file.
                  The default output file name is "<seq 1 name>-<seq 2 name>.opts"
                  
      * The output format is the same as the format of the .opts files
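
The file naming rules for optedredger.py and the .opts layout shown above can be sketched as follows. The helper
names sequence_path, default_outfile and read_opts are hypothetical and are not taken from the scripts. Skipping
blank lines while reading is an assumption, since the format description only shows comment and opteron lines.

```python
import os

# Default search location quoted from the -d option description.
DEFAULT_SEQ_DIR = "./Genome Sequences/rDNA plasmid sequences/23-5"


def sequence_path(name, directory=DEFAULT_SEQ_DIR):
    """Build the .seq path for a short name such as "Dg03-5".

    The '23-5 ' prefix and the '.seq' suffix are added automatically.
    """
    return os.path.join(directory, "23-5 " + name + ".seq")


def default_outfile(name1, name2):
    """Return the default output name, "<seq 1 name>-<seq 2 name>.opts"."""
    return "%s-%s.opts" % (name1, name2)


def read_opts(lines):
    """Yield opteron strings, skipping '#' comment lines and blank lines."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line
```

With -s Dg03-5:Hu01-3 and no -o option, default_outfile("Dg03-5", "Hu01-3") gives "Dg03-5-Hu01-3.opts". An open
file object can be passed directly to read_opts, since it only needs an iterable of lines.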

Known Bugs / Issues:
   - The "insilico" pyroprint created by this program is not in the XML format
      used / provided by the pyro printer.

Release / Licensing:
Copyright (c) 2011, Matt Carson, Dirk Cummings, Connor Lange
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * The names of the contributors may not be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the
authors and should not be interpreted as representing official policies, either expressed
or implied, of the copyright holders.
