Projects From Cal Poly Fall 2010 CSC 570 Bioinformatics * AGR/ - LaTeX Lecture notes on Advanced Genome Rearrangement * pyroprinting/ - A Python program for simulating the Pyroprinting process (see more below) NOTES * Some contents in this repository may have different release licenses. Please carefully read each of the READMEs in the directories. Each README should also be reprinted here. ---------------------------------------------------------------------------------------------- Pyroprinting Sequence Generator README Part 1.1 Last Updated: 12/9/2011 Authors: Allen Dunlea, Ryan Hnarkis, Tyler Yero Running Code: The code takes the following arguments to run [directory in] [directory out] [primer sequence] [length to snip (~97-104 bases)] [generation mode: (K)eyword,(H)istogram, (U)niform, (E)xhaustive] [number of samples to generate] [pattern (Ex. 4,1,1,1)] all of them must be present in order for the code to run and will print an error message if all 7 arguments are not present. There is not much error checking on the arguments currently so make sure you know what you are doing when you run the program or you might have unexpected results. The following is a description of each argument: [directory in] : The directory containing the files that contain opterons to sample with. [directory out] : The directory where the output will be stored (Files will be created like exhaustive.out, keyword.out, etc...) [primer sequence] : Used to parse the files in the in directory. The primer sequence will be searched for and the following characters will be returned. [length to snip] : This is the number of characters that are returned after the primer sequence is found (we usually used around 100) [generation mode] : The mode to generate with. See files or the report for a further description of the different types. [number of samples to generate] : The number of samples that will be generated using random sampling. If (E)xhaustive has been selected as the genertion mode this number will be ignored. [pattern] : This is the pattern that will be used to generate samples. The pattern denotes what ratio each opteron should be put in. For instance, with a pattern 3,4 3 opterons of one type will be put in the sample followed by 4 copies of a different opteron. Note that these numbers need to add up to 7 and no spaces can be used. Examples: SampleOriginalFiles output GGAACCTGCGGTTGGATCAC 104 K 6 3,3,1 All code is released under the MIT license (unless otherwise specified): Copyright (c) 2011 SuperGene Predictor Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ---------------------------------------------------------------------------------------------- Pyroprinting Simulator README Contributors: - Matt Carson - Dirk Cummings - Connor Lange Project Programming Language(s): - Python Project Structure: The project is contained within the source file insilico.py. It is responsible for everything related to completing the task. Aside from PyMySQL, there are no other dependencies. All data structures used are already provided for through standard Python data structures and types. Source File List: - insilico.py * The main script. Parses the opterons, builds the table, and constructs the pyroprint. The resulting pyroprint is written to the file "pyroprint.out" in the current working directory. Format for the output is a CSV file. * A serialized copy of the last builds of the unit averages table is persisted in the 'pyroprintTrends.pkl' file using Python cPickle. If the file is located in the current directory, it is loaded before construction of the table starts, and its data combined with future results. - optedredger.py * A script to parse multiple .seq files for proper opteron sequences and output the sequences in the given ratios. Used for building .opts files for testing with known data. * Currently works only on 23-5 sequences. - *.opts * Various Combinations of Opterons taken from ./Genome Sequences/plasemid ratio tests/12-15-10 plasmid ratios-ModGATC-2controls.xls and ./Genome Sequences/rDNA plasmid sequences/23-5/ used for testing the output of insilico.py. Each file is an input into the program to be run "insilico" and generate a simulated pyroprint. Executables: - insilico.py <opteron.opt> * The current implementation takes only one parameter at run time and must be a file of opterons in the format: # Comment <opteron string> # Comment <opteron string> ... * Developed for Python Version 2.7 * Requires PyMySQL Version 0.5. Available at https://github.com/petehunt/PyMySQL - optedredger.py -r <X:Y> -s <sequence 1 name>:<sequence 2 name> [-d <sequence file directory>] [-o <outfile>] * Where -r : The ratio of sequence 1 to sequence 2 in the final result -s : A \"ratio\" of sequence names to be used. Example: Dg03-5:Hu01-3 NOTE: '23-5 ' and '.seq' are automatically added -d : An optional parameter to change the default location to search for the sequence files. The default location is: ./Genome Sequences/rDNA plasmid sequences/23-5 -o : Optional parameter to change the default output file. The default output file name is "<seq 1 name>-<seq 2 name>.opts" * The output format is the same format for the .opts files Known Bugs / Issues: - The "insilico" pyroprint created by this program is not in the XML format used / provided by the pyro printer. Release / Licensing: Copyright (c) 2011, Matt Carson, Dirk Cummings, Connor Lange All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * The names of the contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the copyright holders.