sunnyEV/python

Custom Made Modules

A_hash_file.py - script that hashes the second column using the first column as key
B_hash_mRNA_IDs.py - returns a hash of unique mRNA IDs
C_loadFasta.py - script to load fasta sequences
D_longest_fasta_sequence_header.py - script that returns headers of the longest sequence
E_get_chr_size_gff3.py - script takes a gff3 file and returns max position for each chromosome
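
The fasta helpers listed above could look roughly like the following. This is a minimal sketch, not the code in C_loadFasta.py or D_longest_fasta_sequence_header.py; the function names `load_fasta` and `longest_sequence_header` are hypothetical, and the input is assumed to be plain multi-line fasta.

```python
def load_fasta(lines):
    """Parse fasta-formatted lines into a {header: sequence} dict.

    Assumption: headers start with '>' and are unique; a repeated
    header would overwrite the earlier entry.
    """
    parts = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            parts[header].append(line)
    # Join wrapped sequence lines into one string per header.
    return {h: "".join(chunks) for h, chunks in parts.items()}


def longest_sequence_header(sequences):
    """Return the header whose sequence is longest."""
    return max(sequences, key=lambda h: len(sequences[h]))


example = """>seq1
ACGT
>seq2
ACGTAC
GT
""".splitlines()

seqs = load_fasta(example)
print(longest_sequence_header(seqs))
```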

smallRNA Clustering Scripts

4a.py - calculating clusters based on genotype (input IGV file)
4a1.r - for plotting results from previous step
4b.py - find clusters regulations and pattern based on size
4c.py - run the inter-/intragenic analysis on the clusters
4d.py - calculating regulated sequences in the cluster

FASTA Handlers

5.py - make fasta files
5b.py - take out mapping positions from igv file using fasta file containing sequences
6.py - a script for taking out cDNAs from transcripts file i.e. MG20 file
7.py - script for any fasta file that looks for a pattern and returns a count and the probability of the pattern occurring at random

miRNA Mapping

8a.py - script for replacing 'U' with 'T'
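
As an illustration of the U-to-T conversion, here is a minimal sketch. It assumes fasta input and leaves header lines untouched so that a 'U' in a miRNA name is not altered; the name `rna_to_dna` is hypothetical and is not taken from 8a.py.

```python
def rna_to_dna(fasta_lines):
    """Yield fasta lines with U/u replaced by T/t in sequence lines only."""
    for line in fasta_lines:
        if line.startswith(">"):
            # Header line: keep as-is (names may legitimately contain 'U').
            yield line
        else:
            yield line.replace("U", "T").replace("u", "t")


converted = list(rna_to_dna([">miR-U1", "UGAGGUAGUAGG"]))
print("\n".join(converted))
```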

Genome Gap Filling Simulation

9.sh - script for gap filling project
9a.py - for taking out rep element
9b.py - for replacing genome by N
9c.py - for taking out sequence where only one read is mapped
9d.py - take out the rep elements and put these in the genomic region

Gap Filling Real Data

10a.py - take out N-region from the ref_genome
10b.py - remove any additional N-region which might be present in 10Kb flanking region
10c.py -
10q.py - calculating insert_size/distances
10r.py - check if you have all elements
10s.py - make reverse complement multi-fasta
10t.py - make summary output for elements
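
The reverse-complement step (10s.py) can be sketched as follows. This is an assumed implementation using `str.translate`; the names `COMPLEMENT` and `reverse_complement` are hypothetical, and only A, C, G, T and N (either case) are complemented, with any other character passed through unchanged.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_complement_fasta(sequences):
    """Apply reverse_complement to every entry of a {header: sequence} dict."""
    return {header: reverse_complement(seq) for header, seq in sequences.items()}


print(reverse_complement("ATGCN"))
```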

ShortRan Scripts

12.sh - /plant/2011_week37/
12a.py - /plant/2011_week40/20111004/
12b.py - for taking out sequences with particular pattern
12c.py - make fasta file from profile
12d.py - replace U by T in miRNA database
12e.py - for counting mismatches
12f.py - for miRNA mapping from counting
12g.py - cluster predictions
12h.py - genome region analysis cluster by cluster or position by position
12i.py - for counting regulated sequences in cluster
12j.py - for counting unique size sequences and plotting the distribution from profile files
12k.py - for counting unique size sequences and plotting the distribution from cluster files
12l.py - for making profile from fastq files
12m.py - remove reads which were mapped on repeats
12n.py - for normalizing profiles
12o.py - generate a text file with 0's of size total_no_of_libraries * total_no_of_libraries
12p.py - for plotting expression data between two sets
12r.py - for making sql batch script
12s.py - add annotation to the sequences
12t.py - script for making MySQL add-column batch file
12u.py - script that makes a fasta file with abundance in the header, in the format Sequence-xAbundance
12v.py - script for making igv files for clustering and visualization
12w.py - script for filtering reads based on score
12x.py - script for making chromosome length file for miRNA predictions
12y.py - script for making file compatible for tasiRNA predictions
12z.py - script for taking out clusters of ta-siRNAs
12aa.py - script for combining sequences for profile with the mapped genomic sequences which contain genomic annotation
12ab.py - script for making header for the table with library wise abundances
12ac.py - script for making table the library wise abundances
12ad.py - script for plotting abundances
12ae.py - python script for replacing libraries header
12af.py - miRNA to MySQL database
12ag.py - tasi-RNA to MySQL database
12ah.py - add length column to the MySQL database
12ai.py - make a file with unique abundances
12aj.py - script for splitting the fastq files by size
12ak.py - make artificial adapters
12al.py - parse the mirdeep 2 output to the mysql supported output
12am.py - script to find the miRNA sequences in profiles - /Users/vgupta/Desktop/script/python
12an.py - script to add an identifier based non-redundancy - /Users/vgupta/Desktop/script/python
mysql_batch - for saving file into the mysql database

gapfillRE

13_20110929_gapfillRE.sh - shell script processing other python scripts data
13_20110929_positive_control.sh - for running the positive control; slightly different, as input comes from blast
13a.py - for filtering of reads where both ends map to rep elements
13b.py - for making reference compatible, i.e. adding headers, removing small letters
13c.py - for taking out all gap positions
13d.py - take out genomic sequence with flanking regions
13e.py - remove additional N regions around targeted gap
13f.py - take out hanging reads mapping on the flanking region
13g.py - filter out pairs mapped to the flanking region
13h.py - filter out directional reads, i.e. for 5'&3', 5', 3'
13i.py - take out the top four candidates suitable for replacing the gap and make a score table
13j.py - reporting for gap regions which have no appropriate rep element for gap
13k.py - pick out best possible element from scores
13l.py - print final list of elements with score
13m.py - count correctly inserted elements(only for positive controls)
13n.py - correct sequence names in fasta file (remove everything after spaces), which cause problems when mapping
13o.py - for taking out a particular fasta sequence
13p.py - script to remove pair mapped
13q.py - take out all the contigs already placed in the pseudomolecule
13r.py - add length of the contigs
13s.py - add distances from 5 prime and 3 prime ends

Bacterial Genome Project With Niels

14.py - for finding a gene in many genomes
14b.py - for finding a gene in many genomes using blast for unannotated genomes
14c.py - script for taking a list of genes and concatenating these by species.

Genome-wide Signatures

15.py - script to process the genome wide signature
15a.py - script to add length and relevant columns

Svend's Data

16.py

SpearmanRank

spr.py - opens a file and calculates the Spearman coefficient between all columns
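
For reference, a Spearman coefficient between columns can be computed by ranking each column (averaging the ranks of ties) and taking the Pearson correlation of the ranks. The sketch below is a pure-Python illustration under that assumption, not the code in spr.py; all function names are hypothetical, and a constant column would raise a ZeroDivisionError.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # Assumption: neither column is constant.


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_all_columns(columns):
    """Coefficient for every pair of columns in a {name: values} dict."""
    names = list(columns)
    return {(a, b): spearman(columns[a], columns[b]) for a in names for b in names}


table = {"lib1": [1, 2, 3, 4], "lib2": [10, 20, 30, 40], "lib3": [4, 3, 2, 1]}
result = spearman_all_columns(table)
```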

Counting Corrected Reads

correct_read.py - count reads that have been corrected by ECHO

Making Patterns '/_' For Regulations

make_patterns.py - making patterns '/_' for regulations

Using R From Python

18_plot_sv.py - for plotting results obtained from BreakDancer

Yasu's Data

19_filter_markers.py - for filtering positions with the markers and storing these
19_merge_marker.py - script for merging different files based on some columns
19_remove_marker_positions.py - script for removing the existing markers and keeping only new SNPs

28 Accession Data

Fastq Script Kit

20_compare_fq_mapping.py - script for comparing read-1 and read-2 mapped files to same reference
20_divide_on_adaptors.py - script for dividing fastq file based on different adapters (demultiplexing)
20_trim_reads.py - script for trimming the fastq reads and quality scores
20_compare_fq.py - script for counting common reads in two fastq files
20d_count_mapped_fastq_inSam.py - script for counting the reads mapped
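
Trimming fastq reads means cutting the sequence line and the quality line to the same window so they stay in register. The following is a minimal sketch of that idea, assuming strict 4-line fastq records; `trim_fastq` and its `start`/`end` parameters are hypothetical and not the interface of 20_trim_reads.py.

```python
def trim_fastq(lines, start, end):
    """Yield (header, sequence, plus, quality) with both strings cut to [start:end].

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Slice sequence and quality identically to keep them aligned.
        yield header, seq[start:end], plus, qual[start:end]


records = list(trim_fastq(["@r1", "ACGTACGT", "+", "IIIIHHHH"], 2, 6))
for record in records:
    print("\n".join(record))
```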

Function For Making Filtered Fastq File

17_20120212_filter_fastq.py - /Users/vikas0633/Desktop/plant/2012_week7

Genomic Toolkit

21a_remove_chacters.py - this script removes any character other than ATGCN
21b_better_header.py - this script keeps only the 4th field separated by '|'
21c_add_1_start.py - this script can add +1 to start position in a fasta file
21d_take_out_gene.py - this script takes out a sequence from fasta file given correct header name
21d_take_out_gene_list_headers.py - this script takes out a sequence from fasta file given correct header names in a file
21e_gff2gtf.py - this script converts gff, gff3 format to gtf format
gtf_to_gff.pl - this script converts gtf to gff3 format
81_parse.pl - script for calculating N50 value
21f_merge_two_files.py - script for merging two files based on given columns
21g_para_gtf.py - script for calculating exon/intron/transcripts lengths
gff_convert.pl - script for inter-converting different gff formats
intersection of gene models - bedtools intersect
21h_calculate_seq_len.py - take a fasta file and print sequences in decreasing length
21h_plot_seq_len.py - take a fasta file and plot sequence length
21i_RMoutput2GTF.py - take a tab-formatted RMoutput file and parse it to make a gtf file
21j_orf2fasta.py - script takes fasta file and output from orffinder and take out sequences with the orfs
21k_make_input4_glimmerHMM.py - this script takes a gene structure file (gff/gtf) and makes an exon file parsable by glimmerHMM
gff_to_genbank.py - Convert a GFF and associated FASTA file into GenBank format
21l_pileup2GTF.py - script converts a pileup to a gtf file based on the coverage
21m_gff2genestru.py - script creates input for gb format conversion script
21n_overlap_gff.py - takes two or more gff files and merges them where there is an overlap
21n_intersect_gff.py - takes two or more gff files and merges them where there is an intersection
21o_extract_seq_model.py - script takes out sequences/GTF models from given co-ordinate
21p_filter_fasta.py - script to filter fasta file based on the length of the sequences
21q_combine_GTF.py - This is the script for combining various annotations files
21r_make_CDS.py - script to create CDS file from fasta (containing exon sequences generated by bedtools) and GTF/GFF3 file
21s_summary_eval.py - script for summarizing eval output
21t_tau.py - script to add ORF to the gff file
21u_make_gff2.py - script makes gff2 file for the TAU input, same as Stig's 26_parse.pl
21v_format_gff3.py - script to format gff3 file in order to put in MySQL table
21v2_format_gff3.py - script to format gff3 file in order to put in MySQL table, sequel to 21v
21w1_format_fasta.py - script to format a fasta file that has duplicate entries
21w1_format_orthoMCL.py - format OrthoMCL output
21x_exon_repeat.py - find the exon-repeat overlap
21y_strand_fasta.py - script takes a GFF3 file and corrects the fasta file if on the minus strand
21z_foramt_IPR.py - script takes raw output from IPRScan and makes non-redundant gene_ID\annotation
21aa_countMShit_in_GFF.py - script to count the unique MS-supported genes
21ab_split_gff.py - script to split sorted GFF file based on contig/sequence/chromosome name
21ac_addType.py - script to add gene type
21ad_makebed.py - script to make bed format file from the given column names
21ae_correct_UTR.py - script to correct the UTR co-ordinates
21af_format_protein_list_headers.py - script to get the corresponding headers between corrected and real fasta file
21ag_cal_CSD_gene_overlap.py - script to calculate the CDS vs gene overlap
21ah_find_longest_isoform.py - script was made for finding longest isoform in the spider protein set
21ah_count_N_between_genes.py - script to count Ns between the genes
21ai_modify_gene_names.py - script to modify gene names based on N counts
21aj_add_mRNA.py - script to add dummy mRNAs if absent
21ak_remove_redundant.py - script to remove the redundant node gene models
21al_correct_strand.py - this script takes strand from CDS and assigns the same to mRNA, exons and UTRs, GFF3 files
21am_update_GFF3_fasta.py - this script updates GFF3 and fasta given a different file
21an_hash_MySQLid.py - this script makes a 2 column table one with Id and another with yes/no
21ak_update_GFF3_IDsOnly.py - this script takes two-column IDs and replaces these in the GFF3 file
21ao_keep_fasta_ifGFF3.py - script to throw out excessive sequences in fasta file
21ap_TranscriptSummary.py - Summarizes GFF3 transcript-wise
21aq_addGeneStrand.py - Adds the strand to the gene based on the mRNAs strands
21ar_findLongestIsoform_GFF3.py - Find the longest isoform for each gene in a gff3 file
21as_calc5primeCdnaDistance.py - calculate 5' distances of insertions
21at_FindLongestProtein.py - Finds Longest Protein
21au_trim3primeCDS.py - trims the 3 prime ends of CDS
21aw_CallFractionexon.py - calculate the callable fraction on the genome
21ax_LongestProteinCodingIsoform.py - Find the longest protein coding isoform
21ay_countFixDifference.py - Script for counting the Fix Differences in Population genetics
21az_addNRanno.py - Script for adding blast annotations from NR database
21b_better_header.py - Script to fix the fasta headers
21ba_getGeneBasedAlign.py - Script to calculate the gene alignment length from the MAF output
21bb_getGeneBasedAlignLength.py - Script to calculate the gene alignment length from the MAF output
21bc_GenotypicDistance.py - Script to calculate the genotypic distance from the VCF format file
21bd_summerizeArrayData.py - Script to summarize the array data
21be_bMakeHeatMap.py - Script to make the heatmap
21bf_ortho2fasta.py - Script to transfer the ortholog groups to fasta files
21bg_find_fragmented_genemodels.py - Script to find fragmented genemodels
21bi_search_blast.py - Script to blast a list of genes against a database and back
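
Several toolkit entries above work on sequence lengths, and 81_parse.pl reports an N50 value. As a reminder of what that statistic is, the sketch below computes N50 in Python as the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. It is an illustration only and not a port of the Perl script; the name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths."""
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that takes us to half the total.
        if running * 2 >= total:
            return length


contig_lengths = [2, 3, 4, 5, 6]
print(n50(contig_lengths))
```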

Transcripts Handlers

22a.py - script for parsing tophat/cufflink generated GTF files against a target (-G cufflink) annotation file
22b.py - normalize transcript profile table
22c.py - script for making plots from profile tables generated using 22b
22d.py - add profile tables to MySQL
22e.py - add annotations to profiles using fasta files
22f.py - add annotations to profiles using two column formatted file
22g.py - script to get pattern frequency from a profile table given a regulation, abundance and score cut-off
22h.py - script for finding complementary pattern between small RNAs and transcripts

MYSQL

23a_mysql_header.py - script for making headers for mysql tables

Blast

24a_filter_blast.py - script for filtering blast results

Python Plots

25a_plot_gene_freq.py - script for plotting gene frequencies across each chromosome

MirDeepP Summary

26_summary_mirDeepP.py - script for taking all the outputs from mirDeepP and putting it together

Spider Project

27_summary_MS_hit.py - script for processing the MS hit text file
27_foramt_fasta_spider.py - script to format the fasta headers according to Thomas's explanations
27_TranscriptsOnScaffold.py - Script to extract all the transcripts on given scaffolds

UNC RNA-seq project

28a_obo_parser.py - script to parse the obo file from geneontology.org
28b_MSU_RAP_ids.py - MSU id parser
28c_gff3_validator.py - Script to validate a gff3 file

29. snpEff data analysis

29a_MakeGeneWideTable.py - script to put the snpEff data together
29b_MakeGeneWideTableUnique.py - script to summarize snpEff data

30. Degradome data analysis

30a_count_5prime_stacks.py - script for counting 5' degradome mappings from BAM file

GABox specific

31a_reformat_gff3.py - script to replace the ref column of gff3 by priority
31b_combine_GTF.py - This is the script for combining various annotations files
31c_TAU.py - script to add ORF to the gff3 file
31d_modify_gene_names.py - script to modify gene names based on N counts
31e_ReplaceWithLongerCodingRegion.py - Script to find the longest protein coding evidence with overlapping exons
31e_2_ReplaceWithLongerCodingRegion.py - Script to find the longest protein coding evidence with overlapping exons
31f_get_CuffBasedGenemodels - script to extract cufflinks based genemodels
31g_MakeGeneModelTable.py - same as 21v2_format_gff3.py
31h_add_FeatureType.py - script to modify GFF3 second column
31i_FixBoundries.py - script to modify GFF3 feature boundaries

General Scripts

100_intersect_columns.py - script to find non-overlapping entries between the two columns
21ab_split_gff.py - script to split sorted GFF file based on contig/sequence/chromosome name
101_filter_fastq_len.py - script to filter a fastq file based on read length
102_flat2fasta_anno.py - script to make fasta file from the MySQL output
103_sort_gff_blocks.py - script to sort GFF3 file blocks
104_intersect_files_column.py - script to print the desired columns given keys from the files
105_match_IDs_from_2gff3_files.py - script will take two gff3 files and print out the corresponding mRNA IDs
106_filter_out_against_genelist.py - this script will filter out the genes which are in the list
107_ParameterGFF3.py - Script to calculate Gene, mRNA, exon, CDS count, Total length and average length
108_filterExactOverlapGFF3.py - script to filter overlapping start/end genemodels
109_AddPhaseGFF3.py - script to add Phase
110_getGene.py - script take a list of genes and extracts the genemodels from the GFF3 file
111_blastoutput_parser.py - script to parse blast output and return a table
112_iprscanout_parser.py - script to parse IPRScan output and return a table
113_validate_GFF3.py - GFF3 validation script
114_validate_Fasta.py - Fasta validation script
115_MapFastq.py - Script to Map Fastq files
116_runGATK.py - script to run GATK analysis
117_addReadGroup.py - script to add readgroup in sam or bam file
118_gaps2bed.py - script takes a fasta file and creates a bed file with gap co-ordinates
129_splitIPR.py - script to split IPR file
130_shuffle_header.py - script to shuffle header of a given file
131_replace_values.py - script to replace all the values in a row
132_translateDNA.py - script to convert DNA to protein
133_snp_genomic_annotation.py - script to calculate the snps genomic distribution
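
To illustrate the gap-to-bed conversion listed above (118_gaps2bed.py), here is a minimal sketch. It assumes that a gap is any run of N or n, and it reports BED-style co-ordinates (0-based start, end exclusive); the name `gaps_to_bed` is hypothetical and the real script may define gaps differently.

```python
import re


def gaps_to_bed(sequences):
    """Yield (name, start, end) for every run of N/n in a {name: sequence} dict."""
    for name, seq in sequences.items():
        for match in re.finditer(r"[Nn]+", seq):
            # match.start()/match.end() are already 0-based, end-exclusive.
            yield name, match.start(), match.end()


gaps = list(gaps_to_bed({"chr1": "ACGNNNTTnnA"}))
for name, start, end in gaps:
    print(f"{name}\t{start}\t{end}")
```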
