Shobhit20 / Promoterfinder_prokaryotes

Promoterfinder_prokaryotes

The promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA. In eukaryotes, promoter length can vary from 100 to 1000 base pairs. Here we want the promoter regions of Campylobacter jejuni, which is a bacterium, so we gather what is known about prokaryotic promoters and use it to predict the promoter regions of this organism. The following are a few characteristics of promoters in prokaryotes:

  1. Promoters in prokaryotic organisms contain two short DNA sequence elements located at the -10 (10 bp 5', or upstream) and -35 positions relative to the transcription start site (TSS).
  2. The element at the -10 position, the Pribnow box (TATAAT), is the prokaryotic equivalent of the eukaryotic TATA box and is essential for transcription initiation. [1]
  3. The element at the -35 position, simply called the -35 element, typically consists of the sequence TTGACA and controls the rate of transcription.

Apart from that, Campylobacter jejuni promoters contain three conserved regions, located approximately 10, 16, and 35 bp upstream of the transcriptional start point, as reported in [2]. The -10 region resembles that of a typical E. coli promoter, but the -35 region is completely different.

Based on these characteristics, we assume that a promoter is at most 60 bases long. We therefore take the 60 bases upstream of each gene and classify that region as the promoter, keeping in mind that the above characteristics should be satisfied.

The code for this question is uploaded as the file “promoter_reg.py”. It is written in Python 2 and can be run from a terminal with the command “python promoter_reg.py”. The output generated by the file is in the following format:

motA-flagellar motor protein MotA
ATGGATCTTTCAACCATATTAGGA………………………...TAA
5' flanking region
AAAACAAGTTCAAGTATCGCCAAAAATTGGGGCGATTTAAAATAATCAAGGAGATAATTA
3' flanking region
AAATGGCTAA

The first line of each record is the gene name and description, and the second line is the complete sequence of the gene. The 5' flanking region is the region upstream of the gene, and the code also reports the 10 bases downstream of the gene as the 3' flanking region.
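
The steps described above (taking the 60 bases upstream of a gene, checking them for a Pribnow-box-like element, and printing a record) can be sketched roughly as follows. This is only an illustrative sketch, not the contents of promoter_reg.py: the function names, the 0-based coordinate convention, and the mismatch-count scoring are all assumptions made for the example. It handles forward-strand genes only; a reverse-strand gene would need the reverse complement of the region on the other side of the gene. Only the -10 element is checked, since the -35 region of Campylobacter jejuni differs from the E. coli consensus.

```python
# Illustrative sketch only: these helper names are hypothetical and are not
# taken from promoter_reg.py, whose source is not shown in this README.

PRIBNOW_BOX = "TATAAT"   # -10 consensus from the list above
UPSTREAM_LEN = 60        # assumed maximum promoter length
DOWNSTREAM_LEN = 10      # bases reported as the 3' flanking region


def flanking_regions(genome, start, end,
                     upstream=UPSTREAM_LEN, downstream=DOWNSTREAM_LEN):
    """Return (5' flank, 3' flank) for a forward-strand gene.

    Assumes the gene occupies genome[start:end] (0-based, end exclusive).
    Flanks are truncated if the gene lies near either end of the sequence.
    """
    five_prime = genome[max(0, start - upstream):start]
    three_prime = genome[end:end + downstream]
    return five_prime, three_prime


def best_motif_match(region, motif=PRIBNOW_BOX):
    """Slide the motif along the region and return (offset, mismatches)
    for the first window with the fewest mismatches, or None if the
    region is shorter than the motif."""
    best = None
    for offset in range(len(region) - len(motif) + 1):
        window = region[offset:offset + len(motif)]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def format_record(name, gene_seq, five_prime, three_prime):
    """Lay out one gene in the six-line format shown above."""
    return "\n".join([
        name,
        gene_seq,
        "5' flanking region",
        five_prime,
        "3' flanking region",
        three_prime,
    ])


if __name__ == "__main__":
    # Toy sequence, not real Campylobacter jejuni data.
    toy_genome = "C" * 64 + "TATAAT" + "ATGAAATAA" + "G" * 20
    gene_start, gene_end = 70, 79
    five, three = flanking_regions(toy_genome, gene_start, gene_end)
    print(format_record("toyA-hypothetical gene",
                        toy_genome[gene_start:gene_end], five, three))
    print(best_motif_match(five))
```

Counting mismatches against the consensus is just one simple way to test whether "the above characteristics are satisfied"; a position weight matrix built from known promoters would be a natural refinement.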

[1] https://www.addgene.org/mol-bio-reference/promoter-background/
[2] Wösten, Marc MSM, et al. "Identification of Campylobacter jejuni Promoter Sequences." Journal of Bacteriology 180.3 (1998): 594-599.
