[under development]
- Identify differentially expressed genes (e.g. by RNA-seq).
- Acquire the genomic sequence of the host organism and the sequences of its promoters.
- Specify parameters in config.txt. An example is given in the "Input files" section.
- Determine the most frequent kmers in the promoters of the differentially expressed genes.
- Determine the baseline frequency of those kmers in the entire genome.
- Cluster the kmers that occur significantly more frequently in the selected promoters than in the genome as a whole.
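
The counting and baseline steps above can be illustrated with a minimal sketch. It is not the script's actual implementation: the function names `count_kmers` and `kmer_enrichment` are hypothetical, sequences are assumed to be plain strings already read from the FASTA files, and the significance test and clustering step are left out. It only compares relative kmer frequencies.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping kmer of length k across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_enrichment(promoters, genome, k, top_n):
    """Compare the top_n most frequent promoter kmers with the genome baseline.

    Returns a list of (kmer, promoter_frequency, genome_frequency, ratio).
    """
    promoter_counts = count_kmers(promoters, k)
    genome_counts = count_kmers(genome, k)
    promoter_total = sum(promoter_counts.values())
    genome_total = sum(genome_counts.values())

    results = []
    for kmer, n in promoter_counts.most_common(top_n):
        promoter_freq = n / promoter_total
        genome_freq = genome_counts[kmer] / genome_total if genome_total else 0.0
        # A kmer absent from the genome gets an infinite ratio.
        ratio = promoter_freq / genome_freq if genome_freq else float("inf")
        results.append((kmer, promoter_freq, genome_freq, ratio))
    return results


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    for row in kmer_enrichment(["ACGTACGT"], ["ACGTTTTTTT"], k=4, top_n=1):
        print(row)
```

A ratio well above 1 marks a kmer as a candidate for over-representation; whether it is significant would still need a statistical test.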
The script offers two modes, selected with the -m option:
python script.py -m PrepareGenome -i genome.fasta
python script.py -m AnalysePromoters -i counttable.txt -t counttable
path_to_genome = ... // Full path to the genome of the relevant host organism
path_to_promoters = ... // Full path to the file with the genes' promoter sequences
list_of_differentially_expressed_genes = ... // File containing a list of gene identifiers
kmer_length = ... // Length of the kmers to search for
top_n_kmers = ... // Number of kmers to analyse
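
For readers who want to see how a file in this `key = value // comment` layout could be read, here is a small sketch. The parser `parse_config` is hypothetical and not taken from the script, and the values in the example text are made-up placeholders. It assumes that `//` never occurs inside a value.

```python
def parse_config(text):
    """Parse 'key = value // comment' lines into a dict of strings."""
    settings = {}
    for raw_line in text.splitlines():
        # Drop the trailing '//' comment, then surrounding whitespace.
        line = raw_line.split("//", 1)[0].strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Placeholder values, for illustration only.
example = """\
path_to_genome = /data/genome.fasta // Full path to the genome
kmer_length = 6 // Length of the kmers to search for
top_n_kmers = 20 // Number of kmers to analyse
"""

settings = parse_config(example)
kmer_length = int(settings["kmer_length"])
top_n_kmers = int(settings["top_n_kmers"])
```

All values come back as strings, so numeric parameters such as `kmer_length` and `top_n_kmers` need an explicit conversion.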