`create_motif` creates an incorrect motif for amino acid sequences

What happens

Given a set of sequences (as an AAStringSet), create_motif returns an obviously wrong PWM and consensus. The problem appears to be inconsistent ordering of the amino acid row labels.

What I suspect might be causing the problem

create_motif creates a matrix whose row labels come from Biostrings' AA_STANDARD, a vector of single-letter amino acid codes in the following order:
"A" "R" "N" "D" "C" "Q" "E" "G" "H" "I" "L" "K" "M" "F" "P" "S" "T" "W" "Y" "V"

Later manipulations of this matrix, however, seem to expect the rows to be in alphabetical order.
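To make the suspected mechanism concrete, here is a minimal base-R sketch. It does not call universalmotif or Biostrings: the AA_STANDARD order is hard-coded from the list above, the three toy sequences are hypothetical, and the "buggy" step is an assumed stand-in for any downstream code that indexes rows positionally against an alphabetical alphabet. It is not the actual universalmotif code or its fix.

```r
# Row order of Biostrings' AA_STANDARD, hard-coded so this sketch runs
# without Biostrings installed
aa_standard <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                 "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Hypothetical toy alignment: three sequences of length 2
seqs <- c("RD", "RD", "KD")

# Per-position letter counts; rows follow AA_STANDARD order
counts <- vapply(seq_len(nchar(seqs[1])), function(pos) {
  as.integer(table(factor(substr(seqs, pos, pos), levels = aa_standard)))
}, integer(length(aa_standard)))
rownames(counts) <- aa_standard

# Alphabetical alphabet that downstream code is assumed to expect
alpha <- sort(aa_standard)

# Buggy: translate row indices with the alphabetical alphabet even though
# the rows are actually in AA_STANDARD order
buggy_consensus <- paste(alpha[apply(counts, 2, which.max)], collapse = "")

# Fix 1: use the matrix's own row names instead of a separate alphabet
named_consensus <- paste(rownames(counts)[apply(counts, 2, which.max)],
                         collapse = "")

# Fix 2: reorder rows alphabetically once, so positional code is safe again
counts_sorted <- counts[alpha, , drop = FALSE]
sorted_consensus <- paste(alpha[apply(counts_sorted, 2, which.max)],
                          collapse = "")

print(c(buggy = buggy_consensus, named = named_consensus,
        sorted = sorted_consensus))
```

With these toy sequences the buggy path reports "CE" instead of "RD": row 2 (R in AA_STANDARD order) is read as C, and row 4 (D) is read as E. Sorting the rows once when the matrix is built is one way to satisfy functions that assume alphabetical order; looking letters up by row name avoids the assumption altogether.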

How to reproduce



Compare the output of create_motif with that of consensusMatrix:

create_motif(sequences, alphabet="AA", type="PWM")

Note that I'm using universalmotif v1.4.0, but I don't think this issue has been addressed since.


Thank you for the very comprehensive report! I believe you are quite right as to what was going wrong. Many functions do indeed expect everything to be sorted alphabetically.

I applied a quick fix that at least makes your example work now. It should reach the Bioconductor release version either tomorrow or the day after, should you still be interested.