Would FASTA comparison fit into the seqmagick framework?
cswarth opened this issue
I periodically need to compare two FASTA files, which always turns out to be much harder than it should be. Neither the order of sequences in a FASTA file nor the formatted line length matters, so that rules out using 'cmp' and 'diff' directly.
If only the identifiers matter, one can do something like this:
cmp <(grep '>' file1.fa | sort) <(grep '>' file2.fa | sort)
I don't know of an easy way to also compare the sequences without getting a ton of spurious differences. People often suggest perl or python hackery but I haven't found a tool that does what I want out-of-the-box.
I'm mostly interested in finding out whether there are differences and which sequences they occur in. Seeing the actual differences is less important for my application.
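As a rough sketch of the kind of comparison described above (this is not an existing seqmagick feature, and the names `read_fasta` and `compare_fasta` are hypothetical), something like the following standard-library Python would ignore record order and line wrapping. It assumes the identifier is the first whitespace-delimited token after '>', that sequence comparison is case-sensitive, and that identifiers are unique within a file (a repeated identifier silently overwrites the earlier record).

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into {identifier: sequence}.

    Wrapped sequence lines are joined, so line length does not matter.
    Assumption: the identifier is the first token after '>'.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def compare_fasta(lines_a, lines_b):
    """Return (ids only in A, ids only in B, shared ids whose sequences differ)."""
    a = read_fasta(lines_a)
    b = read_fasta(lines_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed
```

Both functions accept any iterable of lines, so open file handles can be passed directly, e.g. `compare_fasta(open("file1.fa"), open("file2.fa"))`. Three empty lists mean the two files contain the same records.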
seqmagick seems like it might be a good place to put such a comparison tool. What do others think?
I'm happy to implement it if this makes sense.