Jumbled Characters in Dataset
t03i opened this issue
t03i commented
In the training file train 74k.fasta, the sequence 9pcyA00 contains null (0x00) bytes.
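Python's str type can hold NUL characters, so reading a file that contains one does not raise an error on its own. A check that reads the raw bytes makes such records visible. The sketch below assumes that the problem is NUL (0x00) bytes and that a record ID is the first whitespace-delimited token after ">" in the header. The function names find_nul_bytes and scan_file are hypothetical and are not part of the dataset's tooling.

```python
def find_nul_bytes(data: bytes) -> dict[str, list[int]]:
    """Map each FASTA record ID to the absolute byte offsets of any 0x00 bytes in it."""
    hits: dict[str, list[int]] = {}
    record_id = ""  # empty string for any bytes before the first header
    offset = 0      # absolute offset of the current line's first byte
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            # Assumption: the record ID is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            record_id = tokens[0].decode("utf-8", errors="replace") if tokens else ""
        pos = line.find(b"\x00")
        while pos != -1:
            hits.setdefault(record_id, []).append(offset + pos)
            pos = line.find(b"\x00", pos + 1)
        offset += len(line)
    return hits


def scan_file(path: str) -> dict[str, list[int]]:
    # Read in binary mode so that no text decoding step can hide or alter the bytes.
    with open(path, "rb") as fh:
        return find_nul_bytes(fh.read())
```

For example, scan_file("train 74k.fasta") would be expected to list 9pcyA00 with the offsets of its NUL bytes, under the assumptions above.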
Michael Heinzinger commented
Thanks for reporting. I did not encounter this error when reading the file with Python; however, I also saw the single malformed character in the reported sequence when opening the file in the browser. Therefore, I decided to remove this sequence from the training data to avoid further complications.
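The comment does not say how the sequence was removed. One way to drop a single record from a FASTA file is sketched below. It assumes, as before, that the record ID is the first whitespace-delimited token of the header. The names drop_records and clean_file are hypothetical, and this is not necessarily the procedure that was actually used.

```python
def drop_records(data: bytes, drop_ids: set[str]) -> bytes:
    """Return the FASTA bytes with every record whose ID is in drop_ids removed."""
    kept: list[bytes] = []
    keep = True  # lines before the first header are kept unchanged
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            # A new header decides whether this record's lines are kept.
            tokens = line[1:].split()
            record_id = tokens[0].decode("utf-8", errors="replace") if tokens else ""
            keep = record_id not in drop_ids
        if keep:
            kept.append(line)
    return b"".join(kept)


def clean_file(src: str, dst: str, drop_ids: set[str]) -> None:
    # Work on raw bytes so that the kept records are written back byte for byte.
    with open(src, "rb") as fh:
        cleaned = drop_records(fh.read(), drop_ids)
    with open(dst, "wb") as fh:
        fh.write(cleaned)
```

For example, clean_file("train 74k.fasta", "train_clean.fasta", {"9pcyA00"}) would write a copy without that record. The output filename here is only illustrative.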