Rostlab / EAT

Embedding-based annotation transfer (EAT) uses the Euclidean distance between vector representations (embeddings) of proteins to transfer annotations from a set of labeled lookup protein embeddings to a query protein embedding.
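
As a rough illustration of the idea described above, the following is a minimal nearest-neighbor sketch in plain Python. It is not the repository's implementation: the function name `transfer_annotations`, the dictionary layout, and the toy 3-dimensional vectors are all assumptions made for this example.

```python
import math

def transfer_annotations(query_embeddings, lookup_embeddings, lookup_labels):
    """For each query, copy the label of the nearest lookup embedding.

    All arguments are dicts keyed by protein ID (a hypothetical layout
    chosen for this sketch).
    """
    results = {}
    for query_id, query_vec in query_embeddings.items():
        # Pick the lookup protein with the smallest Euclidean distance.
        nearest_id = min(
            lookup_embeddings,
            key=lambda lookup_id: math.dist(query_vec, lookup_embeddings[lookup_id]),
        )
        distance = math.dist(query_vec, lookup_embeddings[nearest_id])
        results[query_id] = (lookup_labels[nearest_id], nearest_id, distance)
    return results

# Toy 3-dimensional embeddings; real protein embeddings have many more dimensions.
lookup_embeddings = {"protA": [0.0, 0.0, 0.0], "protB": [1.0, 1.0, 1.0]}
lookup_labels = {"protA": "label_1", "protB": "label_2"}
query_embeddings = {"query1": [0.9, 1.0, 1.1]}

hits = transfer_annotations(query_embeddings, lookup_embeddings, lookup_labels)
print(hits["query1"][0])  # label transferred from the nearest lookup protein
```

Returning the distance alongside the label lets a caller reject transfers whose nearest neighbor is too far away, although any such threshold would have to be chosen separately.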

Jumbled Characters in Dataset

t03i opened this issue

In the train 74k.fasta file, the sequence 9pcyA00 contains null (0) bytes.

Thanks for reporting. I did not encounter this error when reading the file with Python; however, I also saw the single malformed character in the reported sequence when opening the file in the browser. Therefore, I decided to remove this sequence from the training data to avoid further complications.
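
For anyone who wants to check their own copy of the data, here is a minimal sketch that flags FASTA records containing a null byte. Python does not normally raise an error when it reads a null byte, which may be why the problem went unnoticed. The function name `find_null_byte_records` is hypothetical, and the residues in the demo record are made up for illustration, not taken from the dataset.

```python
import io

def find_null_byte_records(lines):
    """Return IDs of FASTA records whose header or sequence contains a 0x00 byte.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    flagged = []
    current_id = None
    for raw_line in lines:
        line = raw_line.rstrip(b"\r\n")
        if line.startswith(b">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].decode("ascii", errors="replace")
            parts = header.split()
            current_id = parts[0] if parts else header
        # Record each affected ID once, in order of first appearance.
        if b"\x00" in line and current_id is not None and current_id not in flagged:
            flagged.append(current_id)
    return flagged

# Made-up two-record example; the sequences are illustrative only.
demo = io.BytesIO(b">1abcA00\nMKTAYIAK\n>9pcyA00\nIDVLLG\x00ADDG\n")
bad_ids = find_null_byte_records(demo)
print(bad_ids)
```

To scan a real file, pass a handle opened in binary mode, for example `with open(path, "rb") as handle: find_null_byte_records(handle)`, where `path` points to your local FASTA file.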