PNNL-CompBio / CLEAN-Contact

PyTorch Implementation of CLEAN-Contact: Contrastive Learning-enabled Enzyme Functional Annotation Prediction with Structural Inference

Home Page: https://www.biorxiv.org/content/10.1101/2024.05.14.594148.abstract

fasta as input

abbyjerger opened this issue

Currently, the input test data must be in CSV format: the pre-inference steps can only create the ResNet50 and ESM2 embeddings from CSV files. Although there is an inference_fasta.py script, it creates only the ESM2 embeddings from the FASTA file, not the ResNet50 embeddings.

It would be nice if the pre-inference step accepted either a CSV or a FASTA file and created both the structure and sequence embeddings from it.

Also, the fasta_to_csv function in utils.py does not currently write the protein sequence to the 'Sequence' column of the output CSV.
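
For illustration, here is a minimal sketch of the behavior I would expect from such a conversion. This is not the code in utils.py: the function names `read_fasta` and `fasta_to_csv_sketch` are hypothetical, and the tab delimiter and the 'Entry' and 'EC number' column names are assumptions about the CSV layout. Only the 'Sequence' column is taken from the description above.

```python
import csv


def read_fasta(path):
    """Yield (entry_id, sequence) pairs from a FASTA file."""
    entry_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if entry_id is not None:
                    yield entry_id, "".join(chunks)
                header = line[1:].split()
                # Use the first whitespace-delimited token as the ID.
                entry_id = header[0] if header else ""
                chunks = []
            else:
                # Sequences may wrap over several lines; collect them all.
                chunks.append(line)
    if entry_id is not None:
        yield entry_id, "".join(chunks)


def fasta_to_csv_sketch(fasta_path, csv_path, delimiter="\t"):
    """Write one row per FASTA record, including the full sequence.

    Column names and delimiter are assumptions, not the repo's
    confirmed format. The EC number is left empty because it is
    unknown at inference time.
    """
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=delimiter)
        writer.writerow(["Entry", "EC number", "Sequence"])
        for entry_id, sequence in read_fasta(fasta_path):
            writer.writerow([entry_id, "", sequence])
```

With something like this, multi-line sequences are joined into a single string before being written, so every row ends up with a populated 'Sequence' value.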