This repository contains the data generated and used in the PNAS paper "Improved global protein homolog detection with major gains in function identification".
- Figure 2:
BenchmarkMax50.ipynb
- SI Figure 1:
AUCHeatmap.ipynb
- SI Figure 2:
Speed.ipynb
- SI Figure 3:
SizePlot.ipynb
- SI Figure 4:
BenchmarkNoMax50.ipynb
- Speed Benchmarking:
speedtest/
- ESM-1b mean of last layer results:
ESM1bL34M/
- PROST-L benchmarking results:
prost-l/
- Benchmarking pairs from G. V. Saripella, E. L. L. Sonnhammer, K. Forslund, Benchmarking the next generation of homology inference tools. Bioinformatics 32, 2636–2641 (2016):
benchmark/Methods_benchmarking_pairs
- Benchmarking results from G. V. Saripella, E. L. L. Sonnhammer, K. Forslund, Benchmarking the next generation of homology inference tools. Bioinformatics 32, 2636–2641 (2016):
benchmark/Bitscores_and_Evalues
- max50 sequences:
benchmark/max50.fa
- noMax50 sequences:
benchmark/nomax50.fa
- Yeast analysis: https://mesihk.github.io/prostyeast
- Unannotated Human Proteins Analysis: https://mesihk.github.io/prosthuman
- Webserver: https://mesihk.github.io/prost
- PROST Python package: https://github.com/MesihK/prost
- PROST Research Data: https://github.com/MesihK/prost-data
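
The max50 and noMax50 sequence files listed above can be loaded for inspection with a short script. The following is a minimal sketch that assumes they are standard FASTA files (suggested by the `.fa` extension) and that the script is run from the repository root. The helper name `read_fasta` is hypothetical and is not part of the PROST package.

```python
from pathlib import Path
from typing import Dict, List, Optional


def read_fasta(path: str) -> Dict[str, str]:
    """Parse a FASTA file into {header: sequence}.

    Headers are stored without the leading '>'. Sequences split over
    several lines are joined. Assumes standard FASTA formatting.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: store the previous one first.
                if header is not None:
                    records[header] = "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    # Store the final record in the file.
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Path as listed in this README; assumed relative to the repository root.
    fasta_path = Path("benchmark/max50.fa")
    if fasta_path.exists():
        sequences = read_fasta(str(fasta_path))
        print(f"{len(sequences)} sequences loaded from {fasta_path}")
```

If two records share the same header, this sketch keeps only the last one; a list of `(header, sequence)` tuples would be a safer container in that case.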