maialab / agvgd

An R Implementation of the Align-GVGD (A-GVGD) Method

Home Page: https://maialab.org/agvgd/

Pre-computed alignments?

sigven opened this issue · comments

Hi,

Thanks for a neat tool! I see you have attached a few existing protein sequence alignments with your package. Are you aware of any resource/database in which you can fetch such data (i.e. pre-computed alignments) for any human gene? Or do you need to compute this for any gene of interest using existing multiple sequence alignment tools?

kind regards,
Sigve

Hi Sigve,

Thanks for reaching out!

Just like you said, we do have a few alignments bundled with {agvgd}. These are the same alignments provided on the original Align-GVGD website.

As you might know, a multiple sequence alignment is something of an art. :) I don't know of any resource that offers such pre-computed alignments for any given human gene. In Ensembl you can download an alignment of orthologues which is computed on-the-fly for you. But it is not something you can do in batch-mode, I reckon.

For example, for TP53, we may go here: https://www.ensembl.org/Homo_sapiens/Gene/Compara_Ortholog?db=core;g=ENSG00000141510;r=17:7661779-7687538, then hit Download Orthologues. Then choose Alignment -- amino acids.

There is also something similar: the pairwise alignments between each orthologous sequence and the human sequence. You can retrieve them using Ensembl's REST API: https://rest.ensembl.org/documentation/info/homology_symbol. In the experimental package {protean} we use this method to create what we call sequence profiles. Sequence profiles are built from those pairwise alignments: we remove the gaps from the human sequence (used as the reference), drop the corresponding columns from each orthologue, and stack the results into a pseudo-alignment of the orthologous sequences. Like I said, the package is experimental, but you are welcome to try it if you find it useful. The function to get those profiles is get_profile().
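To make the sequence-profile idea concrete, here is a minimal Python sketch of the gap-removal step described above. It is not the {protean} implementation: the function names `build_profile` and `fetch_pairwise_alignments` are hypothetical, and the JSON field names used in the fetch helper (`data`, `homologies`, `source`, `target`, `align_seq`, `species`) are my assumption about the shape of the Ensembl response, so check them against the endpoint documentation before relying on them.

```python
import json
import urllib.request


def fetch_pairwise_alignments(symbol, species="human"):
    """Hypothetical helper: query Ensembl's homology/symbol REST endpoint.

    Returns (species, human_aligned, orthologue_aligned) tuples.
    The response field names below are assumptions, not verified here.
    """
    url = (
        f"https://rest.ensembl.org/homology/symbol/{species}/{symbol}"
        "?type=orthologues&sequence=protein&aligned=1"
        "&content-type=application/json"
    )
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    homologies = payload["data"][0]["homologies"]
    return [
        (h["target"]["species"], h["source"]["align_seq"], h["target"]["align_seq"])
        for h in homologies
    ]


def build_profile(pairs):
    """Stack pairwise alignments into a pseudo-alignment.

    For each pair, columns where the human (reference) sequence has a gap
    are dropped, so every row ends up as long as the ungapped human sequence.
    """
    profile = {}
    reference = None
    for species, human_aln, ortho_aln in pairs:
        if len(human_aln) != len(ortho_aln):
            raise ValueError(f"aligned sequences differ in length for {species}")
        # Keep only the columns where the human residue is not a gap.
        kept = [(h, o) for h, o in zip(human_aln, ortho_aln) if h != "-"]
        human_ungapped = "".join(h for h, _ in kept)
        if reference is None:
            reference = human_ungapped
            profile["human"] = reference
        elif human_ungapped != reference:
            raise ValueError(f"human sequence mismatch in pair with {species}")
        profile[species] = "".join(o for _, o in kept)
    return profile


if __name__ == "__main__":
    # Toy pairwise alignments (made-up sequences, for illustration only).
    toy_pairs = [
        ("mouse", "MK-TA", "MRQT-"),
        ("chicken", "M-KTA", "MLK-A"),
    ]
    for name, row in build_profile(toy_pairs).items():
        print(f"{name:8s} {row}")
```

Because each orthologue is aligned to the human sequence independently, insertions relative to human are simply discarded, which is why the result is a pseudo-alignment rather than a true multiple sequence alignment.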

But, in general, you do indeed need to compute your own alignment, and probably curate it manually. The original authors of Align-GVGD did this for a few genes (the ones we include with {agvgd}), but I think it is not easy to perform such a task automatically and genome-wide, at least not without questioning the quality of the alignments.

cheers, RM

Thanks a lot, Ramiro, for your swift response! This is indeed very helpful and useful information. I suspected that making a multiple sequence alignment is not straightforward, given choices such as which species to include. I'll pursue this further and see if I can make something work :-)

cheers,
Sigve