qseqid problem
CWYuan08 opened this issue · comments
Hi @arendsee,
I have managed to run the same analysis as the tutorial for my data, but I am wondering why the qseqid for all ps is ensembl_peptide_ID, except for ps 2, which has different ID names that I don't know how to convert in biomart.
Could you please help me with this? Thank you very much!
Best,
CW
@CWYuan08 I have a rough idea of what might be wrong, but it could take a while to fix. As far as the phylostratigraphic results go, the protein IDs shouldn't matter much (I never needed to deal with them). However, the inconsistency you found in the labels is a bug, and I'll look into it.
I'm under a bit of pressure at the moment, but I'll get to this problem, maybe in a few days. If I don't respond, feel free to post a reply to remind me.
Dear @arendsee,
Have you had any chance to check this bug? I would greatly appreciate your help:)
Thank you
Best
CW
Dear @arendsee,
Sorry to bother you again. Have you had a chance to check this bug? I would greatly appreciate your help:)
Thank you
Best
CW
@CWYuan08 I don't think I can fix this. phylostratr uses the IDs that UniProt provides, and mapping to other ID systems isn't supported. Adding handling for such conversions may be possible, but I don't have the time to implement it. I would be happy to merge a pull request from anyone who finds a solution, though.
Dear @arendsee,
I see. I am not sure how this can be fixed. So is this a problem with the UniProt files for Eukaryota only? I am wondering why only ps2 is affected.
Thank you
CW
The ps2 entries do have useful IDs in them. You just need to extract the pattern after the "sp|"; these are Swiss-Prot IDs, I believe. For example, Q1LU93. BioMart should be able to handle these.
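As a rough sketch of that extraction (not part of phylostratr): it assumes the ps2 qseqid values follow the usual UniProt FASTA convention of `db|ACCESSION|ENTRY_NAME`, with `sp` for Swiss-Prot and `tr` for TrEMBL. The function name `extract_accession`, the entry name `EXAMPLE_NAME`, and the Ensembl-style ID below are made up for illustration.

```python
import re

# Assumed ID layout: db|ACCESSION|ENTRY_NAME, where db is "sp" or "tr".
# The trailing |ENTRY_NAME part is optional in this pattern.
UNIPROT_ID = re.compile(r"^(?:sp|tr)\|([^|\s]+)")


def extract_accession(qseqid: str) -> str:
    """Return the accession from a UniProt-style ID, or the input unchanged."""
    match = UNIPROT_ID.match(qseqid)
    return match.group(1) if match else qseqid


# Hypothetical qseqid values: one UniProt-style, one already a plain peptide ID.
qseqids = ["sp|Q1LU93|EXAMPLE_NAME", "ENSP00000000001"]
accessions = [extract_accession(q) for q in qseqids]
```

IDs that do not start with `sp|` or `tr|` are passed through untouched, so the same function can be applied to the whole qseqid column before handing the accessions to BioMart.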
Thank you! I have found some explanation for the different parts, I will give it a try.
Best
CW
OK, good luck!