human-pangenomics / HPP_Year1_Assemblies

Assemblies from HPP Year 1 production

Missing annotations

mozack opened this issue · comments

Hi,

Thank you for this fantastic resource!

The CAT genes index does not appear to have annotation entries for 3 samples:
HG002
HG005
NA19240

https://github.com/human-pangenomics/HPP_Year1_Assemblies/blob/main/annotation_index/Year1_assemblies_v2_genbank_CAT_genes.index

Are the gene annotations for these 3 samples available elsewhere?

Thanks!

The CAT pipeline depends on the Minigraph-Cactus graph, so it could be applied to only 44 samples (HG002, HG005, and NA19240 were set aside so they could be used for benchmarking). The Ensembl pipeline, by contrast, should include gene annotations for all 47 samples. The Ensembl gene annotations are available here: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/8E6C4ACC-FEA9-4DD8-94A3-B92234206F95--Y1_ENSEMBL_V1/
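If it helps to confirm which samples an index file lacks, here is a minimal Python sketch. It assumes the index is tab-separated with a header row and a column named `sample`; that column name, the helper `missing_samples`, and the toy rows below are illustrative assumptions, not the actual layout of the CAT index, so adjust `sample_column` to match the real header.

```python
import csv
import io


def missing_samples(index_text, expected, sample_column="sample"):
    """Return the expected sample IDs that have no row in the index.

    Assumes a tab-separated index with a header row; `sample_column`
    is a hypothetical column name and may differ in the real file.
    """
    reader = csv.DictReader(io.StringIO(index_text), delimiter="\t")
    present = {
        row[sample_column].strip()
        for row in reader
        if row.get(sample_column)
    }
    return sorted(set(expected) - present)


# Toy index for illustration only (placeholder paths, not real data).
toy_index = (
    "sample\tannotation\n"
    "HG00438\ts3://example-bucket/HG00438.gff3.gz\n"
    "HG00621\ts3://example-bucket/HG00621.gff3.gz\n"
)

print(missing_samples(toy_index, ["HG00438", "HG002", "HG005", "NA19240"]))
# → ['HG002', 'HG005', 'NA19240']
```

To run this against the real index, read the downloaded file's contents into `index_text` and pass the full list of sample IDs you expect.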

@mhaukness-ucsc, could you please check if the above link is the version used in the HPRC marker paper?

@juklucas, in your opinion, should we consider providing an index file for the Ensembl gene annotations as well?

Thanks so much! I see the Ensembl annotations and will try them out.

The above link should be correct for CAT for comparisons to marker paper results; however, Ensembl should be used for new analyses.