Problems matching the "probe_strand" annotation in EPICv2 with the previous EPICv1 manifest
rauldiul opened this issue
Hi,
Thanks for your invaluable software and resources. I'm handling some EPICv2 data and I accessed your gene annotations located here: https://github.com/zhou-lab/InfiniumAnnotationV1/raw/main/Anno/EPICv2/EPICv2.hg38.manifest.gencode.v41.tsv.gz
This matrix has a probe_strand variable. I wanted to compare these strand annotations with those of the probes previously covered by EPICv1. It seems that the labels are "opposite": most of the EPICv1 "+" probes are labelled as "-" in EPICv2, and vice versa. See this example code:
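library(data.table)  # fread
library(stringr)     # str_split_fixed
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
annotationEPICv1 <- as.data.frame(getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19))
annotationEPICv2 <- fread(file.path(dir_methfilters,"epicv2/sesame/EPICv2.hg38.manifest.gencode.v41.tsv.gz"))
# Drop the suffix after the first "_" to get EPICv1-style probe names
annotationEPICv2$ID2 <- str_split_fixed(annotationEPICv2$probeID,"_",2)[,1]
intersecting_probes <- intersect(annotationEPICv1$Name,annotationEPICv2$ID2)
# Count shared probes whose strand label differs between the two annotations
table(annotationEPICv1[intersecting_probes,]$strand != annotationEPICv2$probe_strand[match(intersecting_probes,annotationEPICv2$ID2)])
 FALSE   TRUE
  2236 719509
# Cross-tabulate EPICv1 strand (rows) against EPICv2 probe_strand (columns)
table(annotationEPICv1[intersecting_probes,]$strand, annotationEPICv2$probe_strand[match(intersecting_probes,annotationEPICv2$ID2)])
         -      +
  -    411 358863
  + 360646   1825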
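To make the comparison easier to reproduce without downloading the annotation files, here is a minimal sketch of the same logic on made-up data, using base R only. The probe IDs, suffixes, and strand labels are invented for illustration and are not taken from either manifest; v1, v2, and shared are placeholder names.

```r
# Toy data only: probe IDs and strands are invented, not real manifest entries.
v1 <- data.frame(
  Name   = c("cg0001", "cg0002", "cg0003", "cg0004"),
  strand = c("+", "-", "+", "-"),
  stringsAsFactors = FALSE
)
v2 <- data.frame(
  probeID      = c("cg0001_TC21", "cg0002_BC21", "cg0003_TC21",
                   "cg0003_TC22", "cg0005_BC21"),
  probe_strand = c("-", "+", "-", "-", "+"),
  stringsAsFactors = FALSE
)

# Strip everything from the first underscore to recover the EPICv1-style name.
v2$ID2 <- sub("_.*$", "", v2$probeID)

# Probes present in both tables.
shared <- intersect(v1$Name, v2$ID2)

# match() returns the first hit only, so a probe with several EPICv2 rows
# (cg0003 here) contributes a single row to the comparison.
v1_strand <- v1$strand[match(shared, v1$Name)]
v2_strand <- v2$probe_strand[match(shared, v2$ID2)]

# Cross-tabulate the two labels; fixed factor levels keep empty cells visible.
lv <- c("-", "+")
strand_table <- table(EPICv1 = factor(v1_strand, lv),
                      EPICv2 = factor(v2_strand, lv))
print(strand_table)

# Number of shared probes whose labels disagree.
n_discordant <- sum(v1_strand != v2_strand)
```

In this toy case every shared probe is discordant, which mirrors the pattern in the real tables above. Because match() keeps only the first matching row, probes that appear more than once in EPICv2 are compared using a single copy.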
Am I missing something? Do you know what could be the issue here?
Also, what is the best way to know the strand for the EPICv2 probes? I was getting it from EPICv2.hg38.manifest.gencode.v41.tsv.gz because I did not see the annotation in the manifest table EPICv2.hg38.manifest.tsv.gz
Thanks a lot for the help,
Raúl