Candidate genes and prevalence
nicola-palmieri opened this issue
Dear author,
I am Dr Nicola Palmieri from Vetmeduni Vienna (Austria). I performed a GWAS on 2024 avian E. coli strains, fitting the binary phenotype "pathogenic in chicken" (1 = pathogenic, 0 = non-pathogenic) with gene presence/absence as the genotype input.
I obtained a list of 65 candidate genes. However, when I compute the prevalence of each candidate gene within each phenotype group, some of them turn out to be more prevalent in non-pathogenic strains than in pathogenic ones. So my questions are: Why do I get genes with high prevalence in non-pathogenic strains and low prevalence in pathogenic strains? Does treeWAS detect candidates in both directions? If so, can I simply filter the candidates, keeping only those with at least 50% prevalence in pathogenic strains and a pathogenic/non-pathogenic prevalence ratio greater than 1 (or some slightly higher cutoff)?
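To make the proposed filter concrete, here is a minimal sketch in plain Python of what I have in mind. The function names, the default thresholds, and the toy data are all made up for illustration; none of this is part of treeWAS, and my real input is of course the full presence/absence matrix.

```python
# Toy data (hypothetical): one 0/1 entry per strain.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = pathogenic, 0 = non-pathogenic
presence = {
    "geneA": [1, 1, 1, 0, 0, 0, 1, 0],  # mostly in pathogenic strains
    "geneB": [0, 0, 1, 0, 1, 1, 1, 1],  # mostly in non-pathogenic strains
    "geneC": [1, 1, 0, 0, 1, 1, 0, 0],  # same prevalence in both groups
}


def prevalence_by_phenotype(presence, phenotype):
    """Return {gene: (prevalence in pathogenic, prevalence in non-pathogenic)}."""
    n_path = sum(phenotype)
    n_non = len(phenotype) - n_path
    if n_path == 0 or n_non == 0:
        raise ValueError("both phenotype groups must contain at least one strain")
    result = {}
    for gene, calls in presence.items():
        # Count strains carrying the gene within each phenotype group.
        in_path = sum(c for c, p in zip(calls, phenotype) if p == 1)
        in_non = sum(c for c, p in zip(calls, phenotype) if p == 0)
        result[gene] = (in_path / n_path, in_non / n_non)
    return result


def enriched_in_pathogenic(prevalence, min_prev=0.5, min_ratio=1.0):
    """Keep genes with pathogenic prevalence >= min_prev and ratio > min_ratio."""
    kept = []
    for gene, (p_path, p_non) in prevalence.items():
        # A gene absent from every non-pathogenic strain gets an infinite ratio.
        ratio = float("inf") if p_non == 0 else p_path / p_non
        if p_path >= min_prev and ratio > min_ratio:
            kept.append(gene)
    return kept


prev = prevalence_by_phenotype(presence, phenotype)
candidates = enriched_in_pathogenic(prev)
```

In this toy example only geneA passes the filter: geneB is enriched in the non-pathogenic group, and geneC has a ratio of exactly 1.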
Thank you!
Nicola