Candidate genes and prevalence

Question


nicola-palmieri opened this issue 2 years ago

Dear author,

I am Dr Nicola Palmieri from the Vetmeduni Vienna (Austria). I performed a GWAS on 2024 avian E. coli strains, fitting the binary phenotype "pathogenic in chicken" (1 = pathogenic, 0 = non-pathogenic) and using gene presence/absence as the genotype input.

I obtained a list of 65 candidate genes. However, when I compute the prevalence of each candidate gene within each phenotype group, some genes are more prevalent in non-pathogenic strains than in pathogenic ones. So my question is: why do I get genes with high prevalence in non-pathogenic strains and low prevalence in pathogenic strains? Is treeWAS detecting candidates in both directions? If so, can I simply filter the candidates, keeping only those with at least 50% prevalence in pathogenic strains and a pathogenic/non-pathogenic prevalence ratio above 1 (or some slightly higher cutoff)?
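To make the proposed filter concrete, here is a minimal Python sketch of the per-phenotype prevalence calculation and the two cutoffs. It is not part of treeWAS. The gene names, the toy presence/absence matrix, and the function names are hypothetical and only for illustration. The thresholds (50% prevalence, ratio above 1) are the ones suggested above.

```python
def prevalence_by_phenotype(presence, phenotype):
    """Return {gene: (prevalence in pathogenic, prevalence in non-pathogenic)}.

    presence:  dict mapping gene name -> list of 0/1 calls, one per strain
    phenotype: list of 0/1 labels, one per strain (1 = pathogenic)
    """
    n_path = sum(phenotype)
    n_non = len(phenotype) - n_path
    result = {}
    for gene, calls in presence.items():
        in_path = sum(c for c, p in zip(calls, phenotype) if p == 1)
        in_non = sum(c for c, p in zip(calls, phenotype) if p == 0)
        result[gene] = (in_path / n_path, in_non / n_non)
    return result


def filter_enriched(prev, min_path_prev=0.5, min_ratio=1.0):
    """Keep genes with enough prevalence in pathogenic strains and a
    pathogenic/non-pathogenic prevalence ratio strictly above min_ratio."""
    kept = []
    for gene, (p_path, p_non) in prev.items():
        # A gene absent from all non-pathogenic strains gets an infinite ratio.
        ratio = p_path / p_non if p_non > 0 else float("inf")
        if p_path >= min_path_prev and ratio > min_ratio:
            kept.append(gene)
    return kept


# Hypothetical toy data: 4 pathogenic strains followed by 4 non-pathogenic.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
presence = {
    "geneA": [1, 1, 1, 0, 0, 0, 1, 0],  # enriched in pathogenic strains
    "geneB": [0, 0, 1, 0, 1, 1, 1, 1],  # enriched in non-pathogenic strains
    "geneC": [1, 1, 1, 1, 1, 1, 1, 1],  # present everywhere, ratio exactly 1
}

prev = prevalence_by_phenotype(presence, phenotype)
enriched = filter_enriched(prev)
print(enriched)
```

With this toy input only geneA passes both cutoffs. geneB fails the prevalence cutoff, and geneC fails because its ratio is exactly 1.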

Thank you!
Nicola