atifrahman / HAWK

Hitting associations with k-mers

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

smartpca "bad anova popsizes too small"

petercombs opened this issue · comments

Okay, moving farther along in the pipeline, I'm having trouble getting smartpca to work properly (which I think is the first indication that I'm falling off the rails). Specifically, I get:

 $ $hawkDir/smartpca -p $hawkDir/parfile.txt
 ... snip ....
 ## To get Tracy-Widom statistics: recompile smartpca with TWTAB correctly specified in Makefile, or
 just run twstats (see README file in POPGEN directory)
 kurtosis           snps    indivs
 eigenvector    1     1.065     1.000
 eigenvector    2      -nan      -nan
 population:   0                 Case    2
 population:   1              Control    0 ***
 ... more data ...
 eigenvector 1:means
            Case      0.000
         Control      0.000
 *** warning: bad anova popsizes too small
 eigenvector 2:means
            Case      0.000
         Control      0.000
 *** warning: bad anova popsizes too small
 ##end of smartpca run

Happy to figure out how to get copies of my gwas_eigenstratX* files to you, but this is what they look like (in case the format of any of them seems way off):

 $ head gwas_eigenstratX*
 ==> gwas_eigenstratX.geno <==
 1       0
 0       1
 1       0
 1       0
 0       1
 0       1
 1       0
 0       1
 1       0
 1       0
 
 ==> gwas_eigenstratX.ind <==
 TSI_0   U       Case
 TSI_1   U       Case
 TSI_2   U       Case
 TSI_3   U       Case
 TSI_4   U       Case
 TSI_5   U       Case
 TSI_6   U       Case
 TSI_7   U       Case
 TSI_8   U       Case
 TSI_9   U       Case
 
 ==> gwas_eigenstratX.snp <==
 AAAAAAAAAAAAAAAAAAAAAAAGAGACCCC 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAACAATCAG 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAGATGTTGC 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAGGCGAACA 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAAACTTCTA 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAACCAAACGA 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAAGCTCTTA 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAACGAGGCT 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAAGGGATGG 1       0.000000        0
 AAAAAAAAAAAAAAAAAAAAAAAAGCATACC 1       0.000000        0
 
 ==> gwas_eigenstratX.total <==
 884693180
 873243637
 817020044
 820305452
 812869594
 811915218
 774600059
 877385766
 874686116
 878343007

The format of the gwas_eigenstratX.geno file does seem off. There is supposed to be one column per each sample. Could you print caseCount and controlCount after line 14 in the following script (and comment out the rest) to check if HAWK is getting as input right numbers of cases and controls?
https://github.com/atifrahman/HAWK/blob/master/supplements/runHawk

Ah, it seems HAWK was not getting the right numbers as input... I had tweaked the script to set them both to 1 from a previous attempt at running HAWK on only 1 sample each, and didn't set it back.