OpenMendel / TrajGWAS.jl

Some questions about the tests and the pval file

RoyeSie opened this issue.

Hello, this software works very well, but I have some questions.

  1. In the pval file, do betadir/taudir mean the direction of the allele effect?
  2. Is there a way to calculate or display the allele effect in a score test? The Wald test is very time-consuming.
  3. The Wald test seems not to work with alleles such as "TC" or "AAC".

Also, I found a small problem in line 278 of gwas.jl: it should be @warn instead of warn.

Thank you for your interest in our package.

  1. Yes, betadir and taudir mean the direction of allele effect with respect to the first allele (Plink, BGEN) / alternate allele (VCF).
  2. One option is to run the score test first and then run the Wald test only on the significant variants. We have filtering tools in SnpArrays.jl and VCFTools.jl.
  3. Could you provide a little more detail or a minimal working example for this one, please?
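A minimal sketch of the two-stage approach in item 2, assuming the score-test p-values have already been read from the pval file into a vector (the file's column names are not shown here). The function name wald_candidates and the 5e-8 default threshold are illustrative choices, not part of the TrajGWAS API.

```julia
# Hypothetical helper (not part of TrajGWAS): given score-test p-values,
# return the indices of SNPs worth re-testing with the slower Wald test.
function wald_candidates(pvals::AbstractVector{<:Real}; threshold::Real = 5e-8)
    # Keep finite, non-negative p-values below the threshold; NaN entries are skipped.
    return findall(p -> isfinite(p) && 0 <= p < threshold, pvals)
end

# Example with made-up p-values for illustration
score_p = [0.3, 1e-9, NaN, 2e-8, 0.04]
println(wald_candidates(score_p))                     # [2, 4]
println(wald_candidates(score_p; threshold = 0.05))   # [2, 4, 5]
```

The selected indices could then be used with the filtering tools in SnpArrays.jl or VCFTools.jl to build a smaller genetic file for the Wald run.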

Thank you for spotting the issue. Line 278 will be fixed soon.

In my bim file, some SNPs look like this:
16 rs10680802 0 53726175 CAG C
They are biallelic, but the alleles are not single bases (A/T/C/G). The Wald test stops at these SNPs with "ERROR: ArgumentError: matrix contains Infs or NaNs".

It sounds like a numerical issue rather than something caused by the SNP being an AG insertion/deletion. I don't think this kind of issue has anything to do with the allele type. Could you please paste the full error message, including the call stack and the line where it happens?

I might need to look into the numbers later to resolve this kind of issue.

Like this?
run = 2, ‖Δβ‖ = 1.068269, ‖Δτ‖ = 1.890190, ‖ΔL‖ = 0.023151, status = Optimal, time(s) = 6.472817
┌ Warning: Ipopt finished with status Invalid_Number_Detected
└ @ Ipopt ~/.julia/packages/Ipopt/QF8Lc/src/MPB_wrapper.jl:195
┌ Warning: Optimization unsuccesful; got Error; run = 1
└ @ WiSER ~/.julia/packages/WiSER/etxnv/src/fit.jl:63
run = 1, ‖Δβ‖ = NaN, ‖Δτ‖ = 0.000000, ‖ΔL‖ = 0.000000, status = Error, time(s) = 0.530174
┌ Warning: Ipopt finished with status Invalid_Number_Detected
└ @ Ipopt ~/.julia/packages/Ipopt/QF8Lc/src/MPB_wrapper.jl:195
┌ Warning: Optimization unsuccesful; got Error; run = 2
└ @ WiSER ~/.julia/packages/WiSER/etxnv/src/fit.jl:63
run = 2, ‖Δβ‖ = NaN, ‖Δτ‖ = 0.000000, ‖ΔL‖ = 0.000000, status = Error, time(s) = 0.583764
ERROR: ArgumentError: matrix contains Infs or NaNs

Actually, the full output would be more helpful: everything you get, including the lines below the "ERROR" line.

The full output will be more helpful, but at first glance this looks like a numerical instability issue. Are the variants you mention very rare? That could cause problems.

Can you also try another solver? Some examples are shown here; trajgwas() takes solver as an input.

You may also want to try standardizing predictors to see if that helps.
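A minimal sketch of the standardization suggested above, assuming the continuous predictors are held in a numeric matrix (columns of a DataFrame can be treated the same way one at a time). The function name standardize_columns is hypothetical; neither TrajGWAS nor WiSER is called here.

```julia
using Statistics

# Center each column to mean 0 and scale it to standard deviation 1.
# Constant columns (e.g. an intercept) are left unchanged to avoid dividing by zero.
function standardize_columns(X::AbstractMatrix{<:Real})
    Z = float.(X)                      # work on a floating-point copy
    for j in axes(Z, 2)
        col = view(Z, :, j)
        s = std(col)
        if s > 0
            col .= (col .- mean(col)) ./ s
        end
    end
    return Z
end

X = [1.0 5.0; 2.0 5.0; 3.0 5.0]
Z = standardize_columns(X)   # column 1 becomes [-1, 0, 1]; constant column 2 is kept as-is
```

Coefficients estimated on standardized predictors are per standard deviation, so they need rescaling if effects in the original units are wanted.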

This is my first time using Julia, so I'm not very familiar with it.
Below the "ERROR" line:
Stacktrace:
[1] chkuplofinite(A::Matrix{Float64}, uplo::Char)
@ LinearAlgebra.LAPACK /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/lapack.jl:104
[2] syevr!(jobz::Char, range::Char, uplo::Char, A::Matrix{Float64}, vl::Float64, vu::Float64, il::Int64, iu::Int64, abstol::Float64)
@ LinearAlgebra.LAPACK /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/lapack.jl:5084
[3] #eigen!#99
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/symmetric.jl:675 [inlined]
[4] #eigen#100
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/symmetric.jl:680 [inlined]
[5] eigen
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/symmetric.jl:678 [inlined]
[6] sandwich!(m::WSVarLmmModel{Float64})
@ WiSER ~/.julia/packages/WiSER/etxnv/src/sandwich.jl:35
[7] fit!(m::WSVarLmmModel{Float64}, solver::IpoptSolver; init::WSVarLmmModel{Float64}, runs::Int64, parallel::Bool, verbose::Bool)
@ WiSER ~/.julia/packages/WiSER/etxnv/src/fit.jl:78
[8] (::TrajGWAS.var"#21#31"{IOStream, FormulaTerm{Term, Term}, Symbol, Val{1}, Bool, Bool, UnitRange{Int64}, IpoptSolver, Bool, Int64, Float64, Nothing, Float64, WSVarLmmModel{Float64}, Bool, Vector{Float64}, Matrix{Int64}, SnpArray})(bimio::IOStream)
@ TrajGWAS ~/.julia/packages/TrajGWAS/xR5qF/src/gwas.jl:537
[9] makestream(f::TrajGWAS.var"#21#31"{IOStream, FormulaTerm{Term, Term}, Symbol, Val{1}, Bool, Bool, UnitRange{Int64}, IpoptSolver, Bool, Int64, Float64, Nothing, Float64, WSVarLmmModel{Float64}, Bool, Vector{Float64}, Matrix{Int64}, SnpArray}, args::String)
@ SnpArrays ~/.julia/packages/SnpArrays/WBzpL/src/codec.jl:31
[10] (::TrajGWAS.var"#19#29"{FormulaTerm{Term, Term}, Symbol, Val{1}, Bool, Bool, UnitRange{Int64}, IpoptSolver, Bool, Int64, Float64, Nothing, Float64, WSVarLmmModel{Float64}, String, Bool, Vector{Float64}, Matrix{Int64}, SnpArray})(io::IOStream)
@ TrajGWAS ~/.julia/packages/TrajGWAS/xR5qF/src/gwas.jl:460
[11] makestream(::TrajGWAS.var"#19#29"{FormulaTerm{Term, Term}, Symbol, Val{1}, Bool, Bool, UnitRange{Int64}, IpoptSolver, Bool, Int64, Float64, Nothing, Float64, WSVarLmmModel{Float64}, String, Bool, Vector{Float64}, Matrix{Int64}, SnpArray}, ::String, ::Vararg{String, N} where N)
@ SnpArrays ~/.julia/packages/SnpArrays/WBzpL/src/codec.jl:31
[12] trajgwas(fittednullmodel::WSVarLmmModel{Float64}, bedfile::String, bimfile::String, bedn::Int64; analysistype::String, testformula::FormulaTerm{Term, Term}, test::Symbol, pvalfile::String, snpmodel::Val{1}, snpinds::UnitRange{Int64}, usespa::Bool, reportchisq::Bool, bedrowinds::UnitRange{Int64}, solver::IpoptSolver, parallel::Bool, runs::Int64, verbose::Bool, snpset::Nothing, e::Nothing, r::Float64, adjustor::Nothing, adj_cutoff::Float64)
@ TrajGWAS ~/.julia/packages/TrajGWAS/xR5qF/src/gwas.jl:412
[13] trajgwas(fittednullmodel::WSVarLmmModel{Float64}, geneticfile::String; analysistype::String, geneticformat::String, vcftype::Nothing, samplepath::Nothing, testformula::FormulaTerm{Term, Term}, test::Symbol, pvalfile::String, snpmodel::Val{1}, snpinds::UnitRange{Int64}, usespa::Bool, reportchisq::Bool, geneticrowinds::Nothing, solver::IpoptSolver, parallel::Bool, runs::Int64, verbose::Bool, snpset::Nothing, e::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ TrajGWAS ~/.julia/packages/TrajGWAS/xR5qF/src/gwas.jl:282
[14] trajgwas(nullmeanformula::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term}}, reformula::FormulaTerm{Term, ConstantTerm{Int64}}, nullwsvarformula::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term}}, idvar::Symbol, nulldf::DataFrame, geneticfile::String; nullfile::String, solver::IpoptSolver, parallel::Bool, runs::Int64, verbose::Bool, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:pvalfile, :snpinds, :test), Tuple{String, UnitRange{Int64}, Symbol}}})
@ TrajGWAS ~/.julia/packages/TrajGWAS/xR5qF/src/gwas.jl:163
[15] #trajgwas#5
@ ~/.julia/packages/TrajGWAS/xR5qF/src/gwas.jl:134 [inlined]
[16] top-level scope
@ ./timing.jl:210 [inlined]
[17] top-level scop
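The bottom frames of this trace show where the error is raised: WiSER's sandwich! calls eigen on a symmetric matrix, and LAPACK's input check (chkuplofinite) rejects it because the preceding fit had already produced NaN estimates (note ‖Δβ‖ = NaN in the log above). The stdlib-only sketch below reproduces the same ArgumentError with a hand-made matrix; it illustrates the mechanism only and is not the WiSER code.

```julia
using LinearAlgebra

# A small symmetric matrix contaminated with NaN, as after a failed optimization.
A = [1.0 NaN; NaN 2.0]

# Checking finiteness first would allow a graceful fallback instead of an error.
println("non-finite entries: ", !all(isfinite, A))   # true

# Calling eigen directly triggers LAPACK's input check, as in the stack trace.
try
    eigen(Symmetric(A))
catch err
    println(typeof(err), ": ", err.msg)
end
```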

I tried several similar SNPs:
16 chr16:53706566_A_AT 0 53706566 AT A
16 rs1015376116 0 53718702 A AT
16 rs1167667885 0 53719496 T TGGTTTACTAG
16 rs10680802 0 53726175 CAG C
16 chr16:53739745_G_GT 0 53739745 GT G

chr16:53706566_A_AT and rs10680802 fail, but the other SNPs do not. I don't yet understand what determines which ones fail.

I believe something is going on on the numerical side, unrelated to the allele type. We will need to check allele frequencies, try normalizing the predictors, and/or try changing the solver settings. I will post some code to check allele frequencies later today.

@RoyeSie

Would you try running this code?

using SnpArrays
s = SnpData("<prefix of your plink data>")
println(maf(s.snparray)[s.snp_info[!, :snpid] .== "rs10680802"]) # and other SNPs that your code fails on

julia> println(maf(s.snparray)[s.snp_info[!, :snpid] .== "rs10680802"]) # and other SNPs that your code fails on
[0.2970530641070913]

Sorry for missing this. This seems to be a numerical issue we were not aware of. The SNP is not that rare (MAF ≈ 0.297)...

Hi, I am having a related issue even with the example dataset: the Wald test seems not to work if you select 50 or more SNPs. I am using exactly the same model specification as in your tutorial (the model in the "Basic Usage" section); the only change is that I replaced "test = :score" with "test = :wald". I get the same error as @RoyeSie (as posted on 22 July). My rough guess is that this is purely an optimization problem.
Are you currently working on fixing this issue, or is it on hold?

Thank you for providing a working example. We were having trouble reproducing the issue. In this situation, the optimization fails for one SNP, which stops the entire pipeline. As a first step, we can print a message that the test failed for that SNP and move on instead of stopping the whole run. Ideally, I hope to fix the issue entirely. We recently added the init keyword to try a different initial condition, e.g., by adding

init = x -> WiSER.init_ls!(x; gniters=0),

Changing the initialization helped resolve some other situations (discussed off-GitHub), but it did not help in this case. I will look into it to find out what is going on.

In most situations with common alleles, just changing the initialization is good enough to resolve the issue.

However, in the hapmap3 example, the failure happens at a SNP with a very low MAF (0.004). Our development has mainly focused on making sure the score test works at such low MAFs; we cannot guarantee that the Wald test works well for rare variants. At this point, none of the initialization methods available in WiSER.jl gets the Wald test to run on that specific SNP. Hence, the plan is to print a message that the test failed, display the MAF, and move on to the next SNP. This should be finished next week.
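The MAF values quoted in this thread (0.297 for rs10680802, 0.004 for the failing hapmap3 SNP) are frequencies of the less common allele. A sketch of that definition, assuming genotypes coded as counts (0, 1, 2) of one allele with missing calls allowed; maf_from_counts is a hypothetical name, and this is not the SnpArrays.jl source.

```julia
# Minor allele frequency from allele counts (0, 1, 2), skipping missing calls.
function maf_from_counts(g::AbstractVector)
    obs = collect(skipmissing(g))
    isempty(obs) && return NaN
    p = sum(obs) / (2 * length(obs))   # frequency of the counted allele
    return min(p, 1 - p)               # the minor allele is the less frequent one
end

println(maf_from_counts([0, 1, 2, missing, 1]))  # 0.5
println(maf_from_counts([0, 0, 0, 1]))           # 0.125
```

For scale, in a hypothetical sample of 1,000 individuals a MAF of 0.004 means about 2 × 1000 × 0.004 = 8 copies of the minor allele, versus about 594 at a MAF of 0.297, which is consistent with the rare variant being much harder for the Wald-test optimization.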

Update (version 0.2.2):

The Wald test is now much more robust to failure. It first retries with another initialization; if that also fails, it issues a warning and reports an effect size of NaN and a p-value of -1.
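The behavior described for version 0.2.2 (retry with another initialization, then warn and report effect size NaN and p-value -1) follows a common retry-then-fallback pattern. The sketch below shows that pattern in plain Julia under stated assumptions: wald_with_fallback, fit_and_test and toy_fit are hypothetical names, and this is not the TrajGWAS implementation.

```julia
# Hypothetical sketch: try each initialization in turn for every SNP;
# if all of them throw, warn and record (NaN, -1.0) instead of stopping the run.
# `fit_and_test(snp, init)` stands in for a per-SNP Wald fit returning (effect, pvalue).
function wald_with_fallback(fit_and_test, snps, inits)
    results = Tuple{Float64,Float64}[]          # (effect size, p-value) per SNP
    for snp in snps
        res = nothing
        for init in inits
            res = try
                fit_and_test(snp, init)
            catch
                # this initialization failed; fall through to the next one
                nothing
            end
            res === nothing || break            # success: stop trying
        end
        if res === nothing
            @warn "Wald test failed for SNP $snp; reporting effect size NaN and p-value -1"
            res = (NaN, -1.0)
        end
        push!(results, res)
    end
    return results
end

# Toy stand-in: "rare" fails with every init; "b" fails only with :default.
function toy_fit(snp, init)
    snp == "rare" && error("Invalid_Number_Detected")
    (snp == "b" && init == :default) && error("bad starting values")
    return (0.12, 0.003)
end

res = wald_with_fallback(toy_fit, ["a", "b", "rare"], [:default, :alternative])
```

Recording a sentinel instead of throwing keeps one problematic SNP from stopping the whole scan, which is the failure mode reported earlier in this thread.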