JuliaNLSolvers / NLsolve.jl

Julia solvers for systems of nonlinear equations and mixed complementarity problems


adding linsolve option for trust region method solver

kpobrien opened this issue · comments

Hi, thanks for writing and maintaining this package! I'd like to get input on adding a linsolve option to the trust region method solver, identical to the linsolve option already available for the newton method solver.
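For context, the existing linsolve hook takes the solution vector, the Jacobian, and the right-hand side, and writes the solution in place; the default is equivalent to the following sketch (the 2x2 system is made up purely to exercise the signature):

```julia
using LinearAlgebra

# Default linsolve used by the newton solver: solve A*x = b in place.
default_linsolve = (x, A, b) -> copyto!(x, A \ b)

# Made-up 2x2 system to exercise the signature.
A = [2.0 0.0; 0.0 4.0]
b = [2.0, 8.0]
x = zeros(2)
default_linsolve(x, A, b)  # x is now [1.0, 2.0]
```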

A compelling use case is to perform a KLU factorization or other non-default factorizations before solving the linear system. For some problems, this leads to improvements in speed and memory usage. The proposed changes don't change the behavior or speed when used with default options. For example:

default linear solver

@time out=nlsolve(od,method = :trust_region,x)
14.935686 seconds (25.79 k allocations: 20.709 GiB, 1.04% gc time)
12.432642 seconds (25.83 k allocations: 20.709 GiB, 0.39% gc time)
15.237596 seconds (25.93 k allocations: 20.709 GiB, 0.35% gc time)

default behavior with changes below

@time out=nlsolve(od,method = :trust_region,x)
13.542611 seconds (25.98 k allocations: 20.709 GiB, 0.38% gc time)
15.107630 seconds (25.96 k allocations: 20.709 GiB, 0.36% gc time)
13.196935 seconds (25.96 k allocations: 20.709 GiB, 0.40% gc time)

perform KLU before solving the linear system with changes below

@time out=nlsolve(od,method = :trust_region,x,linsolve=(x, A, b) -> copyto!(x,klu(A)\b))
8.797286 seconds (28.68 k allocations: 13.647 GiB, 0.92% gc time)
8.853783 seconds (28.64 k allocations: 13.647 GiB, 0.55% gc time)
8.579454 seconds (28.58 k allocations: 13.647 GiB, 0.67% gc time)

Here are the diffs for the two files I changed, nlsolve.jl and trust_region.jl. I can make a PR if that's easier.

diff solvers/trust_region.jl ~/.julia/packages/NLsolve/gJL1I/src/solvers/trust_region.jl
48c48
<                  r, d, J, delta::Real)
---
>                  r, d, J, delta::Real, linsolve)
51c51
<         copyto!(p_i, J \ vec(r)) # Gauss-Newton step
---
>         linsolve(p_i, J, vec(r)) # Gauss-Newton step
116a117
>                        linsolve,
170c171
<         dogleg!(cache.p, cache.p_c, cache.pi, cache.r, cache.d, jacobian(df), delta)
---
>         dogleg!(cache.p, cache.p_c, cache.pi, cache.r, cache.d, jacobian(df), delta, linsolve)
234,235c235,237
<                       cache = NewtonTrustRegionCache(df)) where T
<     trust_region_(df, initial_x, convert(real(T), xtol), convert(real(T), ftol), iterations, store_trace, show_trace, extended_trace, convert(real(T), factor), autoscale, cache)
---
>                       cache = NewtonTrustRegionCache(df);
>                       linsolve=(x, A, b) -> copyto!(x, A\b)) where T
>     trust_region_(df, initial_x, convert(real(T), xtol), convert(real(T), ftol), iterations, store_trace, show_trace, extended_trace, convert(real(T), factor), autoscale, linsolve, cache)
diff nlsolve/nlsolve.jl ~/.julia/packages/NLsolve/gJL1I/src/nlsolve/nlsolve.jl
28c28
<                      autoscale)
---
>                      autoscale; linsolve=linsolve)

See the proposal in JuliaNLSolvers/NLSolvers.jl#24. Giving it a bit more infrastructure like that would make things like Jacobian-free Newton-Krylov more automatic, and would let a bank of already-wrapped linear solvers come along for the ride.

Thanks for your comment. I agree the LinearSolve.jl interface is a better long-term solution than passing the solver function directly. It would likely be a breaking change, though, because passing a linsolve function is already supported in solvers/newton.jl:

function newton(df::OnceDifferentiable,
                initial_x::AbstractArray{T},
                xtol::Real,
                ftol::Real,
                iterations::Integer,
                store_trace::Bool,
                show_trace::Bool,
                extended_trace::Bool,
                linesearch,
                cache = NewtonCache(df);
                linsolve=(x, A, b) -> copyto!(x, A\b)) where T
    newton_(df, initial_x, convert(real(T), xtol), convert(real(T), ftol), iterations, store_trace, show_trace, extended_trace, linesearch, linsolve, cache)

In my opinion, we should add the linsolve option to trust_region.jl to make it consistent with newton.jl while folks consider whether to add a LinearSolve.jl style interface (and someone finds the time to do it).

Since you mentioned caching factorizations in your comment at that link: caching helps nlsolve significantly, and it can be implemented with the proposal above.

without caching the factorization

@time out=nlsolve(od,method = :trust_region,x,linsolve=(x, A, b) -> copyto!(x,klu(A)\b))
8.595526 seconds (28.64 k allocations: 13.647 GiB, 3.96% gc time)
8.620742 seconds (28.58 k allocations: 13.647 GiB, 0.45% gc time)
8.982089 seconds (28.64 k allocations: 13.647 GiB, 3.81% gc time)

with caching the factorization (sparsity structure does not change)

F = klu(J)
@time out=nlsolve(od,method = :trust_region,x,linsolve=(x, A, b) -> copyto!(x,(klu!(F,A);F\b)))
6.310709 seconds (25.54 k allocations: 9.697 GiB, 2.86% gc time)
4.685632 seconds (25.52 k allocations: 9.697 GiB, 1.01% gc time)
6.036442 seconds (25.52 k allocations: 9.697 GiB, 0.55% gc time)

Could you provide the example?

The example above is too deeply embedded in other code to extract easily. Here is a very contrived example (solving a linear equation) that is also faster with this change.

using SparseArrays
using NLsolve
using LinearAlgebra
using KLU
using BenchmarkTools


m = 50
C = Diagonal(rand(Complex{Float64},m)) + sprand(Complex{Float64},m,m,0.45)
d = rand(Complex{Float64},m)
x = zeros(Complex{Float64},m)  
F = zeros(Complex{Float64},m)  
J = spzeros(Complex{Float64},m,m)

function calcf!(F,x,C,d)
    F .= C*x-d
end

function calcj!(J,x,C,d)
    J .= C
    return nothing
end

function f!(F,x)
    calcf!(F,x,C,d)
    return nothing
end

function j!(J,x)
    calcj!(J,x,C,d)
    return nothing
end

j!(J,x)
FK = klu(J)
df = OnceDifferentiable(f!, j!, x, F, J);

@btime nlsolve(df,method = :trust_region,x,linsolve=(x, A, b) -> copyto!(x,(klu!(FK,A);FK\b)));
@btime nlsolve(df,method = :trust_region,x);

280.979 μs (356 allocations: 71.06 KiB)
928.750 μs (352 allocations: 1.85 MiB)

If you change the method to :newton, you can see the improvement without changing any NLsolve.jl code. For sparse matrices, you should see a benefit whenever a refactorization that reuses the symbolic analysis is faster than a full factorization; in the timings below, the sum of the last two values (klu! refactorization plus solve) is smaller than either of the first two.

@btime C \ d;
@btime FK = klu(C);
@btime klu!(FK,C);
@btime FK\d;

188.056 μs (72 allocations: 448.84 KiB)
93.439 μs (76 allocations: 114.13 KiB)
45.093 μs (52 allocations: 4.12 KiB)
3.784 μs (20 allocations: 2.27 KiB)
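The caching pattern behind those timings can be sketched in isolation (a minimal made-up system, independent of NLsolve; it assumes the sparsity pattern of A never changes):

```julia
using SparseArrays
using KLU

# Factorize once: symbolic analysis + numeric factorization.
A = sparse([1, 2, 2], [1, 1, 2], [2.0, 1.0, 4.0], 2, 2)
b = [2.0, 9.0]
F = klu(A)
x1 = F \ b              # [1.0, 2.0]

# Update the numeric values but keep the sparsity pattern, then
# refactorize in place: klu! reuses the symbolic analysis.
nonzeros(A) .= [4.0, 2.0, 8.0]
klu!(F, A)
x2 = F \ b              # [0.5, 1.0]
```

Inside an nlsolve linsolve callback this becomes `(x, A, b) -> copyto!(x, (klu!(F, A); F \ b))`, as in the benchmark above.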

Let me know if you don't find this example convincing and I will make a nonlinear example.

It's also a little faster for your test case in test/sparse.jl:

using NLsolve
using SparseArrays
using KLU
using BenchmarkTools

function f_sparse!(F, x)
    F[1] = (x[1]+3)*(x[2]^3-7)+18
    F[2] = sin(x[2]*exp(x[1])-1)
end

function g_sparse!(J, x)
    J[1, 1] = x[2]^3-7
    J[1, 2] = 3*x[2]^2*(x[1]+3)
    u = exp(x[1])*cos(x[2]*exp(x[1])-1)
    J[2, 1] = x[2]*u
    J[2, 2] = u
end

J = spzeros(2, 2)
x = rand(2)
F = rand(2)

g_sparse!(J,x)
FK = klu(J);
df = OnceDifferentiable(f_sparse!, g_sparse!, x, F, J)
@btime nlsolve(df,x,method = :trust_region,autoscale = true,linsolve=(x, A, b) -> copyto!(x,(klu!(FK,A);FK\b)));
@btime nlsolve(df,x,method = :trust_region,autoscale = true);

11.012 μs (408 allocations: 31.30 KiB)
17.567 μs (388 allocations: 82.74 KiB)