JuliaNLSolvers / NLsolve.jl

Julia solvers for systems of nonlinear equations and mixed complementarity problems

Implementation of "simple iteration" for `fixedpoint` with out-of-place functions.

jlperla opened this issue

I added a shell for out-of-place execution of the simple fixed point iteration.

  1. In the committed files, you will see /src/nlsolve/fixedpoint.jl, which has the exported function for users to call. I copied/pasted the nlsolve.jl code, so this may be imperfect.
  2. That fixedpoint(...) function has placeholders to call various algorithms, which I commented out. I put in a /src/solves/simple_iteration.jl file which we can hopefully use to get the basic fixed point iteration up and running.
  3. I moved the code from our test/fixed_points.jl to this new /src/solves/simple_iteration.jl and made a few tweaks. You will see that I (1) removed the fixed point of a univariate function (which won't work the way nlsolve is defined); (2) left in code for the in-place fixedpointOLD! function, which is not implemented; and (3) changed the code for calling the function to use the df.f() notation (as this is the general way to wrap things in this library).

What I want to do is finish the implementation of simple_iteration for these fixed points. Some of the obvious things to do:

  • Read through an implementation of one of the solvers (e.g. newton) and see how it deals with tracing of results, etc.
  • Right now it just passes back a tuple of the result and the number of iterations. You should figure out what the nlsolve solvers return and have simple_iteration return the same structure as the other solvers (see the sketch after this list).
  • Hook up any of the options passed into the fixedpoint function arguments to use in the simple_iteration algorithm.
  • Finish up the (not inplace) tests
  • I would see if you can mimic the Cache and trace options for this if possible (looking at something like the newton solver) but if it is difficult then it can be added later.
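
For reference on the return-structure bullet, the nlsolve solvers hand back a SolverResults object rather than a bare tuple. A minimal sketch of the fields a caller relies on (field names as documented in the NLsolve README; the throwaway test problem and the (x, F) in-place argument order follow the conventions used elsewhere in this thread):

using NLsolve

f!(x, F) = (F .= x .- [1.0, 2.0])   # trivial zero-finding problem
results = nlsolve(f!, [0.0, 0.0])

results.zero          # the solution vector
results.iterations    # number of iterations used
converged(results)    # true if the x- or f-based convergence check passed

Whatever simple_iteration returns should populate the same structure, so that downstream code does not care which algorithm produced it.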

Sorry if I missed something, but what was wrong with just using Anderson acceleration with no history?

The key is to be able to have a fixedpoint as a function and then use different algorithms in the background, starting with just iterating on the map.

For Newton and Anderson, the interface would basically just do a little bookkeeping and then call nlsolve with the anderson or newton tags. In the case of Newton, it might need to fudge any user-defined Jacobian by subtracting an identity matrix or something... But we can just get simple iteration working first.
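
As a rough sketch of that bookkeeping for the out-of-place case (the wrapper name is hypothetical and this is not the code in the branch; the Jacobian point lives in the comment):

using NLsolve

# Hypothetical wrapper: a fixed point of g is a zero of x -> g(x) - x.
# A Newton variant would also need to hand nlsolve a wrapped Jacobian,
# J_g(x) - I, which is the "subtract an identity matrix" fudge mentioned above.
fixedpoint_sketch(g, x0; kwargs...) = nlsolve(x -> g(x) - x, x0; inplace = false, kwargs...)

g(x) = 0.5 .* x .+ 1.0                   # contraction with fixed point [2.0, 2.0]
fixedpoint_sketch(g, zeros(2)).zero      # ≈ [2.0, 2.0]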

So you mean have something like fixedpoint(f) = nlsolve(x -> f(x) - x) (but with all the bells and whistles of in-place/out-of-place, Jacobians, etc.)? Sure, that makes sense, but there's no need to reimplement any algorithm, right?

@arnavsood I think the basic approach is solid. A few other things:

Part of the testing should be to try it with different types. A few comments:

function f!(x, fx)
    fx[:] = A * x + b
end

for the in-place version, but I think it is supposed to be fx .= A * x + b instead? You can test.
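
A quick self-contained way to test that (A and b here are throwaway data):

A = [0.5 0.0; 0.0 0.5]
b = [1.0, 1.0]

# Both fx[:] = A * x + b and fx .= A * x + b write into fx; the dotted form is the
# idiomatic spelling and also lets you fuse the addition (fx .= A * x .+ b) to save
# one temporary array.
function f!(x, fx)
    fx .= A * x + b
end

x = zeros(2)
fx = similar(x)
f!(x, fx)
fx == A * x + b        # true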

So you mean have something like fixedpoint(f) = nlsolve(x -> f(x) - x) (but with all the bells and whistles of in-place/out-of-place, Jacobians, etc.)? Sure, that makes sense, but there's no need to reimplement any algorithm, right?

@antoine-levitt The only algorithm that is missing is the simple iteration one (which is the default fixed point algorithm in many cases). Everything else is just bells and whistles, exactly as you are saying. The simple-iteration is an essential algorithm because of convergence properties - i.e. if you can prove the mapping is a contraction, then it is basically executing Banach's fixed point theorem. For better or worse, it is usually the one economists start with - though I hope they use fancier algorithms if we can show them how to swap things out!

Again, just do anderson with m=0 and you're set. This thread (and others) just made me afraid that a lot of code is going to be duplicated with no real motivation, complicating further developments and increasing technical debt, but as long as it's just the API and no algorithm actually gets implemented that's perfectly fine!
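
Concretely, with the keyword names from nlsolve's signature in this era (the map g is a throwaway example), that would be a call along the lines of:

using NLsolve

g(x) = 0.5 .* x .+ 1.0        # contraction with fixed point [2.0]

# Anderson with no history (m = 0) and no damping (beta = 1) reduces to plain
# fixed-point iteration on x -> x + (g(x) - x) = g(x).
nlsolve(x -> g(x) - x, [0.0]; method = :anderson, m = 0, beta = 1, inplace = false)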

The simple-iteration is an essential algorithm because of convergence properties - i.e. if you can prove the mapping is a contraction, then it is basically executing Banach's fixed point theorem

Anderson acceleration actually preserves this property under relatively mild assumptions, see https://epubs.siam.org/doi/abs/10.1137/130919398

Again, just do anderson with m=0 and you're set. This thread (and others) just made me afraid that a lot of code is going to be duplicated with no real motivation, complicating further developments and increasing technical debt, but as long as it's just the API and no algorithm actually gets implemented that's perfectly fine!

We don't want to duplicate code at all, just get the API working with overhead-free performance. The only code that needs to be duplicated is the nlsolve-to-fixedpoint interface.

For using Anderson with the m=0 case, the thought was to get the minimal simple-iteration code up and running, but if you think the performance will be nearly identical, then we could go that route instead? We can use the naive, simple fixed-point iteration in the tests for benchmark comparison, and only implement a separate algorithm if there is significant overhead?

Anderson acceleration actually preserves this property under relatively mild assumptions, see https://epubs.siam.org/doi/abs/10.1137/130919398

Thanks! One of my hopes with getting this convenience wrapper is that we can convince economists on the convergence and speed of better fixed point algorithms they can test by just swapping out an algorithm flag.

As mentioned in #152, I don't think there will be any significant performance difference. The only thing you can exploit is shortcutting x + g(x) - x into g(x), which is unlikely to matter unless g is very cheap (in which case you're probably better off with StaticArrays, and in that case the compiler might even be smart enough to compile the redundant operations away). But benchmarks would certainly be interesting!

Thanks! One of my hopes with getting this convenience wrapper is that we can convince economists on the convergence and speed of better fixed point algorithms they can test by just swapping out an algorithm flag.

Even more reason to use anderson as the fixed-point algorithm: a better algorithm is just a m=10 away!

OK, talked with @jlperla. So we will write these tests and use :simple_iteration (or however we pass it in) as a call to Anderson with no memory.

@AliKarimirad, if you pull the git repo now (branch fixed-point-clean, as of commit b0a1fac or later), it should be a working version with one method in there for the fixed point of a function (in place or out of place), without the Jacobian, located in src/nlsolve/fixedpoint.jl.

Here's what we need to do:

  • Test whether the function works for contents of different types (e.g., Int64, Float64, etc.), and for different kinds of containers (e.g., a normal Vector vs a StaticArray). Since Julia types are objects, we can use them in loops, as follows (a fuller sketch follows this list):
@testset "PreMetric, SemiMetric, Metric on $T" for T in (Float64, Int64, BigFloat, BigInt) ... end
  • Test that the function is "failing gracefully," i.e. that it's returning the right kinds of errors for (say) dimension mismatches, argument errors, method errors, etc. The key is not to repeat the tests that apply to the nlsolve(...) part of the function. So the big ones are probably the f(x) - x behavior, the out .-= x behavior, and any issues with the closures.

  • Test whether the function is doing type inferences correctly. We can do this by playing around with the functions as written, using the @code_warntype macro. The key is to make sure that we're avoiding things of type Any or Box.

  • Ideally (and @jlperla can give input) benchmark the way we wrote the function (which uses Anderson acceleration) against the "naive" simple iteration code found in test/fixedpoint.jl.
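
A minimal sketch of the element-type loop from the first bullet, assuming fixedpoint returns the same SolverResults structure as nlsolve (the testset name, tolerance, and restriction to floating-point element types are placeholder choices):

using Test, NLsolve   # Base.Test on Julia 0.6

@testset "fixedpoint on $T" for T in (Float64, BigFloat)
    A = ones(T, 2, 2) / 4                 # spectral radius 1/2, so A*x + b is a contraction
    b = ones(T, 2)
    g(x) = A * x + b
    r = fixedpoint(g, zeros(T, 2); inplace = false)
    @test maximum(abs.(g(r.zero) .- r.zero)) < 1e-8
end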

Once these are done, we can try looking at methods with Jacobians, like Newton.

Pushed some placeholders to the file.

@antoine-levitt The overhead in a simple linear map appears to be about a factor of 3. Here is the code (using @arnavsood's current implementation in the fixed-point-clean branch)

using NamedTuples, NLsolve, BenchmarkTools
#I think the tolerance matches the default with NLsolve
function iterate(f, x0; residualnorm = (x -> norm(x,Inf)), tol = 1e-8, maxiter=1000) 
    residual = Inf
    iter = 1
    xold = x0
    xnew = copy(x0)
    while residual > tol && iter < maxiter
        xnew = f(xold)
        residual = residualnorm(xold - xnew)
        xold = copy(xnew)
        iter += 1
    end
    return (xold,iter)
end

#Simple linear map
N = 50
maxiter = 10000
A = Diagonal(rand(N)) #I think this is always a contraction?
b = rand(N,1)
f(x) = A * x + b
x_0 = rand(N,1)
f(x_0)

#Can see it is the same number of iterations/etc.
@show iterate(f, x_0, maxiter=maxiter)
@show fixedpoint(f, x_0, inplace=false, iterations=maxiter)

Then the benchmarking is

@btime iterate($f, $x_0, maxiter=$maxiter)
@btime fixedpoint($f, $x_0, inplace=false, iterations=$maxiter)

The first takes about 164 microseconds while the second 411. When I change the N, it seems to always be in the 2-3X overhead (i.e., not just a fixed cost of overhead).

Any thoughts on how to speed things up, looking at https://github.com/JuliaNLSolvers/NLsolve.jl/blob/fixed-point-clean/src/nlsolve/fixedpoint.jl?

Run the profiler to identify the bottleneck

There is so little code here prior to calling nlsolve that I suspect it would just be optimizing NLsolve (i.e. WAY past our paygrade). The only thing I can think of is the use of a closure in https://github.com/JuliaNLSolvers/NLsolve.jl/blob/fixed-point-clean/src/nlsolve/fixedpoint.jl#L29 but we tried to ensure that it was type-stable (though it would be useful for a sanity check if you see anything obviously wrong).

That said, if you can point @arnavsood to hints on how to do profiling in Julia, it would be useful for the future.

The closure should inline. I meant profile NLsolve.jl and optimize it.

Profiling in Juno is quite simple: https://stackoverflow.com/questions/49719076/macos-python-with-numpy-faster-than-julia-in-training-neural-network/49724611#49724611

Or use ProfileView.jl: https://github.com/timholy/ProfileView.jl .

If you identify bad lines, then it would help everyone else make it faster.

There is so little code here prior to calling nlsolve that I suspect it would just be optimizing NLsolve (i.e. WAY past our paygrade)

Sorry but... why? This is a NLsolve issue, I thought the point was to improve NLsolve. The code isn't very complicated. It's very likely that there are some inefficiencies in the code that can be fixed (it was not developed for the case where the function evaluation is very cheap, so there might be a few more copies than necessary, for instance). Also, anderson quite heavily uses views, which are faster in 0.7 than in 0.6.

As for profiling, it's indeed very simple: @profile somecode() and ProfileView.view() (though that has some bugs you have to work around, see the issues in ProfileView)
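
For example, a minimal profiling run around the test problem below might look like this (maxdepth is just to keep the printout readable; Profile and LinearAlgebra are stdlibs on Julia ≥ 0.7):

using NLsolve, Profile
using LinearAlgebra: Diagonal

N = 1000
A = Diagonal(rand(N) .- 2)    # negative diagonal, same problem as the benchmark below
b = rand(N)
f(x) = A * x + b

Profile.clear()
@profile for _ in 1:50
    nlsolve(f, rand(N); inplace = false, iterations = 1000, method = :anderson)
end
Profile.print(maxdepth = 12)  # or ProfileView.view() for a flame graph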

So I played around with it a bit, and the biggest offenders are convergence assessment (computing the norm of the residual, checking it's not NaN, etc.). Removing the "+b" (ie, saving an addition per iteration) gained about 10%. I would venture a guess that no realistic problem is simple enough that computing the infinity norm of the residual is the bottleneck of the algorithm, and so I wouldn't worry too much about the potential overhead of NLsolve.

using NLsolve
using BenchmarkTools
srand(0)
N = 1000
A = Diagonal(rand(N) .- 2) # make it negative
b = rand(N)
f(x) = A * x + b
x_0 = rand(N)
@btime nlsolve(f, x_0; inplace=false, iterations=1000, method=:anderson)

@antoine-levitt I think Jesse's point wasn't that improving nlsolve() is a bad idea, but rather that it's just difficult/not the kind of thing he and I want to mess with (at least for me). And since our fixedpoint() is just a "thin" wrapper around nlsolve()...

Those profiling results are useful; thanks for obtaining them. I'm still pretty new to writing and testing "production" Julia code, FWIW. I wonder if there's a valid economic use case for bifurcating the function into "cheap" vs "expensive" function calls (i.e., if the optimizations that minimize function calls eat up more time than they're worth, for cheap functions). But I'll leave that to you and @jlperla.

Sorry but... why? This is a NLsolve issue, I thought the point was to improve NLsolve. The code isn't very complicated.

Not a very helpful way to encourage a community to help support a package... I have been trying to sponsor people to contribute to the package over the last 6'ish months, and I can promise you that the package is much more complicated than you realize. This has been discussed with @pkofod, but I think it is worth keeping in mind. Not to say that there are better ways to write the package, just that not everyone is capable of understanding (let alone optimizing) it.

Moving past that comment. Of course the goal is to improve NLsolve, but not everyone is capable of optimizing complicated, generic Julia code (which has very rudimentary tooling at this point for non-experts).

I played around with it a bit, and the biggest offenders are convergence assessment (computing the norm of the residual, checking it's not NaN, etc.). Removing the "+b" (ie, saving an addition per iteration) gained about 10%.

Thanks for taking a look at it! Of course, 10% is not a factor of 2-3 though... did you try @arnavsood's fixed-point-clean branch?

I wonder if there's a valid economic use case for bifurcating the function into "cheap" vs "expensive" function calls (i.e., if the optimizations that minimize function calls eat up more time than they're worth, for cheap functions).

@arnavsood: If you mean that there could be special logic in the Anderson implementation to deal with the m=0, beta=1 case better, then that can be considered... but I don't think that bifurcating is generally a good idea.
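
To spell out what that special case would buy (a conceptual sketch of the m = 0 update, not NLsolve's anderson implementation):

# With no history, each Anderson step reduces to the damped fixed-point update
#   x_{k+1} = x_k + beta * (g(x_k) - x_k),   i.e. plain iteration when beta = 1,
# so an m = 0 branch could skip the history storage and least-squares solve entirely.
function anderson_m0(g, x0, beta = 1.0; tol = 1e-8, maxiter = 1000)
    x = copy(x0)
    for k in 1:maxiter
        r = g(x) .- x                      # fixed-point residual
        maximum(abs.(r)) < tol && return (x, k)
        x = x .+ beta .* r
    end
    return (x, maxiter)
end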

Apologies if that seemed aggressive: it's just that I genuinely don't understand what problem you're trying to solve here. Since this is the issue tracker I assumed that we were supposed to understand and discuss this, but this looks more like an internal TODO-list, so I'll leave you to it. My only points are that 1) the overhead of converting a fixed-point problem to a zero-finding one is negligible and 2) adding code complicates further maintenance (as you mention, the more layers there are in a code, the harder it is to work on it) and should only be done if there's a real need.

Any misperceived aggression is already forgotten. And thank you so much for looking at this, it is very helpful. The toughest thing to figure out in the implementations is the Cache. I wonder if a few special cases for m=0 on the cache might help out.

To give you the simple story: There are a huge number of algorithms in economics that have a fixed-point solution at the heart of it. Often with very simple functions and low dimensions for an individual iteration, but where the nested fixed point is run an enormous number of times. The way people tend to do this right now is to just write the fixed point iteration directly in the code (e.g. https://lectures.quantecon.org/jl/mccall_model.html )

Now, my hope is to teach students that using orthogonal algorithms is the best approach for: (1) code clarity; (2) (ultimately) performance given the same algorithm; (3) the ability to swap out more advanced algorithms (e.g. anderson and derivative based approaches rather than naive fixed point iteration). I would love to just be able to replace those sorts of things in lectures with fixedpoint as a higher-order function. However.... if the performance is off by a factor of 2-3 AND the compilation feels like it takes an extra 10+ seconds for students, it is a hard sell.

Let me see if one of the RAs can do a modified version of one of these lectures to use the library and see if the performance is only off by 10-20%, which I think is perfectly acceptable.

Often with very simple functions and low dimensions for an individual iteration, but where the nested fixed point is run an enormous number of times

This is typically the case where anderson acceleration is very efficient, because it inherits the nice global properties of fixed-point iterations (which might be lost in Newton-based approaches) while maintaining good convergence rates. However these Bellman equations look very non-differentiable, so I don't know if it will be very effective.

The overhead of converting from fixed-point to zero-finding should be negligible in most cases, as should be the fact of using the NLsolve library (which does have some overhead for residual computation, bookkeeping, NaN checking, etc.) rather than writing out the code by hand. I think it's perfectly fine to have slightly worse performance for trivial test problems (we are talking ms here), in order to get robust/efficient/flexible code in the complicated cases. The compilation time problem is indeed annoying, but an unavoidable (in the current state of julia at least) side-effect of using a complete library.

@antoine-levitt

OK, so we implemented https://lectures.quantecon.org/jl/mccall_model_with_separation.html with the fixedpoint branch and kept the code in https://github.com/JuliaNLSolvers/NLsolve.jl/tree/fixed-point-clean as a test.

The summary is that:

  • The "roll your own" iteration which the original code was written in is 2X faster than the one calling nlsolve with m=0. This is consistent with the 100% overhead I was seeing before in the simpler example.
  • However, the encouraging part is that anderson acceleration with m=2 is 20X faster than that (i.e. 10X faster than the roll-your-own fixed point). This helps the case for why using the library is a good idea in principle.

cc: @Nosferican @sglyon For quantecon lecture notes, you can take a look at the branch and the test.

Squashed a bug, updated REQUIRE, and added some more tests. Should be good to look at as of latest commit (66af0c4).

However, the encouraging part is that anderson acceleration with m=2 is 20X faster than that (i.e. 10X faster than the roll-your-own fixed point). This helps the case for why using the library is a good idea in principle.

Cool, especially since Anderson is very much not optimized for the case of a cheap function evaluation. Did you try increasing m? Usual values are around 10.

Just wanted to say that I'm here in the background and will take part in the discussion as soon as my vacation ends!

I believe this issue is good to close, since what's on the tin (simple iteration for out-of-place functions) is in the PR. There's a lot of off-label thought here, but my feeling is that we can open separate, new issues for those.

So I'll close it, but anyone can re-open if they wish.