JuliaNLSolvers / NLsolve.jl

Julia solvers for systems of nonlinear equations and mixed complementarity problems

Allow chunksize choice in autodiff setup?

ChrisRackauckas opened this issue

I was wondering whether you think allowing the chunk size to be chosen in the autodiff setup is a good idea. I have a working implementation that I am using, since it's currently not allowed:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/misc_utils.jl#L33

But the issue is that, if I were to make a PR, I can't think of a good way to pass it in. You'd want to make it a kwarg like autodiff, but then it wouldn't be type stable, since for type stability you would need to pass it as a value type. I handle this with a "pre-stage" where I can use a @pure constructor, but I'm not sure that approach applies here.
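Schematically, the pre-stage turns a runtime integer into a value type before the type-stable core runs (a sketch with hypothetical names, not the exact code from the link):

# hypothetical "pre-stage": hoist the integer into the type domain once,
# so everything downstream can specialize on the chunk size
Base.@pure chunk_val(n::Int) = Val{n}

setup_with_chunk(f!, x, chunksize::Int) =
    autodiff_setup(f!, x, chunk_val(chunksize))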

I'm not up to speed on the latest best practices for using the autodiff packages, but I think I recall seeing that chunksize is or will soon be deprecated. Is that possible, or am I misremembering?

cc @jrevels

I think I recall seeing that chunksize is or will soon be deprecated.

Yeah, that keyword argument is deprecated - chunk size is now specified by a type parameter of the *Config types. You can ask the user to pass you a *Config type, so that you have the preallocated work buffers and predetermined chunk size. (I'm sure @ChrisRackauckas knows this already, though, since I've seen him use the new API before.)
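For concreteness, with that API the chunk size travels in the config's type parameter, roughly like this (a sketch against the pre-1.0 ForwardDiff API used later in this thread):

using ForwardDiff

x  = rand(10)
fx = similar(x)
J  = zeros(10, 10)

# chunk size 5 is baked into the config's type, and the work buffers
# are preallocated once and reused across calls
cfg = ForwardDiff.JacobianConfig{5}(fx, x)
ForwardDiff.jacobian!(J, (fx, x) -> (fx .= x .^ 2), fx, x, cfg)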

You'd want to make it a kwarg like autodiff, but then it wouldn't be type stable, since for type stability you would need to pass it as a value type.

Keyword-argument performance is unfortunately quite a pain point for Julia right now; type-inferable/fast keyword arguments are one of my most-wanted features. I'm really hoping something like JuliaLang/julia#16580 makes it into 1.0. I struggled with keyword-argument performance when I first started working on ForwardDiff, and it's the reason I refactored ForwardDiff's API to get rid of keyword arguments and use an inferable configuration type instead.
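The refactor amounts to this pattern (an illustrative sketch with hypothetical names, not ForwardDiff's actual types): options move out of keywords and into a positional object whose type carries the information, so inference can see it.

# hypothetical config type: the chunk size lives in the type parameter
struct Config{CS} end
Config(chunksize::Int) = Config{chunksize}()

# fully inferable: CS is recovered from the *type* of cfg, not a runtime value
chunksize(::Config{CS}) where {CS} = CS

cfg = Config{4}()
chunksize(cfg)   # 4, known at compile time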

Yeah, that keyword argument is deprecated - chunk size is now specified by a type parameter of the *Config types. You can ask the user to pass you a *Config type, so that you have the preallocated work buffers and predetermined chunk size. (I'm sure @ChrisRackauckas knows this already, though, since I've seen him use the new API before.)

What I am proposing is a way to pass the chosen chunksize to the type parameter of the *Config type. Specifically, from the code I linked:

# Builds an NLsolve DifferentiableMultivariateFunction whose Jacobian is
# computed by ForwardDiff with a statically known chunk size CS.
# (Requires ForwardDiff and DiffBase; DifferentiableMultivariateFunction
# comes from NLsolve.)
Base.@pure function autodiff_setup{CS}(f!, initial_x::Vector, chunk_size::Type{Val{CS}})

    # NLsolve calls f!(x, fx); ForwardDiff expects f!(fx, x)
    permf! = (fx, x) -> f!(x, fx)

    fx2 = copy(initial_x)
    # the chosen chunk size CS is baked into the config's type parameter
    jac_cfg = ForwardDiff.JacobianConfig{CS}(initial_x, initial_x)
    g! = (x, gx) -> ForwardDiff.jacobian!(gx, permf!, fx2, x, jac_cfg)

    # value and Jacobian in one pass, written into a shared DiffResult
    fg! = (x, fx, gx) -> begin
        jac_res = DiffBase.DiffResult(fx, gx)
        ForwardDiff.jacobian!(jac_res, permf!, fx2, x, jac_cfg)
        DiffBase.value(jac_res)
    end

    return DifferentiableMultivariateFunction(f!, g!, fg!)
end

In NLsolve.jl there is currently no way to set the config unless you construct the DifferentiableMultivariateFunction yourself, which is why I have this setup in my own package. I was wondering about a good way to simply add it to NLsolve itself.
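With the helper above, the caller would write something like (hypothetical usage; the nlsolve entry point is assumed from NLsolve's existing API):

df = autodiff_setup(f!, initial_x, Val{4})   # chunk size 4 chosen by the user
nlsolve(df, initial_x)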

Ahh, I see. OK, sounds like you guys are the ones to figure out how to implement this!

Well, that code would just add to this:

https://github.com/EconForge/NLsolve.jl/blob/master/src/autodiff.jl#L1

Chunk size would have to be an optional argument that falls back to ForwardDiff's default chunk-size heuristic when it's not specified. My question before was how to set this all up type-stably, but from the other thread it sounds like I shouldn't worry about that and should instead just pass the number as a keyword argument here. @KristofferC?
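A sketch of that fallback method, assuming ForwardDiff.pickchunksize (its internal heuristic) is the "default chunk-size function" meant here:

# no chunk size given: ask ForwardDiff for its default, then forward
# to the type-stable Val method above
autodiff_setup(f!, initial_x::Vector) =
    autodiff_setup(f!, initial_x, Val{ForwardDiff.pickchunksize(length(initial_x))})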

Luckily, if your API uses ForwardDiff's in-place methods (like in your code), then it's not as big of a deal for the chunk size to be dynamic (since the result type is known regardless). It does mean you probably can't take advantage of certain optimizations (e.g. SIMD), though.

Chunk size would have to be an optional argument that falls back to ForwardDiff's default chunk-size heuristic when it's not specified

Why not use autodiff_setup(f!, initial_x::Vector, cfg::JacobianConfig = JacobianConfig(initial_x, initial_x))? That gives the chunk-size setting the behavior you want, and it also allows the work buffers to be preallocated.

I'm not super familiar with NLsolve, but why can't the code you linked simply be the API for using AD with NLsolve? In other words, remove autodiff = true and replace it with an easy-to-use constructor for building a DifferentiableMultivariateFunction with AD. IMO, this is a better API because it inverts control to the caller and allows more flexibility. I feel like most APIs that use AD underneath should adopt this kind of convention rather than catch-all options like autodiff = true.
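Concretely, the inverted-control API might read like this (hypothetical constructor signature; not NLsolve's actual exports):

# caller builds the AD-backed function object explicitly, choosing the config...
cfg = ForwardDiff.JacobianConfig{4}(initial_x, initial_x)
df  = DifferentiableMultivariateFunction(f!, initial_x, cfg)
nlsolve(df, initial_x)

# ...instead of flipping a catch-all switch:
# nlsolve(f!, initial_x, autodiff = true)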

It does mean you probably can't take advantage of certain optimizations (e.g. SIMD), though.

I am surprised by this. Isn't it the case that, as long as the function doesn't get inlined, the generated code for the keyword-argument function will be exactly the same as for the one without keywords?

For example:

julia> using BenchmarkTools

julia> @inline f(;x=1.0) = (z = 0.0; for i in 1:10^5; z += sin(x); end; z)
f (generic function with 2 methods)

julia> @noinline g(;x=1.0) = (z = 0.0; for i in 1:10^5; z += sin(x); end; z)
g (generic function with 1 method)

julia> @btime f(x = 0.5)
  3.228 ms (200001 allocations: 3.05 MiB)
47942.55386052721

julia> @btime g(x = 0.5)
  642.804 μs (2 allocations: 112 bytes)
47942.55386052721

It does mean you probably can't take advantage of certain optimizations (e.g. SIMD), though.
I am surprised about this...

This statement doesn't (directly) have anything to do with keyword arguments - apologies if I was unclear. I was saying that certain optimizations aren't going to be available to you if the chunk size is purely dynamic, i.e. isn't known at compile time. SIMD on dual numbers is a good example, because AFAIK, the tuple length has to be statically inferable for the SLP vectorizer to work.

My earlier problem with keyword arguments is that values passed in via keywords are inferred poorly. For example:

julia> f{N}(; x::Type{Val{N}} = Val{3}) = Val{N}()
f (generic function with 1 method)

# yes, this is fine because the default value for `x` is well inferred
julia> @code_warntype f()
Variables:
  #self#::#f

Body:
  begin
      return $(QuoteNode(Val{3}()))
  end::Val{3}

# ...but as soon as you actually use the kwarg, you're poisoning the well.
# Note this happens regardless of whether or not f is inlined (on v0.6).
julia> @code_warntype f(x = Val{2})
Variables:
  #unused#::#kw##f
  #temp#@_2::Array{Any,1}
  ::#f
  #temp#@_4::Int64
  #temp#@_5::Int64
  #temp#@_6::Any
  #temp#@_7::Int64
  #temp#@_8::Bool
  x::Any

Body:
  begin
      NewvarNode(:(x::Any))
      #temp#@_8::Bool = true
      SSAValue(2) = (Base.arraylen)(#temp#@_2::Array{Any,1})::Int64
      SSAValue(3) = (Base.select_value)((Base.sle_int)(0,1)::Bool,(Base.ashr_int)(SSAValue(2),(Base.bitcast)(UInt64,1))::Int64,(Base.shl_int)(SSAValue(2),(Base.bitcast)(UInt64,(Base.neg_int)(1)::Int64))::Int64)::Int64
      SSAValue(4) = (Base.select_value)((Base.sle_int)(1,SSAValue(3))::Bool,SSAValue(3),(Base.sub_int)(1,1)::Int64)::Int64
      #temp#@_7::Int64 = 1
      7:
      unless (Base.not_int)((#temp#@_7::Int64 === (Base.add_int)(SSAValue(4),1)::Int64)::Bool)::Bool goto 23
      SSAValue(5) = #temp#@_7::Int64
      SSAValue(6) = (Base.add_int)(#temp#@_7::Int64,1)::Int64
      #temp#@_4::Int64 = SSAValue(5)
      #temp#@_7::Int64 = SSAValue(6)
      #temp#@_5::Int64 = (Base.sub_int)((Base.mul_int)(#temp#@_4::Int64,2)::Int64,1)::Int64
      #temp#@_6::Any = (Core.arrayref)(#temp#@_2::Array{Any,1},#temp#@_5::Int64)::Any
      unless (#temp#@_6::Any === :x)::Bool goto 19
      x::Any = (Core.arrayref)(#temp#@_2::Array{Any,1},(Base.add_int)(#temp#@_5::Int64,1)::Int64)::Any
      #temp#@_8::Bool = false
      goto 21
      19:
      (Base.throw)($(Expr(:new, :(Base.MethodError), :((Core.getfield)((Core.getfield)((Core.getfield)(#f,:name)::TypeName,:mt),:kwsorter)), :((Core.tuple)(#temp#@_2,)::Tuple{Array{Any,1},#f}), :($(QuoteNode(0xffffffffffffffff))))))::Union{}
      21:
      goto 7
      23:
      unless #temp#@_8::Bool goto 26
      x::Any = Val{3}
      26:
      return (Main.#f#1)(x::Any,::#f)::Any
  end::Any

because AFAIK, the tuple length has to be statically inferable for the SLP vectorizer to work.

It has to be available when the function is compiled. My point is that there is a function barrier between the keyword-argument function and the function actually doing the work. For example, in the generated code you linked, everything is just keyword-sorting setup until the final call to (Main.#f#1)(x::Any,::#f), which is where the actual work is done. At that point, Julia does dynamic dispatch to figure out the type of x and then compiles an optimized version of that inner function. So you pay for the dynamic dispatch, but not in the generated code of the inner function (as long as it is not inlined).

Note that I am not talking about type inference of the return value, only about the generated code of the function.

This also describes what my understanding is: JuliaLang/julia#10443 (comment)
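In code, the barrier amounts to this pattern (an illustrative sketch, reusing the benchmark above):

# inner function: compiled and optimized for the concrete type of x
_work(x) = (z = 0.0; for i in 1:10^5; z += sin(x); end; z)

# outer kwarg function: only sorts the keywords, then dynamically
# dispatches to a specialized _work; the hot loop itself is unaffected
@noinline g2(; x = 1.0) = _work(x)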

Ah, that makes sense. What's the right way to test whether or not the operations below the barrier are getting SIMD vectorized? For example:

julia> using ForwardDiff: Dual

julia> f(x::Dual) = x + x
f (generic function with 1 method)

julia> @code_llvm f(Dual(2,3,4,5))

define void @julia_f_65900(%Dual* noalias sret, %Dual*) #0 !dbg !5 {
top:
  %2 = bitcast %Dual* %1 to <4 x i64>*
  %3 = load <4 x i64>, <4 x i64>* %2, align 8
  %4 = shl <4 x i64> %3, <i64 1, i64 1, i64 1, i64 1>
  %5 = bitcast %Dual* %0 to <4 x i64>*
  store <4 x i64> %4, <4 x i64>* %5, align 8
  ret void
}

julia> f2(; x::Dual=Dual(1,1)) = x + x

# If my understanding is correct, this call should dynamically dispatch
# to a compiled representation that is essentially equivalent to the
# compiled representation of `f(::Dual{4,Int})`, which we confirmed
# above uses SIMD.
#
# How do we confirm this? Just using `@code_llvm` etc. will only show
# the dispatch instructions, yes?
julia> f2(x=Dual(1,2,3,4))

Of course, a few points still stand; for keyword arguments to be considered "performant", the work being done below the function barrier must be expensive enough that the cost of dynamic dispatch is negligible, and the caller's return type can't depend on the keyword argument if we want it to be inferable.

Of course, a few points still stand; for keyword arguments to be considered "performant", the work being done below the function barrier must be expensive enough that the cost of dynamic dispatch is negligible, and the caller's return type can't depend on the keyword argument if we want it to be inferable.

Agree on all points.

I'm not sure how to "look into" the inner function though...
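One workaround (a sketch along the lines of Kristoffer's barrier point, not from the thread): factor the body into a named positional function and inspect that specialization directly, since that is the code the kwarg call eventually reaches.

julia> using ForwardDiff: Dual

julia> _f2(x::Dual) = x + x                    # the actual work

julia> f2(; x::Dual = Dual(1, 1)) = _f2(x)     # thin keyword wrapper

julia> @code_llvm _f2(Dual(1, 2, 3, 4))        # inspect the inner specialization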