Documentation on GPU support
Sbozzolo opened this issue · comments
I tried to use Interpolations.jl with CUDA and found myself deeply lost in the documentation, not knowing what to expect from the package. I found out about GPU support from GitHub issues (and PR #504), but that's pretty much all the information available. All the documentation I could find about GPU support is a short section in the "Developer documentation".
As a user, I would like to use interpolants from Interpolations.jl in my CUDA kernels. The naive attempt of not doing anything special leads to functions that do not compile:
.T is of type Interpolations.ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}} which is not isbits.
  .itp is of type Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}} which is not isbits.
    .coefs is of type Vector{Float64} which is not isbits.
.u is of type Interpolations.ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}} which is not isbits.
  .itp is of type Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}} which is not isbits.
    .coefs is of type Vector{Float64} which is not isbits.
.q is of type Interpolations.ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}} which is not isbits.
  .itp is of type Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}} which is not isbits.
    .coefs is of type Vector{Float64} which is not isbits.
.P is of type Interpolations.ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}} which is not isbits.
  .itp is of type Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}} which is not isbits.
    .coefs is of type Vector{Float64} which is not isbits.
.c_co2 is of type Interpolations.ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}} which is not isbits.
  .itp is of type Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, Tuple{Base.OneTo{Int64}}} which is not isbits.
    .coefs is of type Vector{Float64} which is not isbits.
I tried a bunch of things that didn't work, like changing constructors, or passing CuArrays to them.
Following the developer documentation, I managed to get a working function using adapt (which I found a little surprising, since I was expecting adapt to be needed only on the Interpolations.jl side).
Some of the objects (e.g., adapt(CuArray{Float64}, itp)) error out on printing (or, more specifically, they resort to scalar indexing on GPUs).
cuitp doesn't work on Vectors, or on scalars:
cuitp.(1:0.5:2 |> collect) # .x is of type Vector{Float64} which is not isbits.
cuitp.(1:0.5:2 |> collect |> CuArray) # This is fine
cuitp(1) # Scalar indexing
cuitp.(Ref(1)) # This is fine
In all this, I also found it unclear whether the higher-level constructors support GPUs or not.
It would be very useful to clearly specify what it means for Interpolations.jl to support GPUs.
I can try, but perhaps @N5N3 would like to contribute.
I tried a bunch of things that didn't work, like changing constructors, or passing CuArrays to them.
A quick reply is that the first stage of interpolating, called prefiltering, is not GPU compatible. So you must construct the interpolant object on the CPU side, then move the object to the GPU side.
(I think the MWE clearly shows that? perhaps we need to emphasize it.)
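In code, that two-step workflow might look like the following sketch (hypothetical sample data; assumes CUDA.jl and Adapt.jl are installed and a CUDA-capable GPU is available):

```julia
using Interpolations, CUDA, Adapt

# Step 1: prefiltering runs on the CPU, so build the interpolant from a plain Array.
xs = 1.0:0.1:10.0
ys = sin.(xs)                       # hypothetical sample data
itp = scale(interpolate(ys, BSpline(Linear())), xs)

# Step 2: move the already-constructed object to the GPU.
cuitp = adapt(CuArray{Float64}, itp)
```

After this, cuitp.(CuArray(1.5:0.5:9.5)) evaluates the spline on the device.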
Once you finish that adapt, cuitp supports kernel programming. Just imagine you are coding a kernel for CuArray spreading, i.e. B[idx] = A[I[idx]] where A, B and I are all CuArrays; you can replace A[...] with cuitp(...) there.
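As an illustration, a sketch of such a gather-style kernel (hypothetical names gather_kernel! and query points I; assumes CUDA.jl, Adapt.jl, and a cuitp obtained via adapt as described):

```julia
using Interpolations, CUDA, Adapt

itp   = scale(interpolate(sin.(1.0:0.1:10.0), BSpline(Linear())), 1.0:0.1:10.0)
cuitp = adapt(CuArray{Float64}, itp)

# Inside the kernel, cuitp(...) plays the role A[...] would play in B[idx] = A[I[idx]].
function gather_kernel!(B, cuitp, I)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= length(B)
        @inbounds B[idx] = cuitp(I[idx])
    end
    return nothing
end

I = CuArray(rand(256) .* 9 .+ 1)    # hypothetical query points in [1, 10]
B = similar(I)
@cuda threads=256 gather_kernel!(B, cuitp, I)
```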
cuitp doesn't work on Vectors
IIRC, there's no automatic Array/GPUArray mixed broadcasting in Julia, so cuitp.(1:0.5:2 |> collect) is expected to fail. The solution might be adding a transforming layer before copyto!(::CuArray, ::Broadcasted), but that should be added in GPUArrays.jl, not here.
Some of the objects (e.g., adapt(CuArray{Float64}, itp)) error out on printing (or, more specifically, they resort to scalar indexing on GPUs).
As for the scalar indexing warning, it happens mainly during display, so it should be OK. I think it's better to let the user judge whether @allowscalar makes sense.
Thank you!
I just wish for all of this to be clearly presented and documented.
(I think the MWE clearly shows that? perhaps we need to emphasize it.)
The MWE is in a section that claims to be targeted toward developers (and I am assuming developers of Interpolations.jl), and it offers no explanation. It allowed me to get a working interpolant, but it shed no light on what was going on, what is supported, or how I am supposed to use the package.
Also, given that all one needs to obtain a cuitp at the end of the day is to apply adapt, would it make sense to provide a package extension for CUDA that does that automatically in the constructors for the interpolants? The constructor might just dispatch over the input array and construct something GPU-compatible when given a CuArray:
function interpolator(y::CuArray)
    y_cpu = Array(y)
    return Adapt.adapt(CuArray{eltype(y)}, interpolator(y_cpu))
end
This would also allow downstream packages to use Interpolations.jl without directly depending on Adapt.
I think the docs could be improved by moving this material to a new section and adding more usage information.
As for adapt, I think it should be left as a soft blocker for users who want to create an itp from a CuArray. That path is definitely inefficient, as we would have to transfer the data twice, which should be avoided whenever possible.
Is there a way to have cuitp(t) work instead of cuitp.(Ref(t))?
My use case is this: I have time series that I use as boundary conditions for evolving a system forward in time. More specifically, I have several functions that evaluate 1D splines at a given t. I would like to do this on a GPU.
I have several calls like bc_var = spline_var(t), but this doesn't work on a GPU because of scalar indexing. I would prefer not to change the code to have "fake" broadcasted expressions just to compute a collection of scalars at each time step.
Theoretically, the Ref could be removed; cuitp.(1) should work as expected.
But this kind of usage would be inefficient anyway. Even if we supported cuitp(1) outside a GPU kernel, the scalar result would still need to be transferred back to the CPU through a GPUArray wrapper, so there's no difference compared with the broadcast solution. More importantly, the calculation cannot be parallelized, so it would always be slower than itp(1).
If you have many 1D splines to interpolate at each time step, a possible solution is to combine them into a 2D interpolation and mark the 2nd dimension as NoInterp; then you can get the result with cuitp.(1, 1:10).
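A sketch of that combined layout (hypothetical data with ten series stored as matrix columns; assumes CUDA.jl and Adapt.jl):

```julia
using Interpolations, CUDA, Adapt

# Ten hypothetical 1D time series sampled at the same times, one per column.
ts   = 0.0:0.1:10.0
data = [sin(t + k) for t in ts, k in 1:10]   # 101×10 matrix

# Interpolate along time only; the 2nd dimension is a plain integer index (NoInterp).
itp   = scale(interpolate(data, (BSpline(Linear()), NoInterp())), ts, 1:10)
cuitp = adapt(CuArray{Float64}, itp)

# Evaluate all ten splines at t = 1.0 in a single broadcast on the GPU.
vals = cuitp.(1.0, CuArray(1:10))
```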