JuliaLinearAlgebra / ArnoldiMethod.jl

The Arnoldi Method with Krylov-Schur restart, natively in Julia.

Home Page: https://julialinearalgebra.github.io/ArnoldiMethod.jl/dev

matrix-free example

francispoulin opened this issue

Can someone help me find an example of how to compute eigenvalues in this repo using a matrix-free approach?

You can use a linear operator for your matrix-free function f! using LinearMaps:

using ArnoldiMethod, LinearMaps

n = 100
A = sprandn(n,n,0.01)

f!(y,x) = mul!(y,A,x)
LO = LinearMap{ComplexF64}(f!,n)

evals, evecs = partialeigen(partialschur(LO)[1])

Of course, this example is not really matrix-free ;)

Thank you @fgerick, that is most helpful!

Even though you used a (sparse) matrix to build the function, I see how this would work in a purely matrix-free world.
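
For anyone reading along later, here is a minimal sketch (not from the thread) of what a purely matrix-free setup might look like: the operator applies a 1D discrete Laplacian in a loop, so no matrix is ever stored. The laplacian! function and its settings are only illustrative.

using ArnoldiMethod, LinearMaps

n = 100

# apply the 1D discrete Laplacian (Dirichlet boundaries) without storing a matrix
function laplacian!(y, x)
    for i in eachindex(x)
        left  = i > 1         ? x[i-1] : zero(eltype(x))
        right = i < length(x) ? x[i+1] : zero(eltype(x))
        y[i] = -2x[i] + left + right
    end
    return y
end

L = LinearMap{Float64}(laplacian!, n; ismutating=true, issymmetric=true)

decomp, history = partialschur(L)
evals, evecs = partialeigen(decomp)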

A small comment for anyone who might find this: for the above to work, we also need the libraries LinearAlgebra and SparseArrays.

using ArnoldiMethod, LinearMaps, SparseArrays, LinearAlgebra

Given that this library works for matrix-free methods, I tried using it on CuArrays and it seems to work. I have copied and pasted code from here and there, so it's not clean, but it runs.

Is this the right approach to follow if you wanted to compute eigenvalues using CUDA?

using ArnoldiMethod, LinearMaps, SparseArrays, LinearAlgebra
using CUDA, CUDA.CUSPARSE

n = 100

# on a cpu
A = sprandn(n,n,0.01)
f!(y, x) = mul!(y, A, x)
L0 = LinearMap{ComplexF64}(f!,n)
evals, evecs = partialeigen(partialschur(L0)[1])

# on a gpu
d_A = CuSparseMatrixCSR(A)
d_f!(y, x) = mul!(y, d_A, x)
d_L0 = LinearMap{ComplexF64}(d_f!,n)
d_evals, d_evecs = partialeigen(partialschur(d_L0)[1])

When I look at the eigenvalues using the two approaches I get different values. But I suppose that isn't surprising since each method might have converged to different eigenvalues?

julia> evals
6-element Array{Complex{Float64},1}:
  -0.07185208030629864 - 0.0250014007069258im
  -0.04433060151158075 - 0.06504924608573556im
 0.0019488133679050222 - 0.07618870848442198im
  0.028375702661698914 + 0.06885547957974067im
   0.06108019161608514 + 0.041630746966274154im
    0.4139145911414931 - 2.5783009777068577e-17im

julia> d_evals
6-element Array{Complex{Float64},1}:
 -0.020742894494909982 + 0.0040183243837709935im
 -0.019833532873216757 - 0.007273515707240475im
  -0.01566864224337404 + 0.014025495541377017im
 -0.013879128103749916 - 0.016113670304331168im
   0.02015975640469038 + 0.005300578979954147im
    0.4139145911414922 + 8.167995097413704e-17im

Hm, it works on the GPU? I forgot/missed that. Are you sure it's not converting to Float32? You might also want to look at the (columnwise) norm of A*Q - Q*R (the partial Schur decomposition residual).
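
For what it's worth, that residual check could look roughly like this (using the names from the first example; decomp is the first value returned by partialschur, which carries the Q and R factors):

decomp, history = partialschur(LO)

eltype(decomp.Q)   # should be ComplexF64; a Float32 type would signal a silent conversion

# columnwise norms of A*Q - Q*R: each column's residual should be at or below the tolerance
resid = [norm(A * decomp.Q[:, i] - decomp.Q * decomp.R[:, i]) for i in 1:size(decomp.R, 2)]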

Thank you @haampie for sharing your thoughts, and good question. I thought it did run on a GPU as it spit out an answer, but I wasn't sure. To try and answer the question I decided to use `nvprof`. I fixed up what initially didn't work.

An important question is do you think that I can use this library on gpus or is that not possible?

nvprof --profile-from-start off julia --project

using ArnoldiMethod, LinearMaps, SparseArrays, LinearAlgebra
using CUDA

n = 100

# on a cpu
A = sprandn(n,n,0.01)
f!(y, x) = mul!(y, A, x)
L0 = LinearMap{ComplexF64}(f!,n)
evals, evecs = partialeigen(partialschur(L0)[1])

# on a gpu
d_A = CUDA.rand(n,n)
d_f!(y, x) = mul!(y, d_A, x)
d_L0 = LinearMap{ComplexF64}(d_f!,n)
d_evals, d_evecs = partialeigen(partialschur(d_L0)[1])

CUDA.@profile d_evals, d_evecs = partialeigen(partialschur(d_L0)[1])

exit()

This ran, but I was told the operations are slow:

┌ Warning: Calling CUDA.@profile only informs an external profiler to start.
│ The user is responsible for launching Julia under a CUDA profiler like `nvprof`.
│ 
│ For improved usability, launch Julia under the Nsight Systems profiler:
│ $ nsys launch julia
└ @ CUDA.Profile ~/.julia/packages/CUDA/wTQsK/lib/cudadrv/profile.jl:39

The profile gives the following, which I'm still trying to understand. It seems to spend all the GPU time on copying from device to host. That does not sound like it is actually doing any calculations, as you suspected.

==266285== Profiling application: julia --project
==266285== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  1.37606s   1601400     859ns     384ns  6.3360us  [CUDA memcpy DtoH]
      API calls:  100.00%  11.5956s   1601400  7.2400us  6.0700us  34.317ms  cuMemcpyDtoH
                    0.00%  1.4920us         1  1.4920us  1.4920us  1.4920us  cuDeviceGetCount
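
In case it helps interpret that profile (this is my reading, not something verified in the thread): partialschur allocates ordinary CPU arrays for its Krylov basis, so d_f! receives plain CPU vectors, and multiplying a GPU matrix by a CPU vector falls back to element-by-element (scalar) indexing of the device array, which would explain the enormous number of tiny device-to-host copies. A rough, untested sketch of a workaround is to do the host/device transfers in bulk inside the operator:

# match the LinearMap's element type so the GPU matvec doesn't hit a mixed-type fallback
d_A = CuSparseMatrixCSR(ComplexF64.(A))

function d_f!(y, x)
    d_x = CuArray(x)           # host -> device: one bulk copy
    d_y = d_A * d_x            # the matvec itself runs on the GPU
    copyto!(y, Array(d_y))     # device -> host: one bulk copy
    return y
end

d_L0 = LinearMap{ComplexF64}(d_f!, n)
d_evals, d_evecs = partialeigen(partialschur(d_L0)[1])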