MikaelSlevinsky / FastTransforms

:bullettrain_front: Fast orthogonal polynomial transforms :surfer:

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Change the vectorization paradigm

MikaelSlevinsky opened this issue · comments

Single step decrements in Proriol/Zernike/spherical harmonic polynomial orders are currently done with a sweep of Givens rotations. Nominally, they come from the QR factorization of W = I±X or I-X^2. Multi-step decrements can be done with a Householder QR factorization of W^k for some integer k (c.f. https://arxiv.org/pdf/2302.08448.pdf with @dlfivefifty and @TSGut). It's reasonable then that k would be chosen such that the Householder vectors fit exactly into SIMD registers or small multiples of these to maximize throughput.

Of course, a quick check shows that the QR factorization of W^k is nowhere near the result of taking the product of sequential Qs. This is basically the comment of Golub and Kautsky

julia> using ApproxFun, LinearAlgebra

julia> n = 64
64

julia> K = 8
8

julia> # Best
       Q = Vector{Matrix{Float64}}(undef, K);

julia> for k in 1:K
           x = Fun(x->x, NormalizedJacobi(0,2k-2))
           X = Multiplication(x, space(x))
           W = I-X
           F = qr(W[1:n+1-k, 1:n-k])
           Q[k] = Matrix(F.Q)*Diagonal(sign.(F.R))
       end

julia> F = qr(prod(Q));

julia> pQ = Matrix(F.Q)*Diagonal(sign.(F.R));

julia> # Worst
       x = Fun(x->x, NormalizedJacobi(0,0));

julia> X = Multiplication(x, space(x));

julia> W = I-X;

julia> F = qr((W^K)[1:n,1:n-K]);

julia> QK = Matrix(F.Q)*Diagonal(sign.(F.R));

julia> norm(pQ - prod(Q))/norm(prod(Q))
6.347268321085116e-16

julia> norm(QK - prod(Q))/norm(prod(Q))
0.003283515270996512

julia> norm(pQ - QK)/norm(prod(Q))
0.00328351527099649

I think it's still possible to convert multiple sweeps of Givens rotations into a single pass of Householder transformations without recreating the dense orthogonal matrices: this is the Householder analogue of a turnover operation (converting a particular set of 3 Givens rotations into 3 different Givens rotations https://arxiv.org/pdf/1611.02435.pdf)