DynMultiply

Introduction

This package includes a simple algorithm that selects the best multiplication order of matrices depending on their shape.

Usage

using DynMultiply, Benchmarks
A = rand(1000, 100)
B = rand(100, 10)
C = rand(10, 10000)
D = rand(10000, 100)

# Standard multiplication
M1 = A * B * C * D;

# Dynamic method
M2 = dynmultiply(A, B, C, D);

println("Relative error: ", maximum(abs(M2 - M1))/maximum(abs(M1)))
println("Multiplication order: ", dynmultiplystr(A, B, C, D))

Output:

Relative error: 9.402683663711557e-16
Multiplication order: ((A1*A2)*(A3*A4))

Benchmarking the two methods, we get

test_standard(Ms...) = *(Ms...)
test_dynamic(Ms...) = dynmultiply(Ms...)
julia> @benchmark test_standard(A,B,C,D)
================ Benchmark Results ========================
     Time per evaluation: 99.43 ms [96.03 ms, 102.83 ms]
Proportion of time in GC: 1.10% [1.04%, 1.16%]
        Memory allocated: 77.13 mb
   Number of allocations: 9 allocations
       Number of samples: 90
   Number of evaluations: 90
 Time spent benchmarking: 9.36 s


julia> @benchmark test_dynamic(A,B,C,D)
================ Benchmark Results ========================
     Time per evaluation: 4.91 ms [4.34 ms, 5.48 ms]
Proportion of time in GC: 0.35% [0.00%, 1.57%]
        Memory allocated: 867.98 kb
   Number of allocations: 13 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 0.78 s

With an example where all dimensions are the same, we see that dynmultiply is not cursed by overhead.

E = rand(100, 100)
F = rand(100, 100)
G = rand(100, 100)
H = rand(100, 100)

julia> @benchmark test_standard(E, F, G, H)
================ Benchmark Results ========================
     Time per evaluation: 247.50 μs [186.76 μs, 308.23 μs]
Proportion of time in GC: 0.80% [0.00%, 5.59%]
        Memory allocated: 234.70 kb
   Number of allocations: 9 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 0.29 s

julia> @benchmark test_dynamic(E, F, G, H)
================ Benchmark Results ========================
     Time per evaluation: 239.13 μs [172.40 μs, 305.86 μs]
Proportion of time in GC: 0.79% [0.00%, 5.50%]
        Memory allocated: 235.13 kb
   Number of allocations: 14 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 0.29 s

LMescheder / DynMultiply.jl

DynMultiply

Introduction

Usage

About

Languages