Parallel Funcalc Benchmarks

Intro

We have benchmarked Funcalc on our benchmarking server, running the file benchmark.bat on spreadsheets converted from the EUSES corpus.

The date on which the benchmarks were run can be found in the file build.log in each folder.
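To collect these dates in one place, a small helper along the following lines could work (a sketch; getBuildDates is not part of the repository and it assumes the date appears on the first line of each build.log):

getBuildDates <- function (root) {
    # Find every build.log below root and report its first line.
    logs <- list.files(path=root, pattern="build.log", recursive=TRUE, full.names=TRUE)
    sapply(logs, function (log) { return(readLines(log, n=1)) })
}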

Meta-Statistics

How many arrays did the algorithm convert per sheet?

# For each *.err file in lpath, find the line containing "Lifted" and return
# the file name together with the second and fourth whitespace-separated
# fields of that line, parsed as numbers.
getLifts <- function (lpath) {
    errs <- list.files(path=lpath, pattern="err", full.names=TRUE)
    t(sapply(errs, function (err) {
        for (line in scan(err, what = character(), sep = "\n")) {
            if (grepl("Lifted", line)) {
                elems <- as.numeric(unlist(strsplit(line, " ")))
                return(c(sub(".err", "", basename(err)), elems[2], elems[4]))
            }
        }
    }))
}
lifts <- getLifts("euses/arrays")
02rise.xml  108  0
2000_places_School.xml  0  0
2002Qvols.xml  1280  0
2004_PUBLIC_BUGS_INVENTORY.xml  4493  0
Aggregate20Governanc#A8A51.xml  2528  0
EducAge25.xml  0  0
financial-model-spreadsheet.xml  0  0
Financial-Projections.xml  1836  0
funding.xml  471  0
high_2003_belg.xml  2829  99
iste-cs-2003-modeling-sim.xml  400  0
modeling-3.xml  0  0
MRP_Excel.xml  524  527
notes5CMISB200SP04H2KEY.xml  1500  0
ny_emit99.xml  0  0
Test20Station20Powe#A90F3.xml  1229  0
Time.xml  0  0
v1tmp.xml  0  0
WasteCalendarCalculat#A843B.xml  603  0

It seems that

  1. cell arrays are in general smaller than we expected (< 64 cells); and
  2. there are many cell arrays that would introduce cyclic dependencies when lifted.
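For a quick aggregate over all sheets, we can total the two counts reported on the "Lifted" lines (a sketch using the lifts matrix from above):

colSums(apply(lifts[, 2:3], 2, as.numeric))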

Initial Data Probing

Let’s see how well we’re doing.

# file <- "MRP_Excel.xml"
file <- "Aggregate20Governanc#A8A51.xml"
# Read the benchmark log <prefix>/<file>.out, skipping the two header lines.
readLog <- function (prefix, file) {
    return (read.table(paste(prefix, "/", file, ".out", sep=""),
                       dec=".",
                       row.names=3,
                       col.names=c("iteration", "mode", "elapsed"),
                       skip=2,
                       stringsAsFactors=TRUE))
}

# Turns elapsed milliseconds into doubles.
getElapsed <- function (vals) {
    as.double(sapply(vals[2], function (x) {
        return(sub(",", ".", sub("ms", "", x)))
    }))
}
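# For illustration only (made-up value, not taken from the logs): getElapsed
# turns an entry such as "12,34ms" into the double 12.34.
as.double(sub(",", ".", sub("ms", "", "12,34ms")))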

# Per-iteration speedup of an experiment over the mean elapsed time of its
# baseline; returns the mean and standard deviation of the speedups.
getSpeedup <- function (experiments, baseline, file) {
  perf      <- readLog(experiments, file)
  base      <- readLog(baseline, file)
  base_mean <- mean(getElapsed(base))
  speedups  <- sapply(getElapsed(perf), function (x) { return (base_mean / x)})
  c(mean(speedups), sd(speedups))
}
cbind(c("Mean", "Stddev"), getSpeedup("euses/arrays", "euses/seq", file))
Mean    0.678750559823602
Stddev  0.0369747742062879
array <- getElapsed(readLog("euses/arrays", file))
plot(array)

plots/MRP_Excel_array_plot.png

hist(array, freq=0.1)

plots/MRP_Excel_array_hist.png

mean(getElapsed(readLog("euses/seq", "Financial-Projections.xml")))
4.684867

Overall Analysis

Let’s focus only on those sheets that actually have lifted cell arrays:

# Keep only the sheets whose two counts from getLifts do not sum to zero,
# sorted alphabetically by file name.
filterSuccessfulLifts <- function (lifts) {
  successful <- lifts[as.numeric(lifts[,2]) + as.numeric(lifts[,3]) > 0, 1:3]
  return(successful[sort.list(successful[,1]),])
}
successful <- filterSuccessfulLifts(lifts)
02rise.xml  108  0
2002Qvols.xml  1280  0
2004_PUBLIC_BUGS_INVENTORY.xml  4493  0
Aggregate20Governanc#A8A51.xml  2528  0
Financial-Projections.xml  1836  0
funding.xml  471  0
high_2003_belg.xml  2829  99
iste-cs-2003-modeling-sim.xml  400  0
MRP_Excel.xml  524  527
notes5CMISB200SP04H2KEY.xml  1500  0
Test20Station20Powe#A90F3.xml  1229  0
WasteCalendarCalculat#A843B.xml  603  0
# Compute (file, mean speedup, standard deviation) for every *.out log in the
# benchmark folder, relative to the corresponding baseline log.
computeSpeedups <- function (benchmark, baseline) {
  files <- list.files(benchmark, pattern="out")
  speedups <- t(sapply(files,
                       function (file) {
                           f <- gsub(".out", "", file)
                           s <- getSpeedup(benchmark, baseline, f)
                           return(rbind(f, s[1], s[2]))
                       }))
  rownames(speedups) <- files
  return(speedups)
}

speedups <- computeSpeedups("euses/arrays", "euses/seq")
speedupsF <- subset(speedups, speedups[,1] %in% successful)
02rise.xml  1.29736282169393  0.0111646533392518
2002Qvols.xml  1.00485802863102  0.0551914400974847
2004_PUBLIC_BUGS_INVENTORY.xml  2.26636076468086  0.038273942036477
Aggregate20Governanc#A8A51.xml  0.678750559823602  0.0369747742062879
Financial-Projections.xml  0.665963892870879  0.170498553241524
funding.xml  0.927404742297863  0.0126249406063556
high_2003_belg.xml  0.993168877525272  0.00702673279166305
iste-cs-2003-modeling-sim.xml  1.07109807058989  0.0222093270795209
MRP_Excel.xml  1.04868287181988  0.00789956041152768
notes5CMISB200SP04H2KEY.xml  0.908493534641156  0.0303644427416775
Test20Station20Powe#A90F3.xml  1.13931893816619  0.0372818552054307
WasteCalendarCalculat#A843B.xml  0.958700792238062  0.111673990606713
# Bar plot of the given column of a speedup matrix, one bar per file.
plot.bar <- function (cols, col) {
    ts <- as.numeric(cols[,col])
    names(ts) <- cols[,1]
    return(barplot(ts))
}
plot.bar(speedupsF, 2)

plots/errorbars.png

Synthetic Benchmarks

computeSpeedups("examples/arrays", "examples/seq")
finance2.xml  1.7455890057058  0.0843146578163405
finance.xml  2.29626631288287  0.134346665415993
testsdf.xml  2.29954388133998  0.0665480544807438
plot.bar(computeSpeedups("examples/arrays", "examples/seq"), 2)

plots/barplot_examples.png

I changed the number of benchmark runs for testsdf.xml to 100. Clearly, the large or computationally heavy synthetic sheets gain much more from cell array lifting than the real-life sheets do.

Now the same for Filby’s sheets:

computeSpeedups("filby/arrays", "filby/seq")
DNA.xml  0.807176156370794  0.0710962266408015
EUSE.xml  1.33580283542263  0.092334646020717
PLANCK.xml  2.33002200568256  0.362830128081373

Let’s compare them with the speedup achieved via per-cell parallelism:

rbind(computeSpeedups("examples/cells", "examples/seq"),
      computeSpeedups("filby/cells", "filby/seq"))
finance2.xml  0.349169601377365  0.033142860048034
finance.xml  0.143679949828006  0.0221177314455975
testsdf.xml  0.170397986407867  0.0255833552621607
DNA.xml  0.446559153011431  0.0134697219740229
EUSE.xml  0.170836984769614  0.0194000274627591
PLANCK.xml  0.586276036553474  0.046148612962701

This clearly shows that our approach is useful: per-cell parallelism slows all of these sheets down, while cell array lifting speeds most of them up.
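To make the comparison easier to read, the two sets of results can be put side by side (a sketch reusing computeSpeedups; it assumes the cells and arrays folders contain logs for the same files in the same order; columns are file, array-lifting mean speedup, per-cell mean speedup):

arrayS <- rbind(computeSpeedups("examples/arrays", "examples/seq"),
                computeSpeedups("filby/arrays", "filby/seq"))
cellS  <- rbind(computeSpeedups("examples/cells", "examples/seq"),
                computeSpeedups("filby/cells", "filby/seq"))
cbind(arrayS[,1], arrayS[,2], cellS[,2])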

plot.bar(computeSpeedups("filby/arrays", "filby/seq"), 2)

plots/barplot_filby.png

How many formula cells per sheet?

# Count the occurrences of the string "Formula" in a sheet's XML file as a
# proxy for the number of formula cells.
countFormulas <- function (file) {
    formulas <- sum(sapply(scan(file, what=character()),
                           function (token) { return(grepl("Formula", token)) }))
    return(c(basename(file), as.numeric(formulas)))
}
formulas <- t(sapply(list.files("~/Documents/funcalc-euses/",
                                recursive=TRUE, pattern="xml$",
                                full.names=TRUE),
                     countFormulas))
2004_PUBLIC_BUGS_INVENTORY.xml  4495
Aggregate20Governanc#A8A51.xml  3546
high_2003_belg.xml  12861
02rise.xml  10316
financial-model-spreadsheet.xml  3115
Financial-Projections.xml  3649
2000_places_School.xml  1375
2002Qvols.xml  2184
EducAge25.xml  1470
notes5CMISB200SP04H2KEY.xml  1557
Test20Station20Powe#A90F3.xml  2164
v1tmp.xml  1129
MRP_Excel.xml  4809
ny_emit99.xml  4353
Time.xml  4198
WasteCalendarCalculat#A843B.xml  844
funding.xml  1636
iste-cs-2003-modeling-sim.xml  1991
modeling-3.xml  213

We compute the theoretical maximum speedup by using Amdahl’s law:

# Amdahl's law: maximum speedup when a fraction pWork of the work can be
# parallelized across nThreads threads.
amdahl <- function (pWork, nThreads) {
    return(1 / (1 - pWork + pWork / nThreads))
}
# Theoretical maximum speedup of a sheet on 32 threads, assuming that all
# cells in lifted arrays are parallelizable work.
max.speedup <- function (formulas, arrayCells) {
    return(amdahl(arrayCells / formulas, 32))
}
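# Sanity check: fully parallelizable work gives the full thread count as the
# bound, purely sequential work gives 1.
amdahl(1, 32)   # 32
amdahl(0, 32)   # 1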

Let’s assume a sheet of 3000 formulas of which 400 are in parallelizable cell arrays:

max.speedup(3000, 400)
1.14832535885167
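(Here the parallelizable fraction is p = 400/3000 ≈ 0.133, so Amdahl's law gives 1 / (1 - 0.133 + 0.133/32) ≈ 1.148.)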

This is actually not too far from what we achieve on average, even when counting the sheets that were not converted:

speedups <- computeSpeedups("euses/arrays", "euses/seq")
mean(as.numeric(speedups[,2]))
1.06837660717346

Keep in mind that the estimate is overly optimistic! There are potential sequential dependencies between the cell arrays, which our theoretical bound does not take into account.

How well are we doing?

At first there seemed to be something wrong with the formula count: how could the number of lifted cell-array cells ever be larger than the overall number of formulas? It turned out to be an R mistake on my part: both data sets must be sorted alphabetically by file name before they are combined.

fc0 <- formulas[sort.list(formulas[,1]),]
fc <- subset(fc0, fc0[,1] %in% successful)
ratios <- cbind(fc, as.numeric(successful[,2]) + as.numeric(successful[,3]))
02rise.xml  10316  108
2002Qvols.xml  2184  1280
2004_PUBLIC_BUGS_INVENTORY.xml  4495  4493
Aggregate20Governanc#A8A51.xml  3546  2528
Financial-Projections.xml  3649  1836
funding.xml  1636  471
high_2003_belg.xml  12861  2928
iste-cs-2003-modeling-sim.xml  1991  400
MRP_Excel.xml  4809  1051
notes5CMISB200SP04H2KEY.xml  1557  1500
Test20Station20Powe#A90F3.xml  2164  1229
WasteCalendarCalculat#A843B.xml  844  603

Now, we can compute the hypothetical bound.

bounds <- cbind(ratios[,1], max.speedup(as.numeric(ratios[,2]), as.numeric(ratios[,3])))
02rise.xml  1.01024592672387
2002Qvols.xml  2.3135593220339
2004_PUBLIC_BUGS_INVENTORY.xml  31.5646258503401
Aggregate20Governanc#A8A51.xml  3.23245214220602
Financial-Projections.xml  1.95094566597607
funding.xml  1.38677121135864
high_2003_belg.xml  1.28295675594793
iste-cs-2003-modeling-sim.xml  1.24165887121921
MRP_Excel.xml  1.26858301664372
notes5CMISB200SP04H2KEY.xml  14.9891696750903
Test20Station20Powe#A90F3.xml  2.22312112748403
WasteCalendarCalculat#A843B.xml  3.24810583283223

How far are we from reaching the overly optimistic, hypothetical bound? We compute the difference between hypothetical bound and actual speedup, divided by the bound:

δ_speedup = (bound - speedup) / bound

cbind(bounds[,1], (as.numeric(bounds[,2]) - as.numeric(speedupsF[,2])) / as.numeric(bounds[,2]))
02rise.xml  -0.284204951858754
2002Qvols.xml  0.565665760518461
2004_PUBLIC_BUGS_INVENTORY.xml  0.928199346463774
Aggregate20Governanc#A8A51.xml  0.790019919874086
Financial-Projections.xml  0.658645597114724
funding.xml  0.331248922171328
high_2003_belg.xml  0.225875016503221
iste-cs-2003-modeling-sim.xml  0.137365265599756
MRP_Excel.xml  0.173343125312861
notes5CMISB200SP04H2KEY.xml  0.939390002626301
Test20Station20Powe#A90F3.xml  0.487513782276187
WasteCalendarCalculat#A843B.xml  0.704843117318591

A negative value means that the measured speedup exceeds the hypothetical bound, which is good but suspicious.
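To list the sheets in question, we can recompute the deltas and select the negative ones (a sketch):

deltas <- (as.numeric(bounds[,2]) - as.numeric(speedupsF[,2])) / as.numeric(bounds[,2])
bounds[deltas < 0, 1]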

For synthetic sheets:

synthS <- computeSpeedups("examples/arrays", "examples/seq")
synthF <- cbind(countFormulas("~/src/funcalc-examples/applied/finance2.xml"),
                countFormulas("~/src/funcalc-examples/applied/finance.xml"),
                countFormulas("~/src/funcalc-examples/tests/testsdf.xml"))
finance2.xmlfinance.xmltestsdf.xml
106987159433774
synthL <- getLifts("examples/arrays")
synthH <- max.speedup(as.numeric(synthF[2,]), as.numeric(synthL[,2]) + as.numeric(synthL[,3]))
cbind(synthS[,1], (as.numeric(synthH) - as.numeric(synthS[,2])) / as.numeric(synthH))
finance2.xml  0.924886703190993
finance.xml  0.849128836307796
testsdf.xml  -1.01857483727986

Again, negative results. I think this approach is flawed, because our measure of the possible parallel work is very inaccurate. The idea would only be useful if we could approximate the parallel work more precisely; until we can, we cannot use it.

Analysis for synthetic sheets:

computeSpeedups("synth/arrays", "synth/seq")
synth-map.xml  3.13738659618067  0.0596575681287037
synth-prefix.xml  10.3001083761698  0.655205245937034
