Methods and open-source toolkit for analyzing and visualizing challenge results
Note that this is an early experimental version (version 0.1.3) and there may still be (severe) bugs. There may be updates with possibly major changes. Please make sure that you use the most current version!
Installation
Requires R version >= 3.5.2 (https://www.r-project.org).
Further, a recent version of Pandoc (>= 1.12.3) is required. RStudio (https://rstudio.com) automatically includes this so you do not need to download Pandoc if you plan to use rmarkdown from the RStudio IDE, otherwise you’ll need to install Pandoc for your platform (https://pandoc.org/installing.html). Finally, if you want to generate a pdf report you will need to have LaTeX installed (e.g. MiKTeX, MacTeX or TinyTeX).
To get the current development (experimental) version of the R package from Github:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Rgraphviz", dependencies = TRUE)
devtools::install_github("wiesenfa/challengeR", dependencies = TRUE)
If you are asked whether you want to update installed packages and you type “a” for all, you might need administrator rights to update R core packages. You can also try to type “n” for updating no packages. If you are asked “Do you want to install from sources the packages which need compilation? (Yes/no/cancel)”, you can safely type “no”.
If you get Warning messages (in contrast to Error messages), these might not be problematic and you can try to proceed.
Terms of use
Licenced under GPL-3. If you use this software for a publication, cite
Wiesenfarth, M., Reinke, A., Landmann A.L., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. arXiv preprint arXiv:1910.05121
Usage
Each of the following steps have to be run to generate the report: (1) Load package, (2) load data, (3) perform ranking, (4) perform bootstrapping and (5) generation of the report
1. Load package
Load package
library(challengeR)
2. Load data
Data requirements
Data requires the following columns
- a task identifier in case of multi-task challenges.
- a test case identifier
- the algorithm name
- the metric value
In case of missing metric values, a missing observation has to be provided (either as blank field or “NA”).
For example, in a challenge with 2 tasks, 2 test cases and 2 algorithms, where in task “T2”, test case “case2”, algorithm “A2” didn’t give a prediction (and thus NA or a blank field for missing value is inserted), the data set might look like this:
Task | TestCase | Algorithm | MetricValue |
---|---|---|---|
T1 | case1 | A1 | 0.617 |
T1 | case1 | A2 | 0.823 |
T1 | case2 | A1 | 0.601 |
T1 | case2 | A2 | 0.049 |
T2 | case1 | A1 | 0.557 |
T2 | case1 | A2 | 0.696 |
T2 | case2 | A1 | 0.383 |
T2 | case2 | A2 | NA |
Load data
If you have assessment data at hand stored in a csv file (if you want to use simulated data skip the following code line) use
data_matrix=read.csv(file.choose()) # type ?read.csv for help
This allows to choose a file interactively, otherwise replace file.choose() by the file path (in style “/path/to/dataset.csv”) in quotation marks.
For illustration purposes, in the following simulated data is generated instead (skip the following code chunk if you have already loaded data). The data is also stored as “data_matrix.csv” in the repository.
if (!requireNamespace("permute", quietly = TRUE)) install.packages("permute")
n=50
set.seed(4)
strip=runif(n,.9,1)
c_ideal=cbind(task="c_ideal",
rbind(
data.frame(alg_name="A1",value=runif(n,.9,1),case=1:n),
data.frame(alg_name="A2",value=runif(n,.8,.89),case=1:n),
data.frame(alg_name="A3",value=runif(n,.7,.79),case=1:n),
data.frame(alg_name="A4",value=runif(n,.6,.69),case=1:n),
data.frame(alg_name="A5",value=runif(n,.5,.59),case=1:n)
))
set.seed(1)
c_random=data.frame(task="c_random",
alg_name=factor(paste0("A",rep(1:5,each=n))),
value=plogis(rnorm(5*n,1.5,1)),case=rep(1:n,times=5)
)
strip2=seq(.8,1,length.out=5)
a=permute::allPerms(1:5)
c_worstcase=data.frame(task="c_worstcase",
alg_name=c(t(a)),
value=rep(strip2,nrow(a)),
case=rep(1:nrow(a),each=5)
)
c_worstcase=rbind(c_worstcase,
data.frame(task="c_worstcase",alg_name=1:5,value=strip2,case=max(c_worstcase$case)+1)
)
c_worstcase$alg_name=factor(c_worstcase$alg_name,labels=paste0("A",1:5))
data_matrix=rbind(c_ideal, c_random, c_worstcase)
3 Perform ranking
3.1 Define challenge object
Code differs slightly for single and multi task challenges.
In case of a single task challenge use
# Use only task "c_random" in object data_matrix
dataSubset=subset(data_matrix, task=="c_random")
challenge=as.challenge(dataSubset,
# Specify which column contains the algorithm,
# which column contains a test case identifier
# and which contains the metric value:
algorithm="alg_name", case="case", value="value",
# Specify if small metric values are better
smallBetter = FALSE)
Instead, for a multi-task challenge use
# Same as above but with 'by="task"' where variable "task" contains the task identifier
challenge=as.challenge(data_matrix,
by="task",
algorithm="alg_name", case="case", value="value",
smallBetter = FALSE)
3.2 Perform ranking
Different ranking methods are available, choose one of them:
- for “aggregate-then-rank” use (here: take mean for aggregation)
ranking=challenge%>%aggregateThenRank(FUN = mean, # aggregation function,
# e.g. mean, median, min, max,
# or e.g. function(x) quantile(x, probs=0.05)
na.treat=0, # either "na.rm" to remove missing data,
# set missings to numeric value (e.g. 0)
# or specify a function,
# e.g. function(x) min(x)
ties.method = "min" # a character string specifying
# how ties are treated, see ?base::rank
)
- alternatively, for “rank-then-aggregate” with arguments as above (here: take mean for aggregation):
ranking=challenge%>%rankThenAggregate(FUN = mean,
ties.method = "min"
)
- alternatively, for test-then-rank based on Wilcoxon signed rank test:
ranking=challenge%>%testThenRank(alpha=0.05, # significance level
p.adjust.method="none", # method for adjustment for
# multiple testing, see ?p.adjust
na.treat=0, # either "na.rm" to remove missing data,
# set missings to numeric value (e.g. 0)
# or specify a function, e.g. function(x) min(x)
ties.method = "min" # a character string specifying
# how ties are treated, see ?base::rank
)
4. Perform bootstrapping
Perform bootstrapping with 1000 bootstrap samples using one CPU
set.seed(1)
ranking_bootstrapped=ranking%>%bootstrap(nboot=1000)
If you want to use multiple CPUs (here: 8 CPUs), use
library(doParallel)
registerDoParallel(cores=8)
set.seed(1)
ranking_bootstrapped=ranking%>%bootstrap(nboot=1000, parallel=TRUE, progress = "none")
stopImplicitCluster()
5. Generate the report
Generate report in PDF, HTML or DOCX format. Code differs slightly for single and multi task challenges.
5.1 For single task challenges
report(ranking_bootstrapped,
title="singleTaskChallengeExample", # used for the title of the report
file = "filename",
format = "PDF", # format can be "PDF", "HTML" or "Word"
latex_engine="pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex"
clean=TRUE #optional. Using TRUE will clean intermediate files that are created during rendering.
)
Argument file allows for specifying the output file path as well, otherwise the working directory is used. If file is specified but does not have a file extension, an extension will be automatically added according to the output format given in format. Using argument clean=FALSE allows to retain intermediate files, such as separate files for each figure.
If argument “file” is omitted, the report is created in a temporary folder with file name “report”.
5.1 For multi task challenges
Same as for single task challenges, but additionally consensus ranking (rank aggregation across tasks) has to be given.
Compute ranking consensus across tasks (here: consensus ranking according to mean ranks across tasks):
# See ?relation_consensus for different methods to derive consensus ranking
meanRanks=ranking%>%consensus(method = "euclidean")
meanRanks # note that there may be ties (i.e. some algorithms have identical mean rank)
Generate report as above, but with additional specification of consensus ranking
report(ranking_bootstrapped,
consensus=meanRanks,
title="multiTaskChallengeExample",
file = "filename",
format = "PDF", # format can be "PDF", "HTML" or "Word"
latex_engine="pdflatex"#LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex"
)
Reference
Wiesenfarth, M., Reinke, A., Landmann A.L., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. arXiv preprint arXiv:1910.05121