magic-lantern / SoftwareEngineeringPrinciples

Software Engineering Principles applied to computational research - paper and associated code


SoftwareEngineeringPrinciples

Russell S, Bennett TD, Ghosh D. 2019. Software engineering principles to improve quality and performance of R software. PeerJ Computer Science 5:e175 https://doi.org/10.7717/peerj-cs.175

Paper and associated code are contained in this repository.

Analysis code is contained in two R Markdown files, with shared functions stored in shared_fn.R. To run the analysis, all R packages on CRAN need to be downloaded. While it is possible to use services like "R Package Documentation" at https://rdrr.io, I found that downloading the packages locally is more reliable. Depending on internet speed, this can take many hours. The total size of all downloaded packages is currently about 6.6 GB. If manual inspection is desired, a flag can be set so that the process automatically un-tars and un-gzips the files, which requires more than 20 GB of disk space.
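As a rough illustration of the download step, the sketch below fetches the CRAN source tarballs with base R's `utils` functions and optionally unpacks them. It is not the repository's actual code: the function names `tarball_urls` and `fetch_cran_sources`, the `extract_sources` argument, the `extracted` subdirectory, and the choice of mirror are all assumptions made for this example.

```r
# Hypothetical helper: build source-tarball URLs from a package index
# (a matrix with "Package" and "Version" columns, as returned by
# utils::available.packages()). CRAN serves sources under src/contrib.
tarball_urls <- function(index, repo = "https://cloud.r-project.org") {
  paste0(repo, "/src/contrib/",
         index[, "Package"], "_", index[, "Version"], ".tar.gz")
}

# Hypothetical helper: download every source package listed on a mirror.
# Not called here, because it transfers several GB and can take hours.
fetch_cran_sources <- function(dest_dir,
                               repo = "https://cloud.r-project.org",
                               extract_sources = FALSE) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  index <- utils::available.packages(
    contriburl = utils::contrib.url(repo, type = "source")
  )
  # download.packages() returns a matrix: package name, local file path.
  got <- utils::download.packages(index[, "Package"], destdir = dest_dir,
                                  available = index, repos = repo,
                                  type = "source")
  if (extract_sources) {
    # untar() handles the gzip layer as well as the tar archive.
    for (tarball in got[, 2]) {
      utils::untar(tarball, exdir = file.path(dest_dir, "extracted"))
    }
  }
  invisible(got)
}

# Offline demonstration of the URL layout with a made-up two-package index.
demo_index <- cbind(Package = c("pkgA", "pkgB"), Version = c("1.0", "2.1.3"))
demo_urls <- tarball_urls(demo_index)
print(demo_urls)
```

Calling `fetch_cran_sources("~/cran_sources", extract_sources = TRUE)` would start the full download and unpack each archive, so check the available disk space first.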

The R Markdown documents and shared_fn.R contain some hard-coded paths that assume you checked out this GitHub repository to your home directory. If you've checked it out somewhere else, update the paths before running. See:

Once all files are downloaded, the analysis runs in just a minute or two on my test machine. YMMV, as performance is highly disk dependent.

To improve performance on machines with high-latency internet connections, there is a parallel_processing flag (defaults to TRUE) that significantly speeds up processing. Parallel processing also helps on machines with low-latency connections, though not as much. Depending on other system configuration (disk speed, number of CPUs), your mileage may vary.
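A flag like this can be wired in roughly as follows. This is a minimal sketch, not the repository's implementation: the function `run_over_packages` and the toy `pkg_name` worker are hypothetical, and only the `parallel_processing` name comes from the text above. It uses forked workers from the base `parallel` package where available and falls back to a plain `lapply()` otherwise.

```r
library(parallel)

# Hypothetical wrapper: apply fn to every item, in parallel when requested
# and when the platform supports forking (mclapply cannot fork on Windows).
run_over_packages <- function(items, fn, parallel_processing = TRUE) {
  n_cores <- detectCores()
  if (is.na(n_cores)) n_cores <- 1L
  use_fork <- parallel_processing &&
    .Platform$OS.type != "windows" &&
    n_cores > 1L
  if (use_fork) {
    mclapply(items, fn, mc.cores = n_cores)
  } else {
    lapply(items, fn)
  }
}

# Toy worker: recover the package name from a tarball file name.
pkg_name <- function(f) sub("_.*$", "", f)

tarballs <- c("pkgA_1.0.tar.gz", "pkgB_2.1.3.tar.gz")
serial_res   <- run_over_packages(tarballs, pkg_name, parallel_processing = FALSE)
parallel_res <- run_over_packages(tarballs, pkg_name, parallel_processing = TRUE)
```

Both calls return the same list. Only the scheduling differs, which is why a flag of this kind can safely default to TRUE.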

Info on results of running code

Because the process requires downloading over 13,000 files, I have saved the results of a run as plain text output and images; for those, see the output/ directory. For the results of running the R Markdown notebooks, see the sourcecode/ directory.


License:Apache License 2.0


Languages

HTML 96.1%, TeX 2.3%, R 1.6%