targets R package Keras model example
The goal of this workflow is to find the Keras model that best predicts customer attrition (“churn”) on a subset of the IBM Watson Telco Customer Churn dataset. (See this RStudio Blog post by Matt Dancho for a thorough walkthrough of the use case.) Here, we fit multiple Keras models to the dataset with different tuning parameters, pick the one with the highest classification test accuracy, and produce a trained model for the best set of tuning parameters we find.
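The selection step described above can be sketched in base R. This is a minimal illustration, not the example's actual code: the grid values, the column names, and the `fake_test_accuracy()` stand-in are all hypothetical, and the accuracy numbers it returns are made up so that the sketch runs without Keras or the churn data.

```r
# Hypothetical tuning grid; parameter names and values are illustrative.
grid <- expand.grid(
  units = c(16, 32),
  activation = c("relu", "sigmoid"),
  stringsAsFactors = FALSE
)

# Stand-in for fitting a Keras model and measuring test accuracy.
# A real version would train on the churn data; this one returns
# made-up numbers so the selection step can run on its own.
fake_test_accuracy <- function(units, activation) {
  0.75 + 0.001 * units + ifelse(activation == "relu", 0.01, 0)
}

# Score every combination of tuning parameters.
grid$accuracy <- mapply(fake_test_accuracy, grid$units, grid$activation)

# Keep the row with the highest accuracy (the first one wins on ties).
best_run <- grid[which.max(grid$accuracy), , drop = FALSE]
```

In a real pipeline, the row in `best_run` would supply the tuning parameters used to retrain the final model.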
targets pipeline
The targets R package manages
the workflow. It automatically skips steps of the pipeline when the
results are already up to date, which is critical for machine learning
tasks that take a long time to run. It also helps users understand and
communicate this work with tools like the interactive dependency graph
below.
library(targets)
tar_visnetwork()
File structure
The files in this example are organized as follows.
├── run.sh
├── run.R
├── _targets.R
├── sge.tmpl
├── R/
├──── functions.R
├── data/
├──── customer_churn.csv
└── report.Rmd
File | Purpose |
---|---|
run.sh | Shell script to run run.R in a persistent background process. Works on Unix-like systems. Helpful for long computations on servers. |
run.R | R script to run tar_make() or tar_make_clustermq() (uncomment the function of your choice). |
_targets.R | The special R script that declares the targets pipeline. See tar_script() for details. |
sge.tmpl | A clustermq template file to deploy targets in parallel to a Sun Grid Engine cluster. |
R/functions.R | An R script with user-defined functions. Unlike _targets.R, there is nothing special about the name or location of this script. In fact, for larger projects, it is good practice to partition functions into multiple files. |
data/customer_churn.csv | A subset of the IBM Watson Telco Customer Churn dataset. |
report.Rmd | An R Markdown report summarizing the results of the analysis. For more information on how to include R Markdown reports as reproducible components of the pipeline, see the tar_render() function from the tarchetypes package and the literate programming chapter of the manual. |
How to access
You can try out the example as long as you have a browser and an internet connection. Click here to navigate your browser to an RStudio Cloud instance. No downloads or installations required.
How to run
- If you are running locally instead of in this RStudio Cloud workspace:
  - Download the files in this repository, either through Git or through this link.
  - Run the setup script to install the required R and Python packages. Then, run tensorflow::tf_config() to verify that TensorFlow installed correctly.
- Run the targets pipeline by running either run.R or run.sh. (The latter is for Unix-like systems only.) This computation could take a while.
- View the results in the output report.html file.
- Make changes to the R code or data, rerun the pipeline, and watch targets skip steps that are already up to date.
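To see the skipping behavior in isolation, the sketch below builds a toy two-target pipeline in a temporary directory and runs it twice. The target names `raw` and `total` are hypothetical and unrelated to this example's real pipeline; the sketch only assumes that the targets package is installed.

```r
library(targets)

# Work in a throwaway directory so the demo does not touch a real project.
demo_dir <- tempfile("targets-skip-demo-")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a toy _targets.R with two dependent targets (hypothetical names).
tar_script(
  {
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(total, sum(raw))
    )
  },
  ask = FALSE
)

tar_make()                     # first run: builds raw, then total
tar_make()                     # second run: nothing changed, so both are skipped
outdated <- tar_outdated()     # names of targets that would rerun
total_value <- tar_read(total) # read the stored result of the total target

# Return to the original working directory.
setwd(old_wd)
```

Editing the command of `raw` in the toy script and calling `tar_make()` again would rerun `raw`, and `total` would rerun only if the value of `raw` changed.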
High-performance computing
You can run this project locally on your laptop or remotely on a
cluster. You have several choices, and they each require modifications
to run.R and _targets.R.
Mode | When to use | Instructions for run.R | Instructions for _targets.R |
---|---|---|---|
Sequential | Low-spec local machine or Windows. | Uncomment tar_make() | No action required. |
Local multicore | Local machine with a Unix-like OS. | Uncomment tar_make_clustermq() | Uncomment options(clustermq.scheduler = "multicore") |
Sun Grid Engine | Sun Grid Engine cluster. | Uncomment tar_make_clustermq() | Uncomment options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl") |
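The fragment below sketches how these switches might look side by side. It is not a copy of the actual run.R or _targets.R in this example: the layout and the `workers = 2L` value are assumptions, and the code only runs inside a project directory that already contains a _targets.R file.

```r
# run.R (sketch): keep exactly one of the two calls active.
library(targets)
tar_make()                          # sequential mode
# tar_make_clustermq(workers = 2L)  # local multicore or Sun Grid Engine;
#                                   # the worker count here is an assumption

# _targets.R (sketch): when using tar_make_clustermq(), enable the
# matching scheduler option near the top of the file.
# options(clustermq.scheduler = "multicore")
# options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")
```

Keeping every alternative in the file as a comment makes switching modes a one-line change in each script.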