robsalasco / targets-keras

An example Keras pipeline with the targets R package

Home Page:https://rstudio.cloud/project/1430828/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

targets R package Keras model example

Launch RStudio Cloud

The goal of this workflow is find the Keras model that best predicts customer attrition (“churn”) on a subset of the IBM Watson Telco Customer Churn dataset. (See this RStudio Blog post by Matt Dancho for a thorough walkthrough of the use case.) Here fit multiple Keras models to the dataset with different tuning parameters, pick the one with the highest classification test accuracy, and produce a trained model for the best set of tuning parameters we find.

The targets pipeline

The targets R package manages the workflow. It automatically skips steps of the pipeline when the results are already up to date, which is critical for machine learning tasks that take a long time to run. It also helps users understand and communicate this work with tools like the interactive dependency graph below.

library(targets)
tar_visnetwork()

File structure

The files in this example are organized as follows.

├── run.sh
├── run.R
├── _targets.R
├── sge.tmpl
├── R/
├──── functions.R
├── data/
├──── customer_churn.csv
└── report.Rmd
File Purpose
run.sh Shell script to run run.R in a persistent background process. Works on Unix-like systems. Helpful for long computations on servers.
run.R R script to run tar_make() or tar_make_clustermq() (uncomment the function of your choice.)
_targets.R The special R script that declares the targets pipeline. See tar_script() for details.
sge.tmpl A clustermq template file to deploy targets in parallel to a Sun Grid Engine cluster.
R/functions.R An R script with user-defined functions. Unlike _targets.R, there is nothing special about the name or location of this script. In fact, for larger projects, it is good practice to partition functions into multiple files.
data/customer_churn.csv A subset of the IBM Watson Telco Customer Churn dataset
report.Rmd An R Markdown report summarizing the results of the analysis. For more information on how to include R Markdown reports as reproducible components of the pipeline, see the tar_render() function from the tarchetypes package and the literate programming chapter of the manual.

How to access

You can try out the example as long as you have a browser and an internet connection. Click here to navigate your browser to an RStudio Cloud instance. No downloads or installations required.

How to run

  1. If you are running locally instead of this RStudio cloud workspace,
    1. Download the files in this repository, either through Git or through this link.
    2. Run the setup script to install the required R and Python packages. Then, run tensorflow::tf_config() to verify that TensorFlow installed correctly.
  2. Run the targets pipeline by either running run.R or run.sh. (The latter is for Unix-like systems only). This computation could take a while.
  3. View the results in the output report.html file.
  4. Make changes to the R code or data, rerun the pipeline, and watch targets skip steps that are already up to date.

High-performance computing

You can run this project locally on your laptop or remotely on a cluster. You have several choices, and they each require modifications to run.R and _targets.R.

Mode When to use Instructions for run.R Instructions for _targets.R
Sequential Low-spec local machine or Windows. Uncomment tar_make() No action required.
Local multicore Local machine with a Unix-like OS. Uncomment tar_make_clustermq() Uncomment options(clustermq.scheduler = "multicore")
Sun Grid Engine Sun Grid Engine cluster. Uncomment tar_make_clustermq() Uncomment options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")

About

An example Keras pipeline with the targets R package

https://rstudio.cloud/project/1430828/

License:Other


Languages

Language:R 99.5%Language:Shell 0.5%