tidymodels parsnip cross-validation api classification churn-analytics machine-learning r random-forest logistic-regression

Modelling with Tidymodels and Parsnip

A Tidy Approach to a Classification Problem

22 June 2019

I recently completed the Business Analysis With R online course, which focuses on applied data and business science with R and introduced me to a couple of new modelling concepts and approaches. One that especially captured my attention is parsnip and its attempt to implement a unified modelling and analysis interface (similar to Python's scikit-learn) to seamlessly access several modelling platforms in R.

parsnip is the brainchild of RStudio's Max Kuhn (of caret fame) and Davis Vaughan and forms part of tidymodels, a growing ensemble of tools to explore and iterate modelling tasks that shares a common philosophy (and a few libraries) with the tidyverse.

Although the packages are at different stages of development, I decided to take tidymodels "for a spin", so to speak, and build and run a "tidy" modelling workflow to tackle a classification problem. My aim is to show how easy it is to fit a simple logistic regression with R's glm and then quickly switch to a cross-validated random forest using the ranger engine by changing only a few lines of code.
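To give a flavour of that switch, here is a minimal sketch rather than the code from the article. It assumes R 4.1 or later (for the native pipe) and that the parsnip and ranger packages are installed. The toy data built from mtcars and the names cars, glm_fit and rf_fit are illustrative assumptions, not the churn data used in the post.

```r
library(parsnip)

# Toy two-class data (an assumption for illustration, not the article's churn data)
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# Logistic regression fitted through the glm engine
glm_fit <- logistic_reg(mode = "classification") |>
  set_engine("glm") |>
  fit(am ~ hp + wt, data = cars)

# Random forest fitted through the ranger engine:
# only the model specification and the engine name change
rf_fit <- rand_forest(mode = "classification", trees = 100) |>
  set_engine("ranger") |>
  fit(am ~ hp + wt, data = cars)

# predict() returns a tibble with a .pred_class column for both models
glm_pred <- predict(glm_fit, new_data = cars)
rf_pred  <- predict(rf_fit, new_data = cars)
```

The formula, the data argument and the prediction call stay the same across both fits, which is the point of the unified interface.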

In this post I focus on four libraries from the tidymodels suite: rsample for data sampling and cross-validation, recipes for data preprocessing, parsnip for model setup and estimation, and yardstick for model assessment.
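As a rough sketch of how these four packages can fit together (not the workflow from the article), the example below assumes R 4.1 or later and a recent version of each package. The mtcars-based toy data and all object names are illustrative assumptions.

```r
library(rsample)
library(recipes)
library(parsnip)
library(yardstick)

set.seed(123)

# Toy two-class data (an assumption for illustration, not the article's churn data)
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# rsample: hold out a test set and define 5-fold cross-validation on the training set
split <- initial_split(cars, prop = 0.75)
train <- training(split)
test  <- testing(split)
folds <- vfold_cv(train, v = 5)

# recipes: centre and scale the numeric predictors, estimated on the training data only
rec <- recipe(am ~ hp + wt, data = train) |>
  step_normalize(all_numeric_predictors()) |>
  prep()

train_baked <- bake(rec, new_data = NULL)
test_baked  <- bake(rec, new_data = test)

# parsnip: specify and fit the model
model_fit <- logistic_reg(mode = "classification") |>
  set_engine("glm") |>
  fit(am ~ ., data = train_baked)

# yardstick: compare predicted classes with the observed ones
results <- data.frame(
  truth    = test_baked$am,
  estimate = predict(model_fit, new_data = test_baked)$.pred_class
)
acc <- accuracy(results, truth = truth, estimate = estimate)
```

Here folds is only created to show the rsample call; a full cross-validated assessment would fit and score the model on each fold in turn.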

Links

You can find the final article on my website

I've also published the article on Towards Data Science
