satopan / useR2020-text-modeling-tutorial

Home Page: https://emilhvitfeldt.github.io/useR2020-text-modeling-tutorial/

Predictive modeling with text using tidy data principles

Authors: Emil Hvitfeldt, Julia Silge

Materials for our useR! 2020 online tutorial on 24 July 2020

This tutorial was hosted by R-Ladies en Argentina.

  • Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight?
  • Are you familiar with the basics of predictive modeling, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems?
  • Do you need a flexible framework for handling text data that empowers you to build supervised predictive models?

Text data is increasingly important in many domains, and tidy modeling principles can be applied to natural language processing tasks. This presentation is designed to provide practical guidance and directly applicable knowledge for data scientists and analysts who want to integrate text into their modeling pipelines.

In this 90-minute tutorial, learn how to preprocess text data for modeling, train models, and evaluate model performance. The tutorial uses slides and live coding to walk through a realistic case study. It was streamed, recorded, and captioned, and supporting materials and code are available on GitHub for you to work through afterward.

Expected level of audience's R background

Intermediate familiarity with R and RStudio, the basics of regression and classification modeling, and tidyverse packages such as dplyr and ggplot2.

What is in this repo

There are two main resources in this repo:

If you get stuck, you can post a question as an issue on this repo or post on RStudio Community.

About

License: Creative Commons Attribution Share Alike 4.0 International

