EmilHvitfeldt / smltar

Manuscript of the book "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge

Home Page: https://smltar.com


Chapter 6 code freezing RStudio

PursuitOfDataScience opened this issue

commented

Hi,

Thanks for such an amazingly written book on text analysis using tidymodels.

I've been trying to replicate the code presented in Chapter 6, which uses the scotus data to train a few machine learning models. When I use prep() on the scotus_rec recipe, my R session gets stuck and I have to force-quit it. I got the same result even after changing max_tokens to 300. Is it expected for these chunks to be this expensive to run? The data set doesn't seem big enough to cause such an issue. Training the model is out of the question, as it is extremely slow and the session's memory usage can reach 8 GB.
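For reference, this is roughly what I ran: the recipe from the chapter with max_tokens lowered to 300. The split and the variable names follow the book, so treat this as a sketch of my session rather than an exact transcript:

library(tidymodels)
library(textrecipes)

# Same recipe as the chapter, but with a smaller vocabulary
scotus_rec <- recipe(year ~ text, data = scotus_train) %>%
  step_tokenize(text) %>%
  step_tokenfilter(text, max_tokens = 300) %>%
  step_tfidf(text) %>%
  step_normalize(all_predictors())

# This is the step where my session hangs
scotus_prep <- prep(scotus_rec)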

Thanks!

Hello @PursuitOfDataScience, that sounds a little extreme. The data isn't super small, but it shouldn't give you problems with that amount of memory. Can you run session_info() for me and paste it here? Then we can try to figure out why this is happening to you.
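In case it helps, something like this should produce the output I'm after (I'm assuming the sessioninfo package here; base R's sessionInfo() works too):

# Detailed session info, including package versions
sessioninfo::session_info()

# Or, with base R only:
sessionInfo()

For comparison, here is the full chapter code running fine on my machine as a reprex: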

library(tidymodels)
library(tidyverse)
library(textrecipes)
library(scotus)

set.seed(1234)
# Convert year to numeric and strip apostrophes before splitting
scotus_split <- scotus_filtered %>%
  mutate(year = as.numeric(year),
         text = str_remove_all(text, "'")) %>%
  initial_split()

scotus_train <- training(scotus_split)
scotus_test <- testing(scotus_split)

# Tokenize, keep the 1,000 most frequent tokens, compute tf-idf,
# and center and scale all predictors
scotus_rec <- recipe(year ~ text, data = scotus_train) %>%
  step_tokenize(text) %>%
  step_tokenfilter(text, max_tokens = 1e3) %>%
  step_tfidf(text) %>%
  step_normalize(all_predictors())

scotus_prep <- prep(scotus_rec)

# Memory footprint of the prepped recipe
lobstr::obj_size(scotus_prep)
#> 254,989,104 B

Created on 2022-04-28 by the reprex package (v2.0.1)
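If prep() still hangs for you after we sort out versions, one quick check (plain base R, not from the book) is to time that step in isolation:

# Time just the expensive step to see where the slowdown is
system.time(
  scotus_prep <- prep(scotus_rec)
)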

commented

Hi, thanks for the answer!

I reran the code and it was way faster than when I ran it yesterday. I guess something was wrong with my R session then. Now it seems like all is good. Thanks!