Chapter 6 code freezing RStudio
PursuitOfDataScience opened this issue
Hi,
Thanks for such an amazingly written book on text analysis using tidymodels.
I've been trying to replicate the code presented in Chapter 6, which uses the scotus data to train a few machine learning models. When I use prep() on the scotus_rec recipe, my R session gets stuck and I have to force it to stop. I ran into the same problem even after changing to max_tokens = 300. Is it expensive to run these chunks of code? The data set doesn't seem big enough to cause such an issue. There is also no way to train the model: it is extremely slow, and the session memory can reach 8 GB.
Thanks!
Hello @PursuitOfDataScience, that sounds a little extreme; the data set isn't super small, but it shouldn't cause problems with that amount of memory. Can you run session_info() for me and paste the output here? Then we can try to figure out why this is happening to you.
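For reference, session information can be printed like this (a minimal sketch; it assumes the sessioninfo package is installed, though base R's sessionInfo() works as well):

```r
# Print R version, OS, and loaded package versions for debugging.
# install.packages("sessioninfo")  # if not already installed
sessioninfo::session_info()

# Base R alternative (no extra package needed):
sessionInfo()
```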
library(tidymodels)
library(tidyverse)
library(textrecipes)
library(scotus)

set.seed(1234)

scotus_split <- scotus_filtered %>%
  mutate(year = as.numeric(year),
         text = str_remove_all(text, "'")) %>%
  initial_split()

scotus_train <- training(scotus_split)
scotus_test <- testing(scotus_split)

# Tokenize, keep the 1000 most frequent tokens, weight by tf-idf, normalize
scotus_rec <- recipe(year ~ text, data = scotus_train) %>%
  step_tokenize(text) %>%
  step_tokenfilter(text, max_tokens = 1e3) %>%
  step_tfidf(text) %>%
  step_normalize(all_predictors())

scotus_prep <- prep(scotus_rec)

lobstr::obj_size(scotus_prep)
#> 254,989,104 B
Created on 2022-04-28 by the reprex package (v2.0.1)
Hi, thanks for the answer!
I reran the code and it was much faster than when I ran it yesterday. I guess something was wrong with my R session then. Everything seems fine now. Thanks!