Question on Chapter 6.9.3 of the SMLTAR book

marcelbaumgartner opened this issue a year ago

Dear Emil,
I will be teaching a new class on text mining in September at the local management school. The content is HEAVILY inspired by your amazing book, and also by what Julia and David wrote in the original tidy text book. I have a question on chapter 6.9.3. At the end of the process, we finally fit our model on the test data, which has never been used before. This is done with the command final_fitted <- last_fit(final_wf, scotus_split). Does last_fit() understand that it should take the data from the 25% test split that we created earlier with initial_split()? I don't see where you fit your final model using the data from the object scotus_test. Can you enlighten me? Thanks! Marcel

Julia Silge · Answer 1 · Thu Jun 15 2023 23:50:11 GMT+0800 (China Standard Time)

You can check out some details on last_fit() in these two places:

Notice that last_fit() fits the model using the training set and then evaluates the performance using the testing set. We don't ever fit the model to the test set since its purpose is to estimate performance on new data.
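
As a rough illustration of that behavior, here is a minimal sketch. It assumes the built-in mtcars data as a stand-in for the scotus data and a plain linear regression instead of the book's model; the names car_split, car_train, car_test, and wf are made up for this example. The split object carries both the training and the testing rows, which is why last_fit() only needs the workflow and the split.

```r
library(tidymodels)

set.seed(1234)

# Stand-in data (assumption): mtcars instead of the scotus data from the book.
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Stand-in workflow (assumption): a simple linear regression.
wf <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# last_fit() receives the split object, fits on the training rows,
# and computes metrics on the testing rows.
final_fitted <- last_fit(wf, car_split)
collect_metrics(final_fitted)

# Doing the same two steps by hand: fit on training, evaluate on testing.
manual_fit <- fit(wf, data = car_train)
manual_preds <- predict(manual_fit, new_data = car_test) %>%
  bind_cols(car_test)
rmse(manual_preds, truth = mpg, estimate = .pred)
```

The RMSE from collect_metrics() and the one computed by hand should agree, since both come from a model fitted on the training rows and evaluated on the testing rows. The model is never fitted to car_test in either path.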

Marcel Baumgartner · Answer 2 · Fri Jun 16 2023 20:38:38 GMT+0800 (China Standard Time)

Thanks, Julia, for your quick response. All clear now. These new R packages and their code sometimes make things almost too simple :).

Julia Silge · Answer 3 · Fri Jun 16 2023 23:39:40 GMT+0800 (China Standard Time)

Sounds good! Let us know if you have further questions. 🙌