EmilHvitfeldt / smltar

Manuscript of the book "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge

Home Page: https://smltar.com

Question on Chapter 6.9.3 of the SMLTAR book

marcelbaumgartner opened this issue

Dear Emil,
I will teach a new class on text mining in September at the local management school. The content is HEAVILY inspired by your amazing book, and also by what Julia and David wrote in the original tidy text book.

I have a question on chapter 6.9.3. At the end of the process, we finally fit our model on the test data, which has never been used before. This is done through the command final_fitted <- last_fit(final_wf, scotus_split). Does last_fit() understand that it should get the data from the 25% test split that we created earlier with initial_split()? I don't see where you fit your final model using the data from the object scotus_test. Can you enlighten me?

Thanks!
Marcel

You can check out some details on last_fit() in these two places:

Notice that last_fit() fits the model using the training set and then evaluates the performance using the testing set. We don't ever fit the model to the test set since its purpose is to estimate performance on new data.
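To make that concrete, here is a minimal sketch of what last_fit() does with a split object. It is not the book's code: the built-in mtcars data stands in for the scotus data, and the names car_split, car_train, car_test, and wf are hypothetical, chosen only for illustration. The second half repeats the same two steps by hand (fit on the training set, predict on the testing set) so the two sets of predictions can be compared.

```r
library(tidymodels)

set.seed(1234)

# Assumption: mtcars stands in for the book's scotus data
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# A small workflow: formula preprocessor plus a linear regression model
wf <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# last_fit() receives the split, so it has access to both pieces:
# it fits on the training portion and evaluates on the testing portion
final_fitted <- last_fit(wf, car_split)

# Performance metrics estimated on the testing set
collect_metrics(final_fitted)

# Predictions for the testing rows only, ordered by original row index
last_fit_preds <- collect_predictions(final_fitted)
last_fit_preds <- last_fit_preds[order(last_fit_preds$.row), ]

# The same two steps done by hand
manual_fit   <- fit(wf, data = car_train)
manual_preds <- predict(manual_fit, new_data = car_test)

# Compare the hand-computed predictions with those from last_fit()
all.equal(manual_preds$.pred, last_fit_preds$.pred)
```

Because the split object carries both the training and the testing rows, the test data never has to be passed separately, which is why scotus_test does not appear in the last_fit() call.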

Thanks, Julia, for your quick response. All clear now. These new R packages and their code sometimes make things almost too simple :).

Sounds good! Let us know if you have further questions. 🙌