I built this app as part of a Capstone project for JHU's Data Science Specialization. It uses n-grams to predict the three most probable next words as the user types. Shiny App Demo.
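To illustrate the general idea, here is a minimal Python sketch of n-gram next-word prediction. It is not the app's actual code (the app is a Shiny app, and its implementation is not shown here). The function names `build_ngram_table` and `predict_next`, the toy corpus, and the simple "fall back to a shorter context" rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_ngram_table(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table


def predict_next(text, tables, k=3):
    """Return up to k candidate next words, trying the longest context first.

    `tables` maps an n-gram order (1, 2, 3, ...) to a table built by
    build_ngram_table. If the longest context was never seen, the function
    falls back to the next shorter one (a naive backoff, assumed here).
    """
    words = text.lower().split()
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # not enough typed words to form this context
        context = tuple(words[-(n - 1):]) if n > 1 else ()
        followers = tables[n].get(context)
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []


# Hypothetical toy corpus, standing in for the news, blog, and Twitter text.
corpus = (
    "i love new york . i love new shoes . i love you . we love new york"
).split()
tables = {n: build_ngram_table(corpus, n) for n in (1, 2, 3)}

print(predict_next("love new", tables))
print(predict_next("really love", tables))  # unseen trigram context, uses bigrams
```

In this sketch the candidates are ranked by raw counts, so it returns fewer than three words when the matched context has fewer than three distinct followers.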
Industry partners and JHU pointed us to US news, blog, and Twitter datasets from HC Corpora (a text web crawler).
Detailed data processing steps are here.
- Implement Kneser-Ney Smoothing
- Implement More Robust Backoff Model
- Host on a Cloud Server to Train on Larger Corpus
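For reference, the first item could look roughly like the following: a minimal Python sketch of interpolated Kneser-Ney smoothing for bigrams only. It is not part of the app. The function name `train_kn_bigram`, the fixed discount of 0.75, and the toy sentence are assumptions chosen for illustration; a full implementation would extend the same idea recursively to higher-order n-grams.

```python
from collections import Counter, defaultdict


def train_kn_bigram(tokens, discount=0.75):
    """Return a function prob(word, prev) giving an interpolated
    Kneser-Ney estimate of P(word | prev) for a bigram model."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter()      # how often each word occurs as a context
    followers = defaultdict(set)   # distinct words seen after each context
    preceders = defaultdict(set)   # distinct contexts seen before each word
    for (prev, word), count in bigrams.items():
        context_count[prev] += count
        followers[prev].add(word)
        preceders[word].add(prev)
    n_bigram_types = len(bigrams)

    def prob(word, prev):
        # Continuation probability: in how many distinct contexts does
        # `word` appear, relative to all distinct bigram types?
        p_cont = len(preceders.get(word, ())) / n_bigram_types
        total = context_count[prev]
        if total == 0:
            return p_cont  # unseen context: use the continuation estimate alone
        discounted = max(bigrams[(prev, word)] - discount, 0) / total
        # Weight given to the continuation distribution, equal to the
        # probability mass removed by discounting.
        backoff_weight = discount * len(followers[prev]) / total
        return discounted + backoff_weight * p_cont

    return prob


tokens = "the cat sat on the mat the cat ate".split()
prob = train_kn_bigram(tokens)
vocab = set(tokens)
# The estimates for a seen context form a proper distribution over the vocabulary.
print(sum(prob(word, "the") for word in vocab))
```

The discount moves a little probability from seen bigrams to words that appear in many different contexts, which is what lets the model rank unseen continuations sensibly.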