Word! Next Word Predictor

I built this app as part of a Capstone project for JHU's Data Science Specialization. It uses n-grams to predict the three most probable next words as the user types. Shiny App Demo.

Training Data

Industry partners and JHU pointed us to the US news, blog, and Twitter datasets from HC Corpora (text collected by a web crawler).

Detailed data processing steps are here.

Under the Hood
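
As a rough illustration of the idea described above (using n-grams to suggest the three most probable next words), here is a minimal sketch. It is written in Python for brevity and is not the app's R code; the function names `build_ngram_table` and `predict_next`, the toy corpus, and the rule of trying the longest context first and falling back to shorter ones are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_ngram_table(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def predict_next(tables, text, k=3):
    """Return up to k of the most frequent next words for the typed text.

    Tries the longest available context first, then shorter ones
    (an assumed fallback rule, not necessarily what the app does).
    """
    words = text.lower().split()
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # not enough typed words for this n-gram order
        context = tuple(words[len(words) - (n - 1):])
        followers = tables[n].get(context)
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []

# Toy corpus standing in for the real training data (hypothetical).
tokens = "the cat sat on the mat the cat ate the fish the cat sat down".split()
tables = {n: build_ngram_table(tokens, n) for n in (1, 2, 3)}

print(predict_next(tables, "the cat"))  # ['sat', 'ate']
print(predict_next(tables, "dog the"))  # trigram context unseen, so the bigram table is used
```

Raw counts are enough to rank candidates within a single context, which is why no probabilities are computed here; comparing candidates across n-gram orders would need a proper scoring scheme.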

Future Considerations

Implement Kneser-Ney Smoothing
Implement More Robust Backoff Model
Host on a Cloud Server to Train on Larger Corpus
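
For the first item above, a minimal sketch of interpolated Kneser-Ney smoothing, restricted to bigrams, might look like the following. It is written in Python rather than the app's R, and the function name `kneser_ney_bigram`, the fixed discount of 0.75, and the toy corpus are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(tokens, discount=0.75):
    """Return prob(w, v), an interpolated Kneser-Ney estimate of P(w | v)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter()    # how often v occurs as the first word of a bigram
    followers = defaultdict(set)  # distinct words seen after v
    preceders = defaultdict(set)  # distinct words seen before w
    for (v, w), count in bigrams.items():
        context_totals[v] += count
        followers[v].add(w)
        preceders[w].add(v)
    n_bigram_types = len(bigrams)

    def prob(w, v):
        # Continuation probability: share of distinct bigram types that end in w.
        p_cont = len(preceders.get(w, ())) / n_bigram_types
        if context_totals[v] == 0:
            return p_cont  # unseen context: use the continuation probability alone
        discounted = max(bigrams[(v, w)] - discount, 0) / context_totals[v]
        # Mass removed by discounting, redistributed via the continuation probability.
        backoff_weight = discount * len(followers[v]) / context_totals[v]
        return discounted + backoff_weight * p_cont

    return prob

# Toy corpus (hypothetical), not the project's training data.
tokens = "the cat sat on the mat the cat ate the fish the cat sat down".split()
prob = kneser_ney_bigram(tokens)

print(prob("sat", "cat"))   # seen bigram
print(prob("fish", "cat"))  # unseen bigram still gets non-zero probability
```

The point of the continuation term is that an unseen word pair is scored by how many different contexts the candidate word has appeared in, not by how often it occurs overall.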

Ash Chakraborty
