There are 400K Amazon reviews in a CSV file: https://drive.google.com/open?id=1bDwcBdCiEZ2pfLOc6ANi85TLAspyhYUR. I picked every 5th review as test data and used the rest as training data. First I use scikit-learn to build the feature matrix of the data, then use three methods (Decision Tree, Neural Network, and Naive Bayes) to make predictions. I also used pyplot to plot curves showing the performance of these three methods.
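The workflow described above might look roughly like the following. This is only a minimal sketch, not the actual script: the toy `reviews` list stands in for the CSV rows, the choice of `CountVectorizer` as the feature extractor, the helper `split_every_fifth`, the convention that "every 5th" means indices 4, 9, 14, ..., and all hyperparameters are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the CSV rows: (review text, label).
reviews = [
    ("great product works well", 1),
    ("terrible quality broke fast", 0),
    ("love it would buy again", 1),
    ("waste of money very bad", 0),
    ("works great love it", 1),
    ("bad product terrible", 0),
    ("excellent quality great value", 1),
    ("broke after one day bad", 0),
    ("very good would recommend", 1),
    ("terrible waste do not buy", 0),
]


def split_every_fifth(rows):
    """Send every 5th row (indices 4, 9, ...) to test and the rest to train."""
    test = rows[4::5]
    train = [row for i, row in enumerate(rows) if i % 5 != 4]
    return train, test


train, test = split_every_fifth(reviews)
train_text, train_y = zip(*train)
test_text, test_y = zip(*test)

# Fit the vocabulary on training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

# The three methods compared in this project (settings are assumptions).
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                    random_state=0),
    "Naive Bayes": MultinomialNB(),
}

# Train each model and record its accuracy on the held-out reviews.
scores = {}
for name, model in models.items():
    model.fit(X_train, train_y)
    scores[name] = model.score(X_test, test_y)

for name, accuracy in scores.items():
    print(f"{name}: test accuracy = {accuracy:.2f}")
```

Repeating the loop for increasing training-set sizes and plotting the recorded accuracies is one way to obtain performance curves like the ones mentioned above.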
According to these curves, we can conclude that if we have enough data to train our model (in our example, more than 200K), we should use the Neural Network. So I produced my prediction model, model.pkl, using the Neural Network.
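For reference, a file like model.pkl could be written and read back along these lines. This is a sketch under assumptions: it uses Python's `pickle` module (the `.pkl` extension suggests it, but the original serialization code is not shown), bundles a `CountVectorizer` with the network in a scikit-learn pipeline so the saved file can take raw text, and trains on a few made-up reviews. The file is written to a temporary directory here so the real model.pkl is not overwritten.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical training data; the real model is trained on the review CSV.
texts = [
    "great product works well",
    "terrible quality broke fast",
    "love it would buy again",
    "waste of money very bad",
]
labels = [1, 0, 1, 0]

# Bundle feature extraction and the network so both are saved together.
pipeline = make_pipeline(
    CountVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
pipeline.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")

    # Serialize the fitted pipeline to disk.
    with open(model_path, "wb") as f:
        pickle.dump(pipeline, f)

    # Load it back, as a consumer of model.pkl would.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same predictions as the original.
original_predictions = list(pipeline.predict(texts))
restored_predictions = list(restored.predict(texts))
print(restored_predictions == original_predictions)
```

Pickle files should only be loaded from trusted sources and with a scikit-learn version compatible with the one used to save them.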