Soybean Price Prediction / Winning solution / MinneAnalytics Data Science Challenge
The Challenge
The MinneAnalytics 2019 challenge involved forecasting soybean futures prices to help local soy farmers make informed decisions about when to sell their crops. Deciding what price to sell at is critical, especially in the volatile market of 2019. We collected data including, but not limited to, commodity prices, financial indexes, Google News trends, and tweets from policy makers. We then implemented a range of predictive modeling methods, including ensemble methods and recurrent neural networks. Our model achieved a prediction error of ~5.6 cents (< 1%).
More than 100 teams presented at the Optum Technology Center in Minneapolis on November 9th, and our team was awarded first place in the Graduate Student division.
Team: Harsh Seksaria, Piyush Gupta, Hamed Khoojinian, Yassine Manane, Pushkar Vengulekar
Congratulations to @CarlsonNews for taking home First Prize in the Graduate division #MinneMUDAC pic.twitter.com/t6AJMA40jM
— MinneAnalytics (@MinneAnalytics) November 15, 2019
Winner across divisions - http://minneanalytics.org/announcing-the-winners-of-the-minnemudac-2019-student-data-science-challenge/
Process Overview
Data Collection
We are as much commodity traders as we are farmers, so we first turned to peer-reviewed journals for expert research. We came away with the understanding that prices and their interactions were the most important category of data sources we could consider. Starting from the most important price, that of recent soybean futures, we added other prices we felt were important. We also added metrics to capture the behavior, and instability, of traders and the exchange itself, as well as social media and news trends to capture public sentiment.
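To make "starting from the price of recent soybean futures" concrete, here is a minimal sketch of how recent prices could be turned into model inputs. The function name `make_lag_features`, the choice of lags, and the price values are illustrative assumptions, not the team's actual pipeline or data.

```python
from typing import Dict, List


def make_lag_features(prices: List[float], lags: List[int]) -> List[Dict[str, float]]:
    """Build one feature row per day from a daily closing-price series.

    Each row holds the price k days earlier for every k in `lags`,
    today's price, and the next day's price as the prediction target.
    (Hypothetical helper for illustration only.)
    """
    rows = []
    max_lag = max(lags)
    # Start once every lag is available; stop one day early so a target exists.
    for t in range(max_lag, len(prices) - 1):
        row = {f"lag_{k}": prices[t - k] for k in lags}
        row["today"] = prices[t]
        row["target_next_day"] = prices[t + 1]
        rows.append(row)
    return rows


# Made-up prices in cents per bushel, not challenge data.
closes = [900.0, 905.5, 903.0, 910.25, 908.0, 915.5]
rows = make_lag_features(closes, lags=[1, 2])
for row in rows:
    print(row)
```

Other series (other commodity prices, index levels, news or tweet counts) could be added as extra columns in the same row-per-day layout.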
Machine Learning Journey
We experimented with multiple prediction algorithms.
The three approaches (XGBoost, LSTM, and Seq2Seq) all performed well and gave comparable results. However, our research indicated that neural networks need a lot of training data, which we did not have for this problem.
In addition, our focus was on generating insights along with predictions, so we chose XGBoost as our final model because it offers interpretability through feature importance.
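As a rough illustration of what "comparable performance" means in practice, the sketch below scores candidate forecasts by mean absolute error in cents. The model names `model_a` and `model_b` and all price values are placeholders invented for the example. They are not results from the challenge, and the error metric is an assumption, since the text does not say which one was used.

```python
from typing import Dict, List


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute gap between actual and predicted prices."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up prices in cents per bushel.
actual = [900.0, 910.0, 905.0, 920.0]

# Placeholder forecasts from two hypothetical models.
forecasts: Dict[str, List[float]] = {
    "model_a": [904.0, 905.0, 910.0, 914.0],
    "model_b": [890.0, 920.0, 900.0, 930.0],
}

scores = {name: mean_absolute_error(actual, pred) for name, pred in forecasts.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    # Express the error as a share of the average actual price as well.
    pct = 100 * score / (sum(actual) / len(actual))
    print(f"{name}: {score:.2f} cents ({pct:.2f}% of average price)")
print("lowest error:", best)
```

When the scores are close, as they were for the three approaches above, a secondary criterion such as interpretability can reasonably decide the choice.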
Most Important Features
Profitability for the farmer
Let’s say farmer Joe has to sell 6,400 bushels of soybeans. He could decide whether to hold or sell the futures based on his intuition about the previous trend, or he could use our analysis.
On each day of the first week of November, the farmer could have profited by making a SELL or HOLD decision using our machine learning approach, saving $372 on average.
In a volatile market, a data-driven SELL or HOLD decision could have saved the farmer over $7k!
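The SELL or HOLD idea can be sketched as a simple rule: hold when tomorrow's forecast is above today's price, otherwise sell. This rule, the "always sell today" baseline, and the example prices are assumptions made for illustration. The text does not spell out the exact decision logic or how savings were measured. Only the 6,400 bushels and the $372 average come from the example above.

```python
def decide(today_cents: float, forecast_cents: float) -> str:
    """Return HOLD if the forecast for tomorrow beats today's price, else SELL."""
    return "HOLD" if forecast_cents > today_cents else "SELL"


def gain_vs_selling_today(
    today_cents: float,
    forecast_cents: float,
    next_day_actual_cents: float,
    bushels: int,
) -> float:
    """Dollar difference from following the signal instead of always selling today."""
    if decide(today_cents, forecast_cents) == "SELL":
        return 0.0  # Same outcome as the baseline.
    # Holding realizes tomorrow's actual price instead of today's.
    return (next_day_actual_cents - today_cents) * bushels / 100.0


BUSHELS = 6400  # From the farmer Joe example.

# Hypothetical day: price is 900 cents, forecast 906, actual next-day price 905.
signal = decide(900.0, 906.0)
gain = gain_vs_selling_today(900.0, 906.0, 905.0, BUSHELS)
print(signal, f"${gain:.2f}")

# The reported $372 average, spread over 6,400 bushels, in cents per bushel.
cents_per_bushel = 372 * 100 / BUSHELS
print(f"{cents_per_bushel} cents per bushel")
```

Spread over 6,400 bushels, the $372 average works out to roughly 5.8 cents per bushel.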