joohk / Soybean-Price-Prediction-Winning-solution-MinneAnalytics-Data-Science-Challenge

This project includes the following: a presentation and machine learning algorithms such as Prophet, ARIMA, XGBoost, LSTM and Seq2Seq.

Soybean Price Prediction / Winning solution / MinneAnalytics Data Science Challenge

The Challenge

The MinneAnalytics 2019 challenge involved forecasting soybean futures to help local soy farmers make informed decisions about when to sell their crops. Deciding what price to sell at is critical, especially in the volatile market of 2019. We collected data including, but not limited to, commodity prices, financial indexes, Google News trends, and tweets from policymakers. We then implemented a range of predictive modeling methods, including ensemble methods and recurrent neural networks. Our model achieved a prediction error of ~5.6 cents (< 1%).
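The README does not say how the prediction error was measured. As a rough illustration only, assuming it is a mean absolute error in cents per bushel, the relationship between an error in cents and an error in percent can be sketched as follows. The prices and the helper names `mean_absolute_error` and `percent_error` are hypothetical and are not taken from the competition data or notebooks.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted prices."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_error(mae, actual):
    """Express an absolute error as a percentage of the mean actual price."""
    return 100.0 * mae / (sum(actual) / len(actual))


# Hypothetical closing prices in cents per bushel (not the competition data).
actual = [920.0, 925.0, 931.0, 928.0]
predicted = [914.0, 931.0, 936.0, 922.5]

mae_cents = mean_absolute_error(actual, predicted)
pct = percent_error(mae_cents, actual)
print(f"MAE: {mae_cents:.2f} cents ({pct:.2f}% of mean price)")
```

With prices in the range of several hundred cents per bushel, an error of a few cents stays below 1%, which is consistent with the figures quoted above.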

More than 100 teams presented at the Optum Technology Center in Minneapolis on November 9th and our team was awarded first place in the Graduate Student division.

Team: Harsh Seksaria, Piyush Gupta, Hamed Khoojinian, Yassine Manane, Pushkar Vengulekar

Congratulations to @CarlsonNews for taking home First Prize in the Graduate division #MinneMUDAC pic.twitter.com/t6AJMA40jM

— MinneAnalytics (@MinneAnalytics) November 15, 2019

Winner across divisions - http://minneanalytics.org/announcing-the-winners-of-the-minnemudac-2019-student-data-science-challenge/

Process Overview

Data Collection

We are as much commodity traders as we are farmers, so we first turned to peer-reviewed journals for expert research. We came away with the understanding that prices and their interactions were the most important category of data sources we could consider. Starting from the most important price, the price of recent soybean futures, we added other prices we felt were important. We also added metrics to capture the behavior and instability of traders and of the exchange itself, as well as social media and news trends to capture public sentiment.
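One common way to feed recent prices to a tabular model is to turn the series into lagged rows. The README does not describe the team's actual feature pipeline, so the following is only a sketch under that assumption. The function `make_lagged_rows`, its `n_lags` parameter, and the prices are hypothetical.

```python
def make_lagged_rows(prices, n_lags):
    """Turn a price series into (features, target) rows.

    Each row's features are the previous n_lags prices, oldest first,
    and the target is the price that immediately follows them.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows = []
    for t in range(n_lags, len(prices)):
        features = prices[t - n_lags:t]
        target = prices[t]
        rows.append((features, target))
    return rows


# Hypothetical daily futures closes in cents per bushel.
closes = [915.0, 918.5, 921.0, 919.0, 924.5, 927.0]
rows = make_lagged_rows(closes, n_lags=3)
for features, target in rows:
    print(features, "->", target)
```

Other series (other commodity prices, indexes, trend counts) could be lagged the same way and appended to each row as extra columns.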

Machine Learning Journey

We experimented with multiple prediction algorithms:

  1. ARIMA
  2. Prophet
  3. XGBoost
  4. LSTM
  5. Seq2Seq

Three of these approaches (XGBoost, LSTM and Seq2Seq) performed well and gave comparable results. However, our research indicated that neural networks need a lot of training data, which we do not have for this problem.

In addition, our focus was on generating insights along with predictions, so we chose XGBoost as the final model because it can be interpreted through feature importance.
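To make concrete what feature importance means in a boosted-tree model, here is a dependency-free toy: gradient boosting on one-split trees (stumps) with squared-error loss, where each feature's importance is its share of the total error reduction ("gain"). This is a simplified stand-in for illustration, not XGBoost itself and not the team's code. The data, the function names `best_stump` and `boost`, and the parameter values are all hypothetical.

```python
def best_stump(X, residuals):
    """Find the single split (feature, threshold) that most reduces squared error."""
    n = len(X)
    total = sum(residuals)
    base_sse = sum(r * r for r in residuals) - total * total / n
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, j, thr, lm, rm)
    return best


def boost(X, y, n_rounds=20, learning_rate=0.3):
    """Fit boosted stumps; return predictions and normalized gain per feature."""
    pred = [sum(y) / len(y)] * len(y)
    gains = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None or stump[0] <= 1e-12:
            break  # nothing left to explain
        gain, j, thr, lm, rm = stump
        gains[j] += gain
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for p, row in zip(pred, X)]
    total = sum(gains)
    importance = [g / total if total > 0 else 0.0 for g in gains]
    return pred, importance


# Hypothetical rows: [yesterday's price in cents, an unrelated indicator].
X = [[900, 1], [905, 2], [910, 1], [915, 2],
     [920, 1], [925, 2], [930, 1], [935, 2]]
y = [902, 906, 911, 917, 921, 926, 932, 936]  # next-day price in cents

pred, importance = boost(X, y)
for name, score in zip(["yesterday_price", "unrelated_indicator"], importance):
    print(f"{name}: {score:.3f}")
```

In this toy data the target tracks the first column closely, so that column collects most of the gain. Libraries such as XGBoost expose comparable per-feature scores for a fitted model.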

Most Important Features

[Figure: most important features]

Profitability for the farmer

Let’s say farmer Joe has to sell 6,400 bushels of soybeans. He could decide whether to hold or sell based on his intuition from looking at the previous trend, or he could use our analysis.

For each day of the first week of November, the farmer could have profited by making a SELL or HOLD decision using our machine learning approach, saving $372 on average.

In a volatile market, a data driven SELL or HOLD decision could have saved the farmer over $7k!
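The arithmetic behind a comparison like this can be sketched as follows. The README does not spell out the exact decision rules, so both rules here are assumptions: the model rule holds when the predicted price is above today's price, and the intuition rule holds when the price rose since yesterday. The function names and all prices are hypothetical. Only the 6,400 bushels and the $372 average come from the example above.

```python
BUSHELS = 6400  # farmer Joe's crop size from the example


def model_decision(today_price, predicted_price):
    """Assumed rule: HOLD if the model predicts a higher price, else SELL."""
    return "HOLD" if predicted_price > today_price else "SELL"


def trend_decision(yesterday_price, today_price):
    """Assumed intuition rule: HOLD if the price has been rising, else SELL."""
    return "HOLD" if today_price > yesterday_price else "SELL"


def proceeds(decision, today_price, later_price, bushels=BUSHELS):
    """Dollars received: today's price on SELL, the later price on HOLD."""
    price = later_price if decision == "HOLD" else today_price
    return price * bushels


# Hypothetical prices in dollars per bushel.
yesterday, today, predicted, later = 9.20, 9.25, 9.18, 9.19

model = model_decision(today, predicted)  # model expects a drop
trend = trend_decision(yesterday, today)  # intuition follows the rise
saving = proceeds(model, today, later) - proceeds(trend, today, later)
print(model, trend, round(saving, 2))

# Scale check on the README's figure: $372 spread over 6,400 bushels.
per_bushel_cents = 100 * 372 / BUSHELS
print(f"{per_bushel_cents:.2f} cents per bushel")
```

On 6,400 bushels, a difference of only a few cents per bushel between the two decisions already amounts to several hundred dollars.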

Languages

Language:Jupyter Notebook 100.0%