forecast_sms

An experimental script for forecasting SMS spend for FxA.

Dependencies (as of right now)

Python 2.7 or 3 (both were tried and seem to work)
These Python libraries (install with pip):
- pandas
- numpy
- statsmodels
- for console print-formatting only: tqdm, tabulate

Usage Example

Definitely liable to change, but as of now:

python forecast_sms.py [datafile.json] [days_out_to_predict] [q_upper_limit]

e.g.

python forecast_sms.py sms-spend.json 7 0

where 7 is the number of days to predict beyond the last hour in your data. Predictions are generated hour by hour, and they get less accurate the further out you try to predict. If you omit this argument, the script outputs 24 hourly predictions by default.
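To make the horizon concrete, here is a minimal sketch of which hours a days-out value covers. The function name `forecast_hours` is hypothetical and not part of forecast_sms.py; it only assumes that predictions start one hour after the last observed hour and run for 24 hours per day requested.

```python
from datetime import datetime, timedelta


def forecast_hours(last_observed, days_out):
    """Return the hourly timestamps covered by a forecast.

    Hypothetical helper: assumes predictions begin one hour after the
    last observed hour and span 24 hours per requested day.
    """
    total_hours = days_out * 24
    return [last_observed + timedelta(hours=h) for h in range(1, total_hours + 1)]


# With data ending at 2018-04-17 00:00 and 7 days requested,
# the forecast covers 168 hourly timestamps.
hours = forecast_hours(datetime(2018, 4, 17, 0), 7)
print(len(hours), hours[0], hours[-1])
```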

The last, optional parameter controls whether a search is done for better model parameters. The larger the number, the wider the search; values above 10 take a long time. If this parameter is 0 or omitted, a pre-defined set of parameters that I've found to work decently is used instead.

I suggest trying the default parameters first, then, if need be, setting the last parameter to 5 and checking whether the AIC improves (lower is better).
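The idea of "search up to a limit and keep the lowest AIC" can be sketched as follows. This is not the script's actual search: it swaps in a plain autoregressive model fit by least squares with numpy as a simplified stand-in for whatever model forecast_sms.py fits, and the names `ar_aic` and `search_order` are hypothetical.

```python
import numpy as np


def ar_aic(series, p, max_p):
    """AIC of an AR(p) model fit by least squares.

    The first max_p observations are dropped for every p so that all
    candidate orders are scored on the same sample.
    """
    y = np.asarray(series, dtype=float)
    target = y[max_p:]
    # Design matrix: intercept column plus one column per lag 1..p.
    cols = [np.ones(len(target))]
    for lag in range(1, p + 1):
        cols.append(y[max_p - lag:len(y) - lag])
    X = np.column_stack(cols)
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    resid = target - np.dot(X, coef)
    n = len(target)
    rss = float(np.dot(resid, resid))
    k = p + 2  # intercept, p lag coefficients, noise variance
    # Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2k.
    return n * np.log(rss / n) + 2 * k


def search_order(series, upper_limit):
    """Score every order from 0 to upper_limit and return the lowest-AIC one."""
    scores = {p: ar_aic(series, p, upper_limit) for p in range(upper_limit + 1)}
    best = min(scores, key=scores.get)
    return best, scores
```

A larger limit means more candidate models to fit, which is why the search slows down as the limit grows.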

Tested with some test data that's not included in this repo.

Data Input Format

Currently data must be generated with the following command:

aws --profile prod cloudwatch get-metric-statistics --namespace 'AWS/SNS' --metric-name 'SMSMonthToDateSpentUSD' --start-time '2018-04-01T00:00:00Z' --end-time '2018-04-17T00:00:00Z' --period '3600' --statistics 'Average' > sms-spend.json

Just sub out the timestamps for the date range you want.
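For orientation, here is a minimal sketch of reading that JSON. It assumes the usual get-metric-statistics response shape (a "Datapoints" list whose entries carry "Timestamp" and "Average") and that hourly spend is the difference between consecutive month-to-date values; the function name `hourly_spend` is hypothetical, and the script itself may process the data differently.

```python
import json


def hourly_spend(raw_json):
    """Turn month-to-date datapoints into (timestamp, spend_in_hour) pairs.

    Datapoints are not guaranteed to arrive in order, so they are sorted
    first; ISO 8601 timestamps in the same format sort correctly as text.
    This sketch ignores the reset at the start of a new month.
    """
    points = json.loads(raw_json)["Datapoints"]
    points = sorted(points, key=lambda p: p["Timestamp"])
    return [
        (cur["Timestamp"], cur["Average"] - prev["Average"])
        for prev, cur in zip(points, points[1:])
    ]


# Made-up sample values, for illustration only.
sample = json.dumps({
    "Label": "SMSMonthToDateSpentUSD",
    "Datapoints": [
        {"Timestamp": "2018-04-01T02:00:00Z", "Average": 2.25, "Unit": "None"},
        {"Timestamp": "2018-04-01T00:00:00Z", "Average": 1.0, "Unit": "None"},
        {"Timestamp": "2018-04-01T01:00:00Z", "Average": 1.5, "Unit": "None"},
    ],
})
print(hourly_spend(sample))
```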

Data Output Format (Predictions)

Currently, the script prints to the console 6 columns of data in the following order (each row is labeled with the timestamp for the hour that's being predicted):

- spent_in_hour_lower_est: the lower bound on the estimate for how much money is predicted to be spent in that hour
- spent_in_hour_upper_est: the upper bound on the estimate for how much money is predicted to be spent in that hour
- spent_in_hour_mean_est: the mean (i.e., best guess) estimate for how much money is predicted to be spent in that hour
- lower_cum_total: the lower bound on the estimate for the total (cumulative) amount spent up to and including that hour
- upper_cum_total: the upper bound on the estimate for the total (cumulative) amount spent up to and including that hour
- mean_cum_total: the mean (i.e., best guess) estimate for the total (cumulative) amount spent up to and including that hour

Each pair of lower and upper bounds should be read as a 95% confidence interval, i.e. "we are 95% confident that the actual value will be somewhere between the lower and upper bound".
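As a rough illustration of how such numbers relate, the sketch below builds a 95% interval from a mean and a standard error under a normal approximation, and accumulates hourly means into a running total. Both helpers (`normal_interval`, `running_total`) are hypothetical, and this is an assumption about how to read the columns, not a description of how the script computes them.

```python
def normal_interval(mean, std_err, z=1.96):
    """Lower and upper bound of a ~95% interval, assuming normal errors."""
    return mean - z * std_err, mean + z * std_err


def running_total(values):
    """Cumulative sum: entry i is the total up to and including hour i."""
    total = 0.0
    out = []
    for value in values:
        total += value
        out.append(total)
    return out


# Made-up hourly mean estimates, for illustration only.
hourly_means = [0.5, 0.25, 0.25]
print(normal_interval(10.0, 1.0))
print(running_total(hourly_means))
```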
