Query about ML model: testdata should be entered as input

Question

kennis222 opened this issue 2 years ago · comments

Hi contributor:

I have read multivariate_example.ipynb, but I am still confused. Unlike the ar_based models, the ML model needs a test dataframe as input when making a prediction.

For example, the UCI dataset covers 2004-03-10 to 2005-04-04. I would like to predict the output for 2005-03-06 to 2005-04-04, a span of 30 days. Is the setup below correct?

training_data = length of data from 2004-03-10 - 2005-02-03
test_data = length of data from 2005-02-04 to 2005-03-05 which lasts 30 days
predict_results = tree_model.predict(test_data, model='ML')
Are predict_results from 2005-03-06 to 2005-04-04?
actual_data to compare with the predict_results: 2005-03-06 to 2005-04-04

But what I get are predictions from 2005-02-04 to 2005-03-05.

kennis · Answer 1 · Thu Jan 27 2022 11:42:50 GMT+0800 (China Standard Time)

I am just confused: the data from 2005-03-06 to 2005-04-04 is unknown to us, yet the predict function requires test_data as input. My current understanding is that test_data is needed as input so that its lags can be used to make the prediction.

Home of AutoViz, AutoViML and featurewiz · Answer 2 · Thu Jan 27 2022 23:03:04 GMT+0800 (China Standard Time)

Hi @kennis222 👍

If you want to predict the dates from 2005-03-06 to 2005-04-04, then you have to create a "testdata" dataframe covering those dates and feed it to the predict function. I am not sure whether you mean that it is not predicting the target for the right dates, or that it is producing wrong target predictions for those dates.
Can you please be clearer? Your post is very confusing to me.
AutoViML

kennis · Answer 3 · Fri Jan 28 2022 01:52:49 GMT+0800 (China Standard Time)

I'm sorry that this is confusing. Let me rephrase the question.

I would like to know whether I have misunderstood how to use the ML model in auto-ts.

Background update:
There is a dataset whose dates run from 2004-03-10 to 2005-04-04. I split it so that 90% of the dataset is the training data (2004-03-10 to 2005-02-23) and the remaining 10% is the test/validation data (2005-02-24 to 2005-04-04). The goal is to predict CO(GT) from 2005-02-24 to 2005-04-04 and compare the predictions with the actual values.
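The 90/10 split described above can be sketched with plain pandas. This is only an illustration under stated assumptions: the frame below is a made-up stand-in with one row per day (the real file may be sampled at a different frequency), and `chronological_split` is a hypothetical helper, not part of auto-ts.

```python
import pandas as pd

# Hypothetical stand-in for the dataset: one row per day, dummy target values.
# Only the date range and the CO(GT) column name come from the post.
dates = pd.date_range("2004-03-10", "2005-04-04", freq="D")
df = pd.DataFrame({"Date": dates, "CO(GT)": range(len(dates))})


def chronological_split(frame, train_fraction=0.9, time_col="Date"):
    """Split a time-ordered frame into (train, test) without shuffling."""
    frame = frame.sort_values(time_col).reset_index(drop=True)
    cut = int(len(frame) * train_fraction)  # first row index of the test part
    return frame.iloc[:cut], frame.iloc[cut:]


train_data, test_data = chronological_split(df)
print("train:", train_data["Date"].min().date(), "to", train_data["Date"].max().date())
print("test: ", test_data["Date"].min().date(), "to", test_data["Date"].max().date())
```

With one row per day, the training part ends on 2005-02-23 and the test part starts on 2005-02-24, which matches the dates quoted in the post.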

After training, I ask the model to make the prediction. Unlike an ar_based model such as the var model, where the param can be a forecast_period (type=int), the ML model requires testdata, "The test dataframe in pretransformed format", as its param. In order to predict, I created a new dataframe called test_data_input containing only test_data[['Date']]. I then ran the prediction and it seemed to work.

When I follow the same steps with my own data, it raises an error: "ValueError: Length mismatch: Expected axis has 2 elements, new values have 6 elements". I looked up the same error online, but the situations described there are different. So far I have made sure that I am following the same steps without typos. (By the way, my dataset has more than 6 features.)
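For reference, pandas produces this exact message when a list of labels is assigned to an axis of a different length. The sketch below reproduces it in isolation; it does not claim to show where inside auto-ts the error is raised, and the frame and the six column names are made up.

```python
import pandas as pd

# Hypothetical frame with 2 columns ("Expected axis has 2 elements").
frame = pd.DataFrame(
    {
        "Date": pd.date_range("2005-02-24", periods=3, freq="D"),
        "y": [1.0, 2.0, 3.0],
    }
)

# Six made-up labels ("new values have 6 elements").
six_names = ["Date", "y", "f1", "f2", "f3", "f4"]

try:
    frame.columns = six_names  # 6 labels for a 2-column frame
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

One plausible reading, stated as an assumption, is that a test frame with fewer columns than the training data is being relabelled with the training column names somewhere in the prediction path.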

Home of AutoViz, AutoViML and featurewiz · Answer 4 · Fri Jan 28 2022 22:04:33 GMT+0800 (China Standard Time)

Hi @kennis222 👍
The problem is that you are not providing the "predictors" for your target (CO(GT)). In the example above, there are at least 3 predictors for your target: NMHC, C6H6, PT08.S5(O3).

Since the ML model was trained on three predictors for one target, it expects the same 3 predictors in the test dataset as well. Am I understanding this correctly?
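A quick way to check this before calling predict is to compare the predictor columns of the training frame with those of the test frame. The sketch below is a minimal illustration: `missing_predictors` is a hypothetical helper (not an auto-ts function), the predictor names are the ones listed in this reply, and all values are made up.

```python
import pandas as pd

target = "CO(GT)"

# Made-up training rows; column names follow the reply above.
train_data = pd.DataFrame(
    {
        "Date": pd.date_range("2005-02-21", periods=3, freq="D"),
        "NMHC": [150.0, 112.0, 88.0],
        "C6H6": [11.9, 9.4, 9.0],
        "PT08.S5(O3)": [1268.0, 972.0, 1074.0],
        "CO(GT)": [2.6, 2.0, 2.2],
    }
)


def missing_predictors(train, test, target, time_col="Date"):
    """Return training predictors that are absent from the test frame."""
    expected = [c for c in train.columns if c not in (target, time_col)]
    return [c for c in expected if c not in test.columns]


# A test frame holding only the dates, as described earlier in the thread.
date_only_test = pd.DataFrame(
    {"Date": pd.date_range("2005-02-24", periods=2, freq="D")}
)

# A test frame that also carries the three predictors (dummy values).
full_test = pd.DataFrame(
    {
        "Date": pd.date_range("2005-02-24", periods=2, freq="D"),
        "NMHC": [80.0, 51.0],
        "C6H6": [9.2, 6.5],
        "PT08.S5(O3)": [1203.0, 1110.0],
    }
)

print(missing_predictors(train_data, date_only_test, target))
print(missing_predictors(train_data, full_test, target))
```

An empty list from the helper means the test frame carries every predictor column seen in training; a non-empty list names the columns that still need to be supplied.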

Sorry if I have still not understood your problem.

Auto_ViML

kennis · Answer 5 · Fri Jan 28 2022 23:56:27 GMT+0800 (China Standard Time)

Got it, thanks.