[BUG] convert_frequency raises TypeError during predictor.fit()

halfbottles opened this issue 4 months ago · comments

halfbottles commented 4 months ago

Bug Report Checklist

I provided code that demonstrates a minimal reproducible example.
I confirmed bug exists on the latest mainline of AutoGluon via source install.
I confirmed bug exists on the latest stable version of AutoGluon.

Describe the bug

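predictor.fit() fails while it internally calls convert_frequency to resample train_data to the predictor's frequency. The worker process raises TypeError: NDFrame.first() missing 1 required positional argument: 'offset'. I could not find this problem in the existing issues.

Expected behavior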

To Reproduce

Screenshots / Logs

Warning: path already exists! This predictor may overwrite an existing predictor! path="./gd_hour/5month05131041"
Beginning AutoGluon training... Time limit = 1800s
AutoGluon will save models to './gd_hour/5month05131041'
=================== System Info ===================
AutoGluon Version: 1.0.0
Python Version: 3.10.14
Operating System: Windows
Platform Machine: AMD64
Platform Version: 10.0.22631
CPU Count: 18
GPU Count: 1
Memory Avail: 3.64 GB / 23.47 GB (15.5%)
Disk Space Avail: 439.28 GB / 627.16 GB (70.0%)

Setting presets to: best_quality

Fitting with arguments:
{'enable_ensemble': True,
'eval_metric': MAPE,
'freq': 'H',
'hyperparameters': {'ADIDA': {},
'AutoARIMA': {},
'AutoCES': {},
'AutoETS': {},
'Average': {},
'Chronos': {'batch_size': 1,
'device': 'cuda',
'model_path': 'large'},
'CrostonSBA': {},
'DeepAR': {},
'DirectTabular': {},
'ETS': {},
'IMAPA': {},
'NPTS': {},
'Naive': {},
'PatchTST': {},
'RecursiveTabular': {},
'SeasonalAverage': {},
'SeasonalNaive': {},
'SimpleFeedForward': {},
'TemporalFusionTransformer': {},
'Theta': {},
'WaveNet': {},
'Zero': {}},
'known_covariates_names': ['temp_min',
'temp_max',
'Holiday',
'weekend',
'week2_5',
'week_1'],
'num_val_windows': 1,
'prediction_length': 40,
'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
'random_seed': 123,
'refit_every_n_windows': 1,
'refit_full': False,
'target': 'elec_std',
'time_limit': 1800,
'verbosity': 2}

_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\externals\loky\process_executor.py", line 463, in _process_worker
r = call_item()
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\externals\loky\process_executor.py", line 291, in __call__
return self.fn(*self.args, **self.kwargs)
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py", line 589, in __call__
return [func(*args, **kwargs)
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py", line 589, in <listcomp>
return [func(*args, **kwargs)
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\autogluon\timeseries\dataset\ts_dataframe.py", line 962, in resample_chunk
resampled_df = df.resample(offset, level=TIMESTAMP, **kwargs).agg(aggregation)
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\resample.py", line 338, in aggregate
result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 175, in agg
return self.agg_dict_like()
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 406, in agg_dict_like
return self.agg_or_apply_dict_like(op_name="agg")
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 1390, in agg_or_apply_dict_like
result_index, result_data = self.compute_dict_like(
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 469, in compute_dict_like
key_data = [
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 470, in <listcomp>
getattr(selected_obj._ixs(indice, axis=1), op_name)(how, **kwargs)
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\series.py", line 4610, in aggregate
result = op.agg()
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 1212, in agg
result = super().agg()
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 172, in agg
return self.apply_str()
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 586, in apply_str
return self._apply_str(obj, func, *self.args, **self.kwargs)
File "D:\Program Files\anaconda\envs\ag\lib\site-packages\pandas\core\apply.py", line 669, in _apply_str
return f(*args, **kwargs)
TypeError: NDFrame.first() missing 1 required positional argument: 'offset'
"""

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
Cell In[10], line 17
7 # train_data = train_data.convert_frequency(freq="h", num_cpus=1, agg_numeric="mean")
8 predictor = TimeSeriesPredictor(
9 prediction_length=prediction_length,
10 # path=os.path.join(path, ""),
(...)
15 known_covariates_names=known_covariates_names
16 )
---> 17 predictor.fit(
18 train_data,
19 presets=presets,
20 time_limit=time_limit,
21 # hyperparameter_tune_kwargs="auto",
22 num_val_windows=num_val_windows,
23 hyperparameters=hyperparameters
24 )

File D:\Program Files\anaconda\envs\ag\lib\site-packages\autogluon\core\utils\decorators.py:31, in unpack.<locals>._unpack_inner.<locals>._call(*args, **kwargs)
28 @functools.wraps(f)
29 def _call(*args, **kwargs):
30 gargs, gkwargs = g(*other_args, *args, **kwargs)
---> 31 return f(*gargs, **gkwargs)

File D:\Program Files\anaconda\envs\ag\lib\site-packages\autogluon\timeseries\predictor.py:644, in TimeSeriesPredictor.fit(self, train_data, tuning_data, time_limit, presets, hyperparameters, hyperparameter_tune_kwargs, excluded_model_types, num_val_windows, val_step_size, refit_every_n_windows, refit_full, enable_ensemble, random_seed, verbosity)
641 logger.info("\nFitting with arguments:")
642 logger.info(f"{pprint.pformat({k: v for k, v in fit_args.items() if v is not None})}\n")
--> 644 train_data = self._check_and_prepare_data_frame(train_data, name="train_data")
645 logger.info(f"Provided train_data has {self._get_dataset_stats(train_data)}")
647 if val_step_size is None:

File D:\Program Files\anaconda\envs\ag\lib\site-packages\autogluon\timeseries\predictor.py:279, in TimeSeriesPredictor._check_and_prepare_data_frame(self, data, name)
277 if df.freq != self.freq:
278 logger.warning(f"{name} with frequency '{df.freq}' has been resampled to frequency '{self.freq}'.")
--> 279 df = df.convert_frequency(freq=self.freq)
281 # Fill missing values
282 if df.isna().values.any():
283 # FIXME: Do not automatically fill NaNs here, handle missing values at the level of individual models.
284 # FIXME: Current solution leads to incorrect metric computation if missing values are present

File D:\Program Files\anaconda\envs\ag\lib\site-packages\autogluon\timeseries\dataset\ts_dataframe.py:969, in TimeSeriesDataFrame.convert_frequency(self, freq, agg_numeric, agg_categorical, num_cpus, chunk_size, **kwargs)
966 # Resampling time for 1 item < overhead time for a single parallel job. Therefore, we group items into chunks
967 # so that the speedup from parallelization isn't dominated by the communication costs.
968 chunks = split_into_chunks(pd.DataFrame(self).groupby(level=ITEMID, sort=False), chunk_size)
--> 969 resampled_chunks = Parallel(n_jobs=num_cpus)(delayed(resample_chunk)(chunk) for chunk in chunks)
970 resampled_df = TimeSeriesDataFrame(pd.concat(resampled_chunks))
971 resampled_df.static_features = self.static_features

File D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py:1952, in Parallel.__call__(self, iterable)
1946 # The first item from the output is blank, but it makes the interpreter
1947 # progress until it enters the Try/Except block of the generator and
1948 # reach the first yield statement. This starts the aynchronous
1949 # dispatch of the tasks to the workers.
1950 next(output)
-> 1952 return output if self.return_generator else list(output)

File D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py:1595, in Parallel._get_outputs(self, iterator, pre_dispatch)
1592 yield
1594 with self._backend.retrieval_context():
-> 1595 yield from self._retrieve()
1597 except GeneratorExit:
1598 # The generator has been garbage collected before being fully
1599 # consumed. This aborts the remaining tasks if possible and warn
1600 # the user if necessary.
1601 self._exception = True

File D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py:1699, in Parallel._retrieve(self)
1692 while self._wait_retrieval():
1693
1694 # If the callback thread of a worker has signaled that its task
1695 # triggered an exception, or if the retrieval loop has raised an
1696 # exception (e.g. GeneratorExit), exit the loop and surface the
1697 # worker traceback.
1698 if self._aborting:
-> 1699 self._raise_error_fast()
1700 break
1702 # If the next job is not ready for retrieval yet, we just wait for
1703 # async callbacks to progress.

File D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py:1734, in Parallel._raise_error_fast(self)
1730 # If this error job exists, immediatly raise the error by
1731 # calling get_result. This job might not exists if abort has been
1732 # called directly or if the generator is gc'ed.
1733 if error_job is not None:
-> 1734 error_job.get_result(self.timeout)

File D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py:736, in BatchCompletionCallBack.get_result(self, timeout)
730 backend = self.parallel._backend
732 if backend.supports_retrieve_callback:
733 # We assume that the result has already been retrieved by the
734 # callback thread, and is stored internally. It's just waiting to
735 # be returned.
--> 736 return self._return_or_raise()
738 # For other backends, the main thread needs to run the retrieval step.
739 try:

File D:\Program Files\anaconda\envs\ag\lib\site-packages\joblib\parallel.py:754, in BatchCompletionCallBack._return_or_raise(self)
752 try:
753 if self.status == TASK_ERROR:
--> 754 raise self._result
755 return self._result
756 finally:

TypeError: NDFrame.first() missing 1 required positional argument: 'offset'
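
A possible first diagnostic, offered as a guess and not a confirmed cause: the worker traceback goes through the pandas branch in compute_dict_like that selects columns by position (selected_obj._ixs(indice, axis=1)), and that branch appears to be used when column labels are not unique. If so, the string aggregation "first" would be applied to a plain Series, whose first() needs an offset. The sketch below only checks a list of column labels for duplicates. The helper name find_duplicate_columns and the example column list are hypothetical and do not come from the report.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def find_duplicate_columns(columns: Iterable[Hashable]) -> List[Hashable]:
    """Return the column labels that occur more than once, in first-seen order."""
    counts = Counter(columns)
    return [label for label, n in counts.items() if n > 1]


# Hypothetical column list for illustration only.
# With a real frame, pass `train_data.columns` instead.
example_columns = ["elec_std", "temp_min", "temp_max", "Holiday", "temp_min"]
duplicates = find_duplicate_columns(example_columns)
print(duplicates)  # ['temp_min']
```

An empty result would suggest the cause lies elsewhere, for example in how the installed pandas version handles the aggregation passed to resample.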

Installed Versions

1.1.0

INSTALLED VERSIONS

date : 2024-05-13
time : 10:44:10.525602
python : 3.10.14.final.0
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 170 Stepping 4, GenuineIntel
num_cores : 18
cpu_ram_mb : 24029.3671875
cuda version : None
num_gpus : 1
gpu_ram_mb : [7957]
avail_disk_size_mb : None
accelerate : 0.21.0
async-timeout : 4.0.3
autogluon : 1.0.0
autogluon.common : 1.0.0
autogluon.core : 1.0.0
autogluon.features : 1.0.0
autogluon.multimodal : 1.0.0
autogluon.tabular : 1.0.0
autogluon.timeseries : 1.0.0
boto3 : 1.34.70
catboost : 1.2.3
defusedxml : 0.7.1
evaluate : 0.4.1
fastai : 2.7.14
gluonts : 0.14.4
hyperopt : None
imodels : None
jinja2 : 3.1.3
joblib : 1.3.2
jsonschema : 4.17.3
lightgbm : 4.1.0
lightning : 2.0.4
matplotlib : 3.8.3
mlforecast : 0.10.0
networkx : 3.2.1
nlpaug : 1.1.11
nltk : 3.8.1
nptyping : 2.4.1
numpy : 1.26.4
nvidia-ml-py3 : None
omegaconf : 2.3.0
onnxruntime-gpu : None
openmim : 0.3.7
orjson : 3.9.15
pandas : 2.1.4
Pillow : 10.0.1
psutil : 5.9.8
PyMuPDF : None
pytesseract : 0.3.10
pytorch-lightning : 1.9.5
pytorch-metric-learning: 1.7.3
ray : None
requests : 2.31.0
scikit-image : 0.19.3
scikit-learn : 1.4.1.post1
scikit-learn-intelex : None
scipy : 1.12.0
seqeval : 1.2.2
setuptools : 68.2.2
skl2onnx : None
statsforecast : 1.4.0
statsmodels : 0.14.1
tabpfn : None
tensorboard : 2.16.2
text-unidecode : 1.3
timm : 0.9.16
torch : 2.0.1
torchmetrics : 1.1.2
torchvision : 0.15.2a0
tqdm : 4.66.2
transformers : 4.31.0
utilsforecast : 0.1.2
vowpalwabbit : None
xgboost : None