Log reg getting bad values.
pcallier opened this issue · comments
Patrick Callier commented
The experiment (XP) fails on BOWAT_LDAAC_W2V_LR.json because of NaNs, infs, or something else in the input to logistic regression. Using the English dataset.
Settings
"USE_CACHE": true,
"OVERSAMPLING": true,
"BOW_APPEND": true, "BOW_TFIDF": true,
"LDA_APPEND": true, "LDA_COS": true,
"W2V_APPEND": true, "W2V_ABS": true,
"XGB": false, "LOG_REG": true
Error and traceback
ERROR - pythia_experiment - Failed after 0:06:49!
Traceback (most recent calls WITHOUT Sacred internals):
  File "experiments/experiments.py", line 281, in run_experiment
    USE_CACHE)
  File "/home/pcallier/pythia/src/pipelines/master_pipeline.py", line 89, in main
    logreg_model = log_reg.main([train_data, train_target, algorithms['log_reg']])
  File "/home/pcallier/pythia/src/pipelines/log_reg.py", line 47, in main
    logreg = run_model(train_data, train_target, **args_dict)
  File "/home/pcallier/pythia/src/pipelines/log_reg.py", line 32, in run_model
    logreg.fit(train_data, train_labels)
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/linear_model/logistic.py", line 1142, in fit
    order="C")
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/utils/validation.py", line 510, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/utils/validation.py", line 398, in check_array
    _assert_all_finite(array)
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
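One way to narrow this down would be to list which cells of the feature matrix are non-finite just before `logreg.fit` is called. The snippet below is only a minimal sketch: it assumes `train_data` can be converted to a dense NumPy float array, and `report_non_finite` is a hypothetical helper, not something that exists in pythia or scikit-learn. The toy matrix is made up for illustration.

```python
import numpy as np


def report_non_finite(features):
    """Return (row, column) index pairs of NaN or infinite entries."""
    arr = np.asarray(features, dtype=np.float64)
    bad = ~np.isfinite(arr)  # True wherever a value is NaN, +inf or -inf
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(bad))]


# Toy matrix standing in for train_data (values are invented).
toy = [[0.1, 0.2, 0.3],
       [np.inf, 0.0, 0.5],
       [0.4, np.nan, 0.6]]
bad_cells = report_non_finite(toy)
print(bad_cells)
```

The column indices of the offending cells should indicate which feature block (BOW, LDA or W2V) produces the bad values, assuming the blocks are appended in a known order.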
tukeyclothespin commented
Reproducible on my system; I am looking into it. I suspect a TFIDF score of infinity caused by a zero-length document, since normalization divides by document length.
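The suspected failure mode can be reproduced in isolation. This is a minimal sketch with plain NumPy, not the pythia TFIDF code: `length_normalize` and `safe_length_normalize` are hypothetical names, and the count matrix is invented. It assumes normalization means dividing each row of term counts by the row sum.

```python
import numpy as np


def length_normalize(counts):
    """Divide each row of term counts by its document length (row sum)."""
    counts = np.asarray(counts, dtype=np.float64)
    lengths = counts.sum(axis=1, keepdims=True)
    # Silence the runtime warnings so the bad values are visible in the result.
    with np.errstate(divide="ignore", invalid="ignore"):
        return counts / lengths


def safe_length_normalize(counts):
    """Same normalization, but rows for empty documents stay all zeros."""
    counts = np.asarray(counts, dtype=np.float64)
    lengths = counts.sum(axis=1, keepdims=True)
    out = np.zeros_like(counts)
    # Only divide where the document length is non-zero.
    np.divide(counts, lengths, out=out, where=lengths != 0)
    return out


# Second row is a zero-length document (no terms at all).
toy_counts = [[2, 1, 1],
              [0, 0, 0]]
naive = length_normalize(toy_counts)
safe = safe_length_normalize(toy_counts)
print(naive)
print(safe)
```

With all-zero counts the naive division is 0/0, which gives NaN; a non-zero numerator over a zero length would give infinity instead. Either value is rejected by the finiteness check in the traceback above. Guarding the division, or dropping empty documents before feature extraction, are two possible fixes.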