Lab41 / pythia

Supervised learning for novelty detection in text

Home Page: http://lab41.github.io/pythia/

Log reg getting bad values.

pcallier opened this issue

The experiment fails on BOWAT_LDAAC_W2V_LR.json because of NaNs, infs, or something else. Using the English dataset.
Settings

"USE_CACHE":true, 
"OVERSAMPLING": true, 
"BOW_APPEND":true, "BOW_TFIDF":true, 
"LDA_APPEND":true, "LDA_COS":true, 
"W2V_APPEND":true, "W2V_ABS":true, 
"XGB":false, "LOG_REG":true

Error and traceback

ERROR - pythia_experiment - Failed after 0:06:49!
Traceback (most recent calls WITHOUT Sacred internals):
  File "experiments/experiments.py", line 281, in run_experiment
    USE_CACHE)
  File "/home/pcallier/pythia/src/pipelines/master_pipeline.py", line 89, in main
    logreg_model = log_reg.main([train_data, train_target, algorithms['log_reg']])
  File "/home/pcallier/pythia/src/pipelines/log_reg.py", line 47, in main
    logreg = run_model(train_data, train_target, **args_dict)
  File "/home/pcallier/pythia/src/pipelines/log_reg.py", line 32, in run_model
    logreg.fit(train_data, train_labels)
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/linear_model/logistic.py", line 1142, in fit
    order="C")
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/utils/validation.py", line 510, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/utils/validation.py", line 398, in check_array
    _assert_all_finite(array)
  File "/opt/conda/envs/py3-pythia/lib/python3.5/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
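
The error does not say which rows or features are bad. One way to narrow it down is to scan the feature matrix just before `logreg.fit` is called. The following is only a sketch: the helper name `nonfinite_report` is hypothetical, the toy array stands in for `train_data`, and it assumes the matrix is dense (a sparse matrix would need to be checked through its `.data` attribute instead).

```python
import numpy as np

def nonfinite_report(X):
    """Return (rows, cols) index arrays of NaN or +/-inf entries.

    Hypothetical helper, not part of Pythia. Assumes X is a dense 2-D array.
    """
    X = np.asarray(X, dtype=np.float64)
    bad = ~np.isfinite(X)          # True wherever the value is NaN or +/-inf
    rows, cols = np.nonzero(bad)   # coordinates of the offending entries
    return rows, cols

# Toy stand-in for train_data: row 1 holds a NaN, row 2 holds an inf.
X = np.array([[0.1, 0.2],
              [np.nan, 0.3],
              [0.4, np.inf]])

rows, cols = nonfinite_report(X)
print("bad rows:", sorted(set(rows.tolist())))
print("bad columns:", sorted(set(cols.tolist())))
```

The column indices show which feature block (BOW, LDA, or W2V) produces the bad values, and the row indices point to the documents involved.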

Repeatable on my system; I am looking into it. I suspect a TF-IDF score of infinity caused by a zero-length document (normalization divides by document length).
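
To illustrate the suspected failure mode, here is a minimal sketch of length normalization on a toy count matrix. It is not Pythia's actual TF-IDF code: the function names and the data are made up for illustration. Note that in NumPy an all-zero row divided by its zero length gives NaN (0/0) rather than infinity; a nonzero value divided by zero would give inf. Either one trips the same scikit-learn check.

```python
import numpy as np

def length_normalize(counts):
    """Naive normalization: divide each row by its total term count."""
    counts = np.asarray(counts, dtype=np.float64)
    lengths = counts.sum(axis=1, keepdims=True)
    # Silence the divide warnings so the non-finite result can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return counts / lengths

def safe_length_normalize(counts):
    """Guarded variant: rows with zero length are left as all zeros."""
    counts = np.asarray(counts, dtype=np.float64)
    lengths = counts.sum(axis=1, keepdims=True)
    out = np.zeros_like(counts)
    # Only divide where the document length is nonzero.
    np.divide(counts, lengths, out=out, where=lengths != 0)
    return out

# Toy data: the second "document" is empty.
counts = np.array([[2, 1, 1],
                   [0, 0, 0]])

naive = length_normalize(counts)
safe = safe_length_normalize(counts)

print("naive is finite:", np.isfinite(naive).all())
print("safe is finite:", np.isfinite(safe).all())
```

If this is the cause, the options are to drop empty documents before feature extraction or to guard the division as above. Which one is appropriate depends on how the rest of the pipeline handles empty documents.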