minimaxir / automl-gs

Provide an input CSV and a target field to predict, generate a model + code to run it.

SyntaxError: invalid syntax when fields start with a number.

Nakeuh opened this issue

Hi, and thanks for your work.

I tried to run your project on a dataset in which some field names start with a number, and this throws a SyntaxError.
For example, with a field named '1stFlrSF', I got the following error:

Traceback (most recent call last):
  File "model.py", line 3, in <module>
    from pipeline import *
  File "[MY_PATH]/automl_train/pipeline.py", line 1090
    1stflrsf_enc = df['1stFlrSF']
               ^
SyntaxError: invalid syntax

  0%|          | 0/20 [00:00<?, ?epoch/s]Traceback (most recent call last):
  File "[MY_PATH]/test_auto_ml/Test.py", line 8, in <module>
    do_the_thing("[MY_DATASET_PATH]/train.csv","SalePrice")
  File "[MY_PATH]/test_auto_ml/Test.py", line 5, in do_the_thing
    automl_grid_search(path,label)
  File "[MY_PYTHON_PATH]/site-packages/automl_gs/automl_gs.py", line 94, in automl_grid_search
    train_results = results.tail(1).to_dict('records')[0]
IndexError: list index out of range

That's a valid edge case. (Python identifiers cannot start with a digit, so the generated variable name `1stflrsf_enc` is a syntax error.)

I wonder what the best way to handle this is. The leading number can't simply be removed during preprocessing, because that could create a field name conflict.
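To make both points concrete, here is a minimal sketch. The column names `1Area` and `2Area` are hypothetical, chosen only to show how stripping leading digits can map two distinct fields to the same name; they are not taken from the reporter's dataset.

```python
# Python identifiers cannot begin with a digit; str.isidentifier() checks this.
bad_name = "1stflrsf_enc"
good_name = "_1stflrsf_enc"
print(bad_name.isidentifier())   # False
print(good_name.isidentifier())  # True

# Hypothetical field names: stripping the leading digits makes them collide.
fields = ["1Area", "2Area"]
stripped = [f.lstrip("0123456789") for f in fields]
print(stripped)  # ['Area', 'Area']

# A conflict exists when fewer unique names remain than fields went in.
has_conflict = len(set(stripped)) < len(fields)
print(has_conflict)  # True
```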

I think that adding a non-numeric character in front of every field name should do the trick.
It should be possible to add an '_' in front of every field name when retrieving the values, and to remove that character when the results are output.
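A rough sketch of that idea, not the actual automl-gs code: the helper names `to_var_name` and `generate_line` are hypothetical, and the `<lowercased field>_enc` pattern is only inferred from the traceback above. Because the same prefix is added to every field, two distinct field names cannot be merged by it. The sketch only addresses leading digits; other characters that are invalid in identifiers would need separate handling.

```python
PREFIX = "_"  # assumption: any non-numeric identifier character would work


def to_var_name(field: str) -> str:
    """Hypothetical helper: variable name for a field, prefixed so it never starts with a digit."""
    return f"{PREFIX}{field.lower()}_enc"


def generate_line(field: str) -> str:
    """Hypothetical helper: one line of generated pipeline code for a field."""
    # The original field name is still used as the DataFrame key,
    # so nothing needs to be renamed in the data or in the output.
    return f"{to_var_name(field)} = df[{field!r}]"


fields = ["1stFlrSF", "SalePrice"]
lines = [generate_line(f) for f in fields]

for line in lines:
    # compile() raises SyntaxError if the generated line is not valid Python.
    compile(line, "<generated>", "exec")
    print(line)
```

Since only the generated variable name carries the prefix, the column label in the CSV stays untouched and there is nothing to strip when results are written out.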