thinline72 / nsl-kdd

PySpark solution to the NSL-KDD dataset: https://www.unb.ca/cic/datasets/nsl.html


Use of * in standardizer function results in error

ruze00 opened this issue


In the standardizer section, the following code results in a syntax error:

train_scaler = [*binary_cols, *list(map(standardizer, numeric_cols)), *['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']]
test_scaler = [*test_binary_cols, *list(map(standardizer, numeric_cols)), *['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']]

It doesn't like the * syntax. Is that supposed to be there? I'm using the jupyter/all-spark-notebook Docker image.

Removing the *s results in a different error.

Hi @ruze00,

The * operator is used to unpack Python lists: https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists
So, for example,
train_scaler = [*binary_cols, *list(map(standardizer, numeric_cols)), *['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']]
simply produces a flattened list of columns.
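A minimal sketch of what the unpacking does, using plain strings instead of the notebook's real columns. The contents of binary_cols and numeric_cols and the standardizer stand-in below are hypothetical; in the notebook, standardizer presumably returns a scaled Spark column rather than a string. The sketch also shows that dropping the *s produces a nested list (a list of lists) rather than a flat list of columns, which is likely why removing them led to a different error.

```python
# Hypothetical stand-ins for the notebook's column lists (illustration only).
binary_cols = ['land', 'logged_in']
numeric_cols = ['duration', 'src_bytes']
extra_cols = ['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']


def standardizer(col_name):
    # Hypothetical: the notebook's standardizer builds a scaled column;
    # here we only tag the name so the list structure is easy to see.
    return 'std_' + col_name


# Python 3.5+: each * spreads its list's items into one flat list.
flat = [*binary_cols, *list(map(standardizer, numeric_cols)), *extra_cols]

# Without *, each list becomes a single (nested) element instead.
nested = [binary_cols, list(map(standardizer, numeric_cols)), extra_cols]

print(flat)    # 9 column names
print(nested)  # 3 inner lists
```

The list(...) around map is not strictly needed: * accepts any iterable, so *map(standardizer, numeric_cols) gives the same result.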

Are you using Python 3? Python 2 doesn't support this syntax: unpacking with * inside a list literal was added in Python 3.5 (PEP 448).
The notebook is written in Python 3.
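If switching interpreters isn't an option, the same flat list can be built without * unpacking. This is a sketch, not the notebook's code: it reuses the hypothetical stand-ins (binary_cols, numeric_cols, a string-returning standardizer) and shows list concatenation with + and itertools.chain, both of which run on Python 2 and Python 3.

```python
from itertools import chain

# Hypothetical stand-ins for the notebook's column lists (illustration only).
binary_cols = ['land', 'logged_in']
numeric_cols = ['duration', 'src_bytes']
extra_cols = ['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']


def standardizer(col_name):
    # Hypothetical: stands in for the notebook's standardizer.
    return 'std_' + col_name


# Option 1: concatenate the lists with +.
train_scaler = binary_cols + list(map(standardizer, numeric_cols)) + extra_cols

# Option 2: chain the iterables together, then materialize a list.
train_scaler_chain = list(chain(binary_cols, map(standardizer, numeric_cols), extra_cols))

print(train_scaler)
```

The same pattern applies to test_scaler by substituting test_binary_cols for binary_cols.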


@thinline72, thanks so much for your response. I didn't check the version, sorry. It must have been Python 2. I started again from scratch with Python 3 and had no issues.