Python 3 support

jneuff opened this issue 8 years ago

Currently we only support Python 2. In future releases we want to support both Python 2 and 3. This how-to outlines the main steps towards Python 3 support.

Maximilian Christ commented 8 years ago

see #26

Grant Beyleveld · Answer 1 · Tue Nov 01 2016 02:28:23 GMT+0800 (China Standard Time)

Bummer. Thanks for the reply.

Maximilian Christ · Answer 2 · Tue Nov 01 2016 04:23:09 GMT+0800 (China Standard Time)

Sooner or later we will have Python 3 support :)

Until then, you could extract the features with a local Python 2.7 interpreter, pickle the DataFrame, and then load it into your Python 3.5 project.
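A minimal sketch of that workaround. It assumes that both interpreters can reach the same file path and have compatible pandas versions installed, and `dump_features`/`load_features` are hypothetical helper names, not part of tsfresh. Pickle protocol 2 is the highest one Python 2.7 supports, and Python 3 can still read it. `encoding="latin1"` is what Python 3 needs to decode byte strings (including NumPy array buffers) pickled by Python 2.

```python
import pickle
import sys

# Protocol 2 is the highest pickle protocol Python 2.7 can read and write,
# and Python 3 can still load it, so files written with it work both ways.
PY2_COMPATIBLE_PROTOCOL = 2


def dump_features(obj, path):
    """Pickle obj (for example the extracted-features DataFrame) to path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=PY2_COMPATIBLE_PROTOCOL)


def load_features(path):
    """Load a pickle that may have been written by a Python 2.7 interpreter."""
    with open(path, "rb") as f:
        if sys.version_info[0] >= 3:
            # latin1 maps every byte to one code point, so Python 2 str
            # objects and NumPy buffers decode without a UnicodeDecodeError.
            return pickle.load(f, encoding="latin1")
        return pickle.load(f)
```

On the Python 2.7 side you would call `dump_features(extracted_features, "features.pkl")`, and in the Python 3.5 project `extracted_features = load_features("features.pkl")`.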

Maximilian Christ · Answer 3 · Tue Nov 01 2016 16:57:49 GMT+0800 (China Standard Time)

I will look into this later

Maximilian Christ · Answer 4 · Wed Nov 02 2016 02:10:11 GMT+0800 (China Standard Time)

I just uploaded the branch "i8_add_python3_support".

On it, I started making tsfresh runnable under Python 3. All unit tests now pass on Python 2.7; on Python 3.5.1, 14 unit tests fail.

Maybe I will have time in the next few days to finish the job. Otherwise, it would be nice if somebody else could check the changes and get those unit tests to pass.

earthgecko · Answer 5 · Wed Nov 02 2016 03:01:09 GMT+0800 (China Standard Time)

I will take a look. I have to do this for Skyline at some point anyway, and I really want to dive into what you are doing here. So it may be an effective way for me to start a Python 3 path in my own project and to get a handle on how you avoid some of the clustering issues that time series run into with k-means et al.

@jneuff I have read the paper now and dug a bit deeper, and I understand a little more :) I should say hey, TPOT -> tsFRESH :)

@MaxBenChrist Anybody interested in having a go at porting bits and pieces to Python 3 can use Python 3.5.2 (the latest), unless there is a reason Python 3.5.1 is required. I will read silence on the matter as py352_ok = True; I am sure you are busy.

Nice of blue-yonder and all of you to release it. Time series and ML are not easy, and this looks like a step forward :)

Maximilian Christ · Answer 6 · Wed Nov 02 2016 16:49:30 GMT+0800 (China Standard Time)

hi @earthgecko

We are happy about anybody who wants to contribute. You could take my "i8_add_python3_support" branch as a starting point.

Where does one find this py352_ok = True flag? I am not familiar with it.

By the way, what are you referring to with TPOT? :)

Max

earthgecko · Answer 7 · Wed Nov 02 2016 19:03:06 GMT+0800 (China Standard Time)

Hi @MaxBenChrist

I have your i8_add_python3_support branch and am working on it. I will send any changes to that branch in small increments.

A question about how to handle Python 3 builtins in a backwards-compatible manner: for example, the use of builtins in tsfresh/feature_selection/feature_selector.py in the i8_add_python3_support branch is not backwards compatible with 2.7.x as it stands, because there is no builtins module in 2.7, and this has ramifications through other modules.
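One common way to keep this working on both versions, sketched under the assumption that the code only needs the builtins module itself: Python 2.7 ships the same module under the old name `__builtin__`, so a try/except import gives both interpreters one name. If the branch instead uses futurize-style imports such as `from builtins import range`, those need the third-party `future` package installed on Python 2; `six.moves.builtins` is another alias for the same module. `lazy_range` is a hypothetical name that only illustrates looking up a builtin that differs between versions.

```python
try:
    import builtins  # Python 3 (or Python 2 with the `future` package installed)
except ImportError:
    import __builtin__ as builtins  # plain Python 2.7: same module, old name

# Hypothetical example of a version-dependent builtin: xrange exists only on
# Python 2, while on Python 3 range is already lazy.
lazy_range = getattr(builtins, "xrange", builtins.range)
```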

I will add more detailed info on #30 for consideration.

There is no flag; it was a question :) Are you OK with using 3.5.2, or is there a specific reason you are using 3.5.1?

TPOT (https://github.com/rhiever/tpot): I initially thought that tsfresh was doing a subset of what TPOT does, but no; if anything, TPOT could probably add a FRESH dimension :)

earthgecko · Answer 8 · Thu Nov 03 2016 00:46:27 GMT+0800 (China Standard Time)

Now down to 5 failing unit tests, from 14.

The outstanding ones are mostly related to objects having no attribute 'assertItemsEqual', in a number of contexts, and there is a pandas warning related to:

pandas/computation/expressions.py:182: UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead

This shows up in tests/transformers/test_full_pipeline.py together with an AssertionError; they may be related:

>       self.assertTrue(some_expected_features.issubset(set(extracted_features.columns)))
E       AssertionError: False is not true

Jan Philipp Harries · Answer 9 · Thu Nov 03 2016 00:48:46 GMT+0800 (China Standard Time)

Some info on blocking points (I was playing with the Python 3 branch but unfortunately have no time to go into depth or create a fix myself right now):

The first Quickstart example:
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")

yields:

TypeError                                Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/tsfresh/utilities/dataframe_functions.py in normalize_input_to_internal_representation(df_or_dict, column_id, column_sort, column_kind, column_value)
    239                 id_and_sort_column = [_f for _f in [column_id, column_sort] if _f]
    240                 kind_to_df_map = {key: df_or_dict[[key] + id_and_sort_column].copy().rename(columns={key: "_value"})
--> 241                                   for key in df_or_dict.columns if key not in id_and_sort_column}
    242 
    243                 #todo: is this the right check?

TypeError: can only concatenate list (not "filter") to list

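The TypeError comes from a Python 2 idiom. There, filter() returns a list, so prepending [key] to its result works. On Python 3, filter() returns a lazy filter object, which cannot be concatenated to a list. A minimal sketch of the fix is to build a real list, either with a list comprehension or with list(filter(...)). It uses a hypothetical helper, `value_columns_for`, rather than the actual tsfresh function:

```python
def value_columns_for(key, column_id, column_sort):
    """Hypothetical helper: columns to select for one kind, skipping unset ones."""
    # Python 2: filter() returns a list, so [key] + filter(None, ...) works.
    # Python 3: filter() returns a lazy iterator and the same expression raises
    #   TypeError: can only concatenate list (not "filter") to list
    # A list comprehension builds a real list on both versions.
    id_and_sort_column = [c for c in [column_id, column_sort] if c]
    return [key] + id_and_sort_column
```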

When calling it with column_value="a", you can get around this error, but then we get warnings like these:

/opt/conda/lib/python3.5/site-packages/pandas/computation/expressions.py:181: UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead
  unsupported[op_str]))
/opt/conda/lib/python3.5/site-packages/scipy/signal/spectral.py:772: UserWarning: nperseg = 256, is greater than input length = 15, using nperseg = 15
  'using nperseg = {1:d}'.format(nperseg, x.shape[-1]))

earthgecko · Answer 10 · Thu Nov 03 2016 03:11:53 GMT+0800 (China Standard Time)

The current state of the py2 and py3 tests is in a gist: https://gist.github.com/earthgecko/118d168f88ebb37661154e3cb898c1fb

Julius Kreuzer · Answer 11 · Thu Nov 03 2016 04:16:31 GMT+0800 (China Standard Time)

The method assertItemsEqual has been removed from unittest.TestCase somewhere along the way to Python 3.5, so we'll need to find a replacement with the same semantics.
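For reference, Python 3 offers the same check under the name assertCountEqual (added in 3.2). It passes when two sequences contain the same elements with the same counts, regardless of order. Below is a minimal stdlib-only shim. It assumes the test suite can share one base class, and `CompatTestCase` and `ColumnsTest` are hypothetical names. The six library also provides `six.assertCountEqual(self, first, second)` for the same purpose.

```python
import unittest


class CompatTestCase(unittest.TestCase):
    """Hypothetical base class exposing assertCountEqual on Python 2.7 and 3."""

    if not hasattr(unittest.TestCase, "assertCountEqual"):  # Python 2.7 only
        def assertCountEqual(self, first, second, msg=None):
            # Same semantics: equal elements and counts, order ignored.
            return self.assertItemsEqual(first, second, msg)


class ColumnsTest(CompatTestCase):
    def test_same_columns_in_any_order(self):
        self.assertCountEqual(["feature_1", "b__length", "a__length"],
                              ["feature_1", "a__length", "b__length"])
```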

earthgecko · Answer 12 · Thu Nov 03 2016 16:07:16 GMT+0800 (China Standard Time)

@jneuff Yes! Semantically they appear to be the same, and the related failing tests now pass \o/

However, fixing that just lets the next unittest.assertEqual issue raise its head: it seems that assertEqual has changed in py3 as well, which may go a bit deeper :( One step at a time :)

assertEqual change

Current debug

        # Preserve old features
>       self.assertEqual(list(X_transformed.columns), ["feature_1", "a__length", "b__length"])
E       AssertionError: Lists differ: ['feature_1', 'b__length', 'a__length'] != ['feature_1', 'a__length', 'b__length']
E
E       First differing element 1:
E       'b__length'
E       'a__length'
E
E       - ['feature_1', 'b__length', 'a__length']
E       + ['feature_1', 'a__length', 'b__length']

tests/transformers/test_feature_augmenter.py:50: AssertionError

assertEqual is used in quite a few places (https://github.com/blue-yonder/tsfresh/search?q=assertEqual&type=Code). Beyond that, keep in mind that tests with only 2 elements could sometimes pass if the elements are returned in a different order on each run, as in this failure:

E       First differing element 0:
E       'b'
E       'a'
E
E       - ['b', 'a']
E       + ['a', 'b']
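A likely cause, offered as a hypothesis rather than a diagnosis: assertEqual compares lists order-sensitively on both Python 2 and 3. However, Python 3.3+ randomizes string hashing by default, so iterating over a set (or, before 3.6, a dict) of kind names can yield a different order on each run. Where the order matters, as with keeping the old features ahead of the new ones, the code under test can sort explicitly. `augmented_columns` below is a hypothetical illustration, not tsfresh's actual code:

```python
def augmented_columns(original_columns, kinds, feature="length"):
    """Hypothetical helper: original columns first, then one column per kind.

    `kinds` may be a set or dict whose iteration order varies between runs
    under hash randomization, so it is sorted to make the result deterministic.
    """
    new_columns = ["{}__{}".format(kind, feature) for kind in sorted(kinds)]
    return list(original_columns) + new_columns
```

When the order genuinely does not matter, comparing order-insensitively in the test is the simpler fix.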

Maximilian Christ · Answer 13 · Fri Nov 04 2016 18:06:14 GMT+0800 (China Standard Time)

I rewrote those unit tests using the six library.

Some of the unit tests still failed; the reason was the bug in #29, which I have fixed. Now you should be able to enjoy your fresh features under Python 3 :)