blue-yonder / tsfresh

Automatic extraction of relevant features from time series:

Home Page: http://tsfresh.readthedocs.io

Python 3 support

jneuff opened this issue

Currently we only support Python 2. In future releases we want to support both Python 2 and 3. This howto outlines the main steps towards Python 3 support.

Bummer. Thanks for the reply.

Sooner or later we will have Python 3 support :)

Until then, you could extract the features with a local Python 2.7 interpreter, pickle the DataFrame and then load it into your Python 3.5 project.
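
A minimal sketch of that workaround, assuming the extracted features fit in memory. The helper names save_features and load_features are hypothetical, not part of tsfresh. The pickle is written with protocol 2, the newest protocol Python 2.7 supports; per the Python docs, encoding="latin1" is needed on Python 3 when unpickling NumPy arrays that were pickled by Python 2.

```python
import pickle
import sys


def save_features(obj, path):
    """Hypothetical helper: pickle obj (e.g. the features DataFrame) for Python 3."""
    with open(path, "wb") as f:
        # Protocol 2 is the newest protocol Python 2.7 can write
        pickle.dump(obj, f, protocol=2)


def load_features(path):
    """Hypothetical helper: load a pickle written by save_features."""
    with open(path, "rb") as f:
        if sys.version_info[0] >= 3:
            # Decode Python 2 byte strings as latin1 so NumPy array data survives
            return pickle.load(f, encoding="latin1")
        return pickle.load(f)
```

Under Python 2.7 you would call save_features(extracted_features, "features.pkl"), and in the Python 3.5 project extracted_features = load_features("features.pkl").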

I will look into this later

I just uploaded the branch "i8_add_python3_support"

On it, I started to make tsfresh runnable under Python 3. All unit tests now pass on Python 2.7; on Python 3.5.1, 14 unit tests are failing.

Maybe I will have time during the next few days to finish the job. Otherwise it would be nice if somebody else could check the changes and get those unit tests to pass.

I will take a look. I have to port Skyline at some point anyway, and I really want to dig into what you are doing here, so this may be an effective way for me to start down a Python 3 path in my own sphere and to understand how you avoid some of the time series clustering issues that come up with k-means et al.

@jneuff I have read the paper now and dug a bit deeper, and I understand a little more :) I should be saying hey TPOT -> tsFRESH :)

@MaxBenChrist anybody interested in having a go at porting bits and pieces to Python 3 can use Python 3.5.2 (the latest), unless there is a reason that Python 3.5.1 is required. Silence on the matter shall be read as py352_ok = True; I am sure you are busy.

Nice of blue-yonder and all of you to release it; time series and ML not being easy and all, this looks like a step forward :)

Hi @earthgecko

We are happy about anybody who wants to contribute. You could take my "i8_add_python3_support" branch as a starting point.

Where does one find this py352_ok = True flag? I am not familiar with it.

By the way, what are you referring to with TPOT? :)

Max

Hi @MaxBenChrist

I have your i8_add_python3_support branch and I am working on it. I will send any changes to that branch for you as small incremental pull requests.

A question about how to handle Python 3 builtins in a backwards-compatible manner: for example, the use of builtins in tsfresh/feature_selection/feature_selector.py in the i8_add_python3_support branch is not backwards compatible with 2.7.x as it stands, because there is no builtins module in 2.7, and this has ramifications through other modules.
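
One common pattern, sketched here as an illustration rather than what the branch does, is to fall back to the Python 2 name of the module: the standard library only has builtins on Python 3, while Python 2.7 calls the same module __builtin__. Alternatively, adding the future package to the requirements makes "from builtins import ..." work on 2.7, and six exposes the module as six.moves.builtins. The function total_length is a hypothetical example.

```python
try:
    import builtins  # Python 3, or Python 2 with the `future` package installed
except ImportError:
    import __builtin__ as builtins  # plain Python 2.7: same module, old name


def total_length(items):
    # Hypothetical example: call the real built-ins through the module,
    # even if a local name shadows them
    return builtins.sum(builtins.len(item) for item in items)
```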

I shall add some additional detailed info on #30 for consideration.

There is no flag; it was a question :) Are you OK with using 3.5.2, or is there a specific reason you are using 3.5.1?

TPOT - https://github.com/rhiever/tpot - I initially thought that tsfresh was doing a subset of what TPOT does, but no, TPOT could probably add a FRESH dimension :)

Now down to 5 failing unit tests from 14.

The outstanding ones are mostly AttributeErrors about the missing attribute 'assertItemsEqual' in a number of contexts, and there is a pandas warning:

pandas/computation/expressions.py:182: UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead

in tests/transformers/test_full_pipeline.py, along with an AssertionError; they may be related:

>       self.assertTrue(some_expected_features.issubset(set(extracted_features.columns)))
E       AssertionError: False is not true
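
If the warning comes from code that combines boolean masks with *, the change numexpr suggests is to use the element-wise & operator, which gives the same truth values and is a bool operation numexpr supports. A minimal NumPy sketch with made-up masks, not taken from tsfresh:

```python
import numpy as np

values = np.array([0.5, -1.0, 2.0, 3.5])

positive = values > 0  # bool mask
small = values < 3     # bool mask

# Element-wise logical AND: same result as positive * small,
# but expressed with the operator numexpr accepts for bool dtype
both = positive & small
```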

Some info on blocking points (I was playing with the Python 3 branch but unfortunately have no time to go into depth or create a fix myself right now):

The first Quickstart example:
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")

yields:

TypeError                                Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/tsfresh/utilities/dataframe_functions.py in normalize_input_to_internal_representation(df_or_dict, column_id, column_sort, column_kind, column_value)
    239                 id_and_sort_column = [_f for _f in [column_id, column_sort] if _f]
    240                 kind_to_df_map = {key: df_or_dict[[key] + id_and_sort_column].copy().rename(columns={key: "_value"})
--> 241                                   for key in df_or_dict.columns if key not in id_and_sort_column}
    242 
    243                 #todo: is this the right check?

TypeError: can only concatenate list (not "filter") to list
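
The error message points at a Python 3 change: filter() returns a lazy iterator instead of a list, so concatenating it to a list raises TypeError. A list comprehension (the form shown on line 239 of the traceback) or list(filter(...)) avoids that. A minimal sketch, with a hypothetical helper name:

```python
def id_and_sort_columns(column_id, column_sort):
    """Hypothetical helper: return the non-empty column names as a real list."""
    # On Python 3, filter() yields an iterator, and ["x"] + filter(...) raises
    # "TypeError: can only concatenate list (not "filter") to list".
    return list(filter(None, [column_id, column_sort]))


columns = id_and_sort_columns("id", None)
selected = ["a"] + columns  # list + list works on Python 2 and 3
```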


When calling it with column_value="a" you can get around this error, but then we get some numexpr (and scipy) warnings:

/opt/conda/lib/python3.5/site-packages/pandas/computation/expressions.py:181: UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead
  unsupported[op_str]))
/opt/conda/lib/python3.5/site-packages/scipy/signal/spectral.py:772: UserWarning: nperseg = 256, is greater than input length = 15, using nperseg = 15
  'using nperseg = {1:d}'.format(nperseg, x.shape[-1]))

The method assertItemsEqual has been removed from unittest.TestCase somewhere along the way to Python 3.5; we'll need to find a replacement with the same semantics.

@jneuff yes! Semantically they appear to be the same; the related failing tests pass \o/
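
In Python 3 the method is called assertCountEqual: it checks that two sequences contain the same elements with the same multiplicities, ignoring order. A minimal sketch of a compatibility base class, assuming the tests subclass unittest.TestCase directly; CompatTestCase and ExampleTest are hypothetical names. The six library's six.assertCountEqual(self, first, second) helper is an alternative.

```python
import unittest


class CompatTestCase(unittest.TestCase):
    """Hypothetical base class exposing assertCountEqual on Python 2.7 and 3."""

    if not hasattr(unittest.TestCase, "assertCountEqual"):
        # Python 2.7 only has the old name, with the same semantics
        def assertCountEqual(self, first, second, msg=None):
            return self.assertItemsEqual(first, second, msg)


class ExampleTest(CompatTestCase):
    def test_same_items_any_order(self):
        # Passes: same elements, order ignored
        self.assertCountEqual(["a__length", "b__length"], ["b__length", "a__length"])
```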

However, fixing that just lets the next issue, with unittest's assertEqual, raise its head. It seems that assertEqual has changed in py3 as well, which may go a bit deeper :( One step at a time :)

assertEqual change

Current debug

        # Preserve old features
>       self.assertEqual(list(X_transformed.columns), ["feature_1", "a__length", "b__length"])
E       AssertionError: Lists differ: ['feature_1', 'b__length', 'a__length'] != ['feature_1', 'a__length', 'b__length']
E
E       First differing element 1:
E       'b__length'
E       'a__length'
E
E       - ['feature_1', 'b__length', 'a__length']
E       + ['feature_1', 'a__length', 'b__length']

tests/transformers/test_feature_augmenter.py:50: AssertionError

assertEqual is used in quite a few places (https://github.com/blue-yonder/tsfresh/search?q=assertEqual&type=Code). Further, it must be kept in mind that a test comparing 2 elements could pass some of the time if the elements are returned in a different order on each run.

E       First differing element 0:
E       'b'
E       'a'
E
E       - ['b', 'a']
E       + ['a', 'b']
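
Both failures are consistent with the order of the new columns depending on dict or set iteration order, which on Python 3.3+ can change between runs because string hashing is randomized; that cause is an assumption, not something the thread confirms. A minimal sketch of an order-insensitive version of the failing check, with the column list hard-coded for illustration:

```python
import unittest


class FeatureColumnOrderTest(unittest.TestCase):
    def test_preserves_old_features(self):
        # Hard-coded stand-in for list(X_transformed.columns); the order of
        # the extracted columns may differ between runs
        columns = ["feature_1", "b__length", "a__length"]

        # The pre-existing feature is still expected to come first ...
        self.assertEqual(columns[0], "feature_1")
        # ... while the extracted features are compared regardless of order
        self.assertEqual(sorted(columns[1:]), ["a__length", "b__length"])
```

The other option is to make the library itself emit the columns in a deterministic (for example sorted) order, so that exact-order assertions stay meaningful.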

I rewrote those unit tests using the six library.

Some of the unit tests still failed; the reason was the bug in #29, which I fixed. Now you should be able to enjoy your fresh features under Python 3 :)