Python 3 support
jneuff opened this issue · comments
Currently we only support Python 2. In future releases we want to support both Python 2 and 3. This howto outlines the main steps towards Python 3 support.
see #26
Bummer. Thanks for the reply.
sooner or later we will have that python3 support :)
until then, you could extract the features with a local python2.7 interpreter, pickle the dataframe and then load them into your python3.5 project
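A minimal sketch of that workaround, assuming the extracted features can be pickled with protocol 2 (the highest protocol Python 2.7 can write). The helper names and the stand-in dictionary are hypothetical; in practice the object would be the feature DataFrame, and whether a DataFrame pickled under 2.7 loads cleanly under 3.5 may also depend on the pandas versions on both sides.

```python
import io
import pickle


def dump_for_python3(obj, fileobj):
    """Python 2.7 side: write with protocol 2, which both major versions understand."""
    pickle.dump(obj, fileobj, protocol=2)


def load_from_python2(fileobj):
    """Python 3 side: encoding="latin1" lets byte strings written by Python 2 be decoded."""
    return pickle.load(fileobj, encoding="latin1")


# Stand-in for the extracted features (hypothetical data, not real tsfresh output).
features = {"a__length": [15, 15], "b__length": [15, 15]}

# An in-memory buffer replaces the file that would be shared between the two interpreters.
buffer = io.BytesIO()
dump_for_python3(features, buffer)
buffer.seek(0)
restored = load_from_python2(buffer)
```

For a real DataFrame, `pandas.read_pickle` on the Python 3 side is probably the more robust way to load the file.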
I will look into this later
I just uploaded the branch "i8_add_python3_support".
On it, I started to make tsfresh runnable under Python 3. Now, all unit tests are passing on Python 2.7. On Python 3.5.1, 14 unit tests are failing.
Maybe I will have time during the next days to finish the job. Otherwise it would be nice if somebody else could check the changes and get those unit tests to pass.
I will take a look. I have to do it for Skyline at some point and I really want to deep dive into what you are up to here, so it may be an effective method for me to start a Python 3 path in my own sphere and get a handle on how you do not run into some of the clustering issues relating to timeseries as with k-means et al.
@jneuff I have read the paper now and dug a bit deeper, and I now understand a little more :) I should say hey TPOT -> tsFRESH :)
@MaxBenChrist anybody interested in having a go at porting any bits and pieces to Python 3 can use Python 3.5.2 (latest) unless there is a reason that Python 3.5.1 is required; silence on the matter shall be read as py352_ok = True, I am sure you are busy
Nice of blue-yonder and you all to release it, timeseries and ml not being easy and all, this looks like a step :)
hi @earthgecko
we are happy about anybody that wants to contribute. You could take my "i8_add_python3_support" branch as a starting point.
Where does one find this py352_ok = True flag? I am not familiar with it.
By the way, what are you referring to with TPOT? :)
Max
I have your i8_add_python3_support branch and I am working on that. I will
submit any changes to that branch in small increments for you.
A question about how to handle Python 3 builtins in a backwards
compatible manner: for example, the use of builtins in tsfresh/feature_selection/feature_selector.py
in the i8_add_python3_support branch is not backwards compatible with 2.7.x as
it stands now, as there is no builtins module in 2.7, and this has ramifications through
other modules.
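One common way to handle this, sketched here as an assumption rather than as what the branch actually does, is to try the Python 3 module name first and fall back to the Python 2 name. The helper `builtin_names` is hypothetical and only there to show the alias in use; `six.moves.builtins` offers the same alias if six is already a dependency.

```python
try:
    import builtins  # Python 3 module name
except ImportError:
    import __builtin__ as builtins  # Python 2 module name, aliased to the Python 3 one


def builtin_names():
    """Return the public names of the builtins module under either interpreter."""
    return sorted(name for name in dir(builtins) if not name.startswith("_"))
```

After this import, code can refer to `builtins.len`, `builtins.range` and so on without caring which interpreter is running.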
I shall add some additional detailed info on #30 for consideration.
There is no flag, it was a question :) You are OK with using 3.5.2, there is no specific reason you are using 3.5.1?
TPOT - https://github.com/rhiever/tpot - I initially thought that tsfresh was doing a subset of what TPOT does, but no, TPOT could probably add a FRESH dimension :)
Now down to 5 failing unit tests from 14
The outstanding ones are mostly related to objects having no attribute 'assertItemsEqual' in a number of contexts, and there is a pandas warning:
pandas/computation/expressions.py:182: UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead
This occurs in tests/transformers/test_full_pipeline.py, along with an AssertionError; they may be related:
> self.assertTrue(some_expected_features.issubset(set(extracted_features.columns)))
E AssertionError: False is not true
Some info on blocking points (was playing with the Python3 branch but unfortunately have no time to go into depth or create a fix myself right now):
The first Quickstart example
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
yields:
TypeError Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/tsfresh/utilities/dataframe_functions.py in normalize_input_to_internal_representation(df_or_dict, column_id, column_sort, column_kind, column_value)
239 id_and_sort_column = [_f for _f in [column_id, column_sort] if _f]
240 kind_to_df_map = {key: df_or_dict[[key] + id_and_sort_column].copy().rename(columns={key: "_value"})
--> 241 for key in df_or_dict.columns if key not in id_and_sort_column}
242
243 #todo: is this the right check?
TypeError: can only concatenate list (not "filter") to list
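The error message suggests that, in the installed version, `id_and_sort_column` was built with `filter(...)`. That returns a list on Python 2 but a lazy iterator on Python 3, which cannot be concatenated to a list. A minimal sketch of the failure and of a fix that works on both versions (the column values are illustrative, and the variable names only mirror the traceback):

```python
column_id, column_sort = "id", None
key = "a"

# On Python 3, filter() returns an iterator, so list + filter raises TypeError.
lazy_columns = filter(None, [column_id, column_sort])
try:
    [key] + lazy_columns
    concatenation_failed = False
except TypeError:
    concatenation_failed = True

# A list comprehension yields a real list on both Python 2 and Python 3.
id_and_sort_column = [c for c in [column_id, column_sort] if c]
selected_columns = [key] + id_and_sort_column
```

Wrapping the call as `list(filter(...))` would work as well; the comprehension just avoids the extra call.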
When using it with column_value="a" you can get around this error, but then numexpr and scipy warnings appear:
/opt/conda/lib/python3.5/site-packages/pandas/computation/expressions.py:181: UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead
unsupported[op_str]))
/opt/conda/lib/python3.5/site-packages/scipy/signal/spectral.py:772: UserWarning: nperseg = 256, is greater than input length = 15, using nperseg = 15
'using nperseg = {1:d}'.format(nperseg, x.shape[-1]))
The current state of the py2/py3 tests is in a gist - https://gist.github.com/earthgecko/118d168f88ebb37661154e3cb898c1fb
The method assertItemsEqual has been removed from unittest.TestCase somewhere along the way to Python 3.5; we'll need to find a replacement with the same semantics.
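For reference, Python 3.2 renamed `assertItemsEqual` to `assertCountEqual`, with the same order-insensitive semantics. A minimal stdlib-only sketch of a shim that works on both versions; the class and method names `CompatTestCase`, `assert_same_items`, `ExampleTest` and `run_example` are hypothetical and not part of tsfresh:

```python
import unittest


class CompatTestCase(unittest.TestCase):
    """Base class with one order-insensitive assertion for Python 2 and 3."""

    def assert_same_items(self, first, second):
        # Python 3 has assertCountEqual; Python 2.7 only has assertItemsEqual.
        check = getattr(self, "assertCountEqual", None) or getattr(self, "assertItemsEqual")
        check(first, second)


class ExampleTest(CompatTestCase):
    def test_order_is_ignored(self):
        self.assert_same_items(["b", "a"], ["a", "b"])


def run_example():
    """Run ExampleTest programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If six is available, `six.assertCountEqual(self, first, second)` provides the same dispatch without a custom base class.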
@jneuff yes! Semantically they appear to be the same, and the related failing tests now pass \o/
However, fixing that just lets the next issue, with unittest.assertEqual, rear its head. It seems that assertEqual has changed in py3 as well, which may go a bit deeper :( One step at a time :)
assertEqual change
Current debug
# Preserve old features
> self.assertEqual(list(X_transformed.columns), ["feature_1", "a__length", "b__length"])
E AssertionError: Lists differ: ['feature_1', 'b__length', 'a__length'] != ['feature_1', 'a__length', 'b__length']
E
E First differing element 1:
E 'b__length'
E 'a__length'
E
E - ['feature_1', 'b__length', 'a__length']
E + ['feature_1', 'a__length', 'b__length']
tests/transformers/test_feature_augmenter.py:50: AssertionError
assertEqual is used in quite a few places - https://github.com/blue-yonder/tsfresh/search?q=assertEqual&type=Code - and it must be kept in mind that tests comparing only 2 elements could still pass some of the time if the elements are returned in a different order on each run.
E First differing element 0:
E 'b'
E 'a'
E
E - ['b', 'a']
E + ['a', 'b']
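Both failures above differ only in element order, which would be consistent with the column order coming from an unordered container such as a dict or set. That is an assumption about the cause, not something confirmed in this thread. Under that assumption, a test can check the preserved columns positionally and the new ones regardless of order. A minimal sketch, with `columns_match` as a hypothetical helper:

```python
def columns_match(actual, preserved, expected_new):
    """Check that `actual` starts with `preserved` (in order) and that the
    remaining columns equal `expected_new` in any order."""
    actual = list(actual)
    head = actual[:len(preserved)]
    tail = actual[len(preserved):]
    return head == list(preserved) and sorted(tail) == sorted(expected_new)


# Column order as observed in the failing run above.
observed = ["feature_1", "b__length", "a__length"]
order_independent_ok = columns_match(observed, ["feature_1"], ["a__length", "b__length"])
```

Inside a TestCase this would read `self.assertTrue(columns_match(...))`. Alternatively, the code under test could sort the new columns so that the strict `assertEqual` stays valid.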
I rewrote those unit tests with the six library.
Some of the unit tests still failed; the reason for that was the bug in #29. I fixed that. Now you should be able to enjoy your fresh features under Python 3 :)