EtienneCmb / tensorpac

Phase-Amplitude Coupling under Python

Home Page: https://etiennecmb.github.io/tensorpac/


Hilbert unbelievably slow when len(time) is not a Hamming number

SynapticSage opened this issue · comments

I have a bunch of animals I sent through tensorpac (hilbert instead of wavelet), and all of their data are nearly the same length, but strangely enough, some took 10 minutes for tensorpac to crunch and some took hours or days. It turns out to boil down to an annoying feature of the Hilbert method: if len(time) is not a Hamming number, it takes ~3000x longer to run. So here's my fix in spectral.py:

import numpy as np
from scipy.signal import hilbert

def hilbert_fast(x, axis):
    '''If x.shape[axis] is not factorizable by (2, 3, 5), hilbert takes ~3000x longer.'''
    from scipy import fftpack
    if isinstance(x, (list, tuple)):
        x = np.array(x)  # convert iterables to an array
    # zero-pad the FFT up to the next Hamming (5-smooth) length
    xd = hilbert(x, axis=axis, N=fftpack.next_fast_len(x.shape[axis]))
    xd = xd.swapaxes(0, axis)  # swap so that time runs along the 0-axis
    xd = xd[:x.shape[axis]].swapaxes(0, axis)  # cut off the padding and restore the axes
    return xd
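For illustration, a minimal usage sketch (the trials-by-time shape here is just an example, not from my data):

x = np.random.rand(8, 100019)   # 100019 is prime, so the unpadded hilbert is pathologically slow
xa = hilbert_fast(x, axis=1)    # same shape back, computed at a 5-smooth FFT size
assert xa.shape == x.shape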

And then you swap the hilbert call in spectral.py for the following:
xd = hilbert_fast(xf, axis=axis + 1) if stype is not None else np.array(xf)

Night and day speed difference for several of my animals.

Hi @SynapticSage ,

Thanks for reporting this issue. Yes, the scipy implementation of the Hilbert transform can be slow depending on the input signal length. But I'm not sure it's because the length is not a Hamming number; it may simply depend on whether the length is odd or even (see this issue).

For another package, I have this small fix:

# Bandpass filter
data_filt = filt(sf, [fmin, fmax], data, order=4)
if data.size % 2:
    # odd length: call hilbert directly
    analytic = hilbert(data_filt)
else:
    # even length: drop the last sample and zero-pad back to the original size
    analytic = hilbert(data_filt[:-1], len(data_filt))

Might be worth trying.

Cool, I appreciate the reference.

So let me briefly show why it's probably about Hamming numbers, not just even lengths (the reason your solution helps is that the factor 2 is one of the Hamming prime factors {2, 3, 5}).
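As an aside, "Hamming" here just means 5-smooth: repeatedly dividing out 2, 3, and 5 leaves 1. A tiny helper (illustrative only, not part of my patch) makes that concrete:

def is_hamming(n):
    '''True if n is 5-smooth (a Hamming/regular number): no prime factor > 5.'''
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

print(is_hamming(100000))  # True:  100000 == 2**5 * 5**5
print(is_hamming(100019))  # False: 100019 is prime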

To demonstrate, here are some processing times for an N that's factorizable by 5:

[nav] In [6]: %%timeit
         ...: x = np.random.rand(100015,)
         ...: y = hilbert(x)

19.9 ms ± 572 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

factorizable by 2 but not by 5:

[nav] In [7]: %%timeit
         ...: x = np.random.rand(100014,)
         ...: y = hilbert(x)

18.1 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

and factorizable by neither 5 nor 2:

[nav] In [15]: %%timeit
          ...: x = np.random.rand(100019,)
          ...: y = hilbert(x)

26.9 s ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, factorizable-by-5 has nearly the same run time as factorizable-by-2. But when N is factorizable by neither, it takes almost 27 seconds. Why is this?
Internally, as you know, scipy.signal.hilbert calls fftpack.fft; see the scipy documentation for fftpack.next_fast_len, which applies to all functions that rely on fftpack.fft:

Find the next fast size of input data to fft, for zero-padding, etc.
SciPy's FFTPACK has efficient functions for radix {2, 3, 4, 5}, so this
returns the next composite of the prime factors 2, 3, and 5 which is
greater than or equal to target. (These are also known as 5-smooth
numbers, regular numbers, or Hamming numbers.)
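For the curious, here's a one-off check (not one of the timings above) of what next_fast_len picks for the benchmarked lengths:

from scipy import fftpack

for n in (100014, 100015, 100019):
    print(n, '->', fftpack.next_fast_len(n))  # next 5-smooth length >= n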

Anyhow, beyond that, I would say code-clarity-wise your solution is MUCH cleaner, so I'll probably replace mine with it. The overhead of calling next_fast_len is negligibly small. (But I think it's almost surely Hamming, not just the evens.) Take care!

Indeed, the difference between 100015 and 100019 is very surprising. Maybe that's something you should report to scipy directly?

Anyway, I'll make the 'fix' ASAP. Thanks for your issue, very interesting results!

@SynapticSage actually you were right. I thought that checking only len(x) % 2 was sufficient, but it's not. The last line of the issue I shared seems to solve it: hilbert3 = lambda x: signal.hilbert(x, fftpack.next_fast_len(len(x)))[:len(x)]
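Spelled out with its imports, that one-liner is just (a quick sketch, the final slice trims the zero-padding back off):

import numpy as np
from scipy import signal, fftpack

# Pad the FFT up to the next 5-smooth length, then truncate back to len(x).
hilbert3 = lambda x: signal.hilbert(x, fftpack.next_fast_len(len(x)))[:len(x)]

x = np.random.rand(100019)  # prime length: pathological without padding
assert hilbert3(x).shape == x.shape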

@SynapticSage I made a small fix (573280d), and I want to be sure that it is correct. Do you think you could test whether it solves the issue on your data, please? (Actually, the fix is the method you originally proposed!)

Sure thing. I'll give it a go when I have a sec, and report back.

Solves the issue: Code takes minutes instead of days. Thanks!