EtienneCmb / tensorpac

Phase-Amplitude Coupling under Python

Home Page: https://etiennecmb.github.io/tensorpac/


Hilbert unbelievably slow when len(time) is not a Hamming number

SynapticSage opened this issue · comments

I have a bunch of animals I sent through tensorpac (hilbert instead of wavelet), and all of their data are nearly the same length, but strangely enough, some took 10 minutes for tensorpac to crunch and some took hours or days. It turns out to boil down to an annoying feature of the Hilbert method: if len(time) is not a Hamming number, it takes ~3000x longer to run. So here's my fix in spectral.py:

import numpy as np
from scipy.signal import hilbert

def hilbert_fast(x, axis):
    '''If x.shape[axis] is not factorizable by (2, 3, 5), hilbert takes ~3000x longer.'''
    from scipy import fftpack
    if isinstance(x, (list, tuple)):
        x = np.array(x)  # convert iterables to an array
    # zero-pad the FFT up to the next Hamming (5-smooth) length
    xd = hilbert(x, axis=axis, N=fftpack.next_fast_len(x.shape[axis]))
    xd = xd.swapaxes(0, axis)  # swap so that time runs along the 0-axis
    xd = xd[:x.shape[axis]].swapaxes(0, axis)  # cut off the padding and restore the axes
    return xd
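For illustration, a minimal usage sketch (the trials-by-time shape here is just an example, not from my data):

x = np.random.rand(8, 100019)   # 100019 is prime, so the unpadded hilbert is pathologically slow
xa = hilbert_fast(x, axis=1)    # same shape back, computed at a 5-smooth FFT size
assert xa.shape == x.shape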

And then you swap the hilbert call in spectral.py for the following:
xd = hilbert_fast(xf, axis=axis + 1) if stype is not None else np.array(xf)

Night and day speed difference for several of my animals.

Hi @SynapticSage ,

Thanks for reporting this issue. Yes, the scipy implementation of the Hilbert transform can be slow depending on the input signal length. But I'm not sure it's because the length is not a Hamming number; it may simply depend on whether the length is odd or even (see this issue).

For another package, I have this small fix:

# Bandpass filter
data_filt = filt(sf, [fmin, fmax], data, order=4)
if data.size % 2:
    # odd length: call hilbert directly
    analytic = hilbert(data_filt)
else:
    # even length: drop the last sample and zero-pad back to the original size
    analytic = hilbert(data_filt[:-1], len(data_filt))

Might be worth trying.

Cool, I appreciate the reference.

So let me briefly show why it's probably about Hamming numbers, not just even lengths (the reason your solution helps is that the factor 2 is one of the Hamming prime factors {2, 3, 5}).
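As an aside, "Hamming" here just means 5-smooth: repeatedly dividing out 2, 3, and 5 leaves 1. A tiny helper (illustrative only, not part of my patch) makes that concrete:

def is_hamming(n):
    '''True if n is 5-smooth (a Hamming/regular number): no prime factor > 5.'''
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

print(is_hamming(100000))  # True:  100000 == 2**5 * 5**5
print(is_hamming(100019))  # False: 100019 is prime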

To demonstrate, here are some processing times for an N that's factorizable by 5:

[nav] In [6]: %%timeit
         ...: x = np.random.rand(100015,)
         ...: y = hilbert(x)

19.9 ms ± 572 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

factorizable by 2 but not by 5:

[nav] In [7]: %%timeit
         ...: x = np.random.rand(100014,)
         ...: y = hilbert(x)

18.1 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

and factorizable by neither 5 nor 2:

[nav] In [15]: %%timeit
          ...: x = np.random.rand(100019,)
          ...: y = hilbert(x)

26.9 s ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, factorizable-by-5 has nearly the same run time as factorizable-by-2. But when N is factorizable by neither, it takes almost 27 seconds. Why is this?
Internally, as you know, scipy.signal.hilbert calls fftpack.fft; see the scipy documentation for fftpack.next_fast_len, which applies to all functions that rely on fftpack.fft:

Find the next fast size of input data to fft, for zero-padding, etc.
SciPy's FFTPACK has efficient functions for radix {2, 3, 4, 5}, so this
returns the next composite of the prime factors 2, 3, and 5 which is
greater than or equal to target. (These are also known as 5-smooth
numbers, regular numbers, or Hamming numbers.)
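For the curious, here's a one-off check (not one of the timings above) of what next_fast_len picks for the benchmarked lengths:

from scipy import fftpack

for n in (100014, 100015, 100019):
    print(n, '->', fftpack.next_fast_len(n))  # next 5-smooth length >= n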

Anyhow, beyond that, I would say code-clarity-wise your solution is MUCH cleaner, so I'll probably replace mine with it. The overhead of calling next_fast_len is negligibly small. (But I think it's almost surely Hamming, not just the evens.) Take care!

Indeed, the difference between 100015 and 100019 is very surprising. Maybe that's something you should report to scipy directly?

Anyway, I'll make the 'fix' ASAP. Thanks for your issue, very interesting results!

@SynapticSage actually you were right. I thought that checking only len(x) % 2 was sufficient, but it's not. The last line of the issue I shared seems to solve it: hilbert3 = lambda x: signal.hilbert(x, fftpack.next_fast_len(len(x)))[:len(x)]
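Spelled out with its imports, that one-liner is just (a quick sketch, the final slice trims the zero-padding back off):

import numpy as np
from scipy import signal, fftpack

# Pad the FFT up to the next 5-smooth length, then truncate back to len(x).
hilbert3 = lambda x: signal.hilbert(x, fftpack.next_fast_len(len(x)))[:len(x)]

x = np.random.rand(100019)  # prime length: pathological without padding
assert hilbert3(x).shape == x.shape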

@SynapticSage I made a small fix (573280d), and I want to be sure that it is correct. Do you think you could test whether it solves the issue on your data, please? (Actually, the fix is the method you originally proposed!)

Sure thing. I'll give it a go when I have a sec, and report back.

Solves the issue: Code takes minutes instead of days. Thanks!