equinor / segyio

Fast Python library for SEGY files.

Generators behaving badly?

da-wad opened this issue

I've just found something surprising in how the iline generator behaves.

Given a slice object with only the stop specified, I would expect this small program to print True twice. But it doesn't...

import numpy as np
import segyio

def test_iline_generator(f, i):
    list_comp = np.asarray([f.iline[i] for i in range(1,i)])
    slicing = np.asarray(list(f.iline[:i:]))
    print(np.array_equal(list_comp, slicing))

f=segyio.open('test-data\\small.sgy')
    
test_iline_generator(f, 3) #True
test_iline_generator(f, 4) #False?!!

No, it's working as expected: slicing f.iline returns a generator, and you're not copying each yielded value before adding it to your list.

>>> def test_iline_generator(f, i):
...     list_comp = np.asarray([f.iline[i] for i in range(1,i)])
...     slicing = np.asarray([a.copy() for a in f.iline[:i:]])
...     print(np.array_equal(list_comp, slicing))
... 
>>> test_iline_generator(f, 3)
True
>>> test_iline_generator(f, 4)
True

Riiight, so specifically a generator that yields mutable values. And the list() constructor doesn't copy them.
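The same thing happens with any plain Python generator that yields one mutable object over and over. A minimal sketch; counter_rows is a hypothetical generator written only for illustration:

```python
def counter_rows(n):
    # Hypothetical generator: reuses ONE list and mutates it in place each step
    row = [0]
    for k in range(n):
        row[0] = k
        yield row

# list() stores three references to the same list, so every entry shows the last value
shared = list(counter_rows(3))
print(shared)   # [[2], [2], [2]]

# Copying each value as it is yielded preserves the intermediate states
copied = [r.copy() for r in counter_rows(3)]
print(copied)   # [[0], [1], [2]]
```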

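But... the first call to my buggy test_iline_generator() returns True. How so?

Because you're cycling two buffers in the generator. With i=3 the slice yields only two inlines, so each list entry still points at its own buffer; with i=4 the third inline overwrites the buffer that held the first.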
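To make the True/False pattern concrete, here is a rough simulation that needs neither segyio nor a SEG-Y file. two_buffer_lines and matches_naive are hypothetical names, and the alternating-buffer generator is a simplification of the behaviour described above, not segyio's actual code:

```python
import numpy as np

def two_buffer_lines(values):
    # Hypothetical stand-in for a line generator that alternates
    # between two preallocated buffers
    buffers = [np.empty(2), np.empty(2)]
    for k, v in enumerate(values):
        buf = buffers[k % 2]
        buf[:] = v          # overwrite the buffer in place
        yield buf

def matches_naive(n):
    # Expected result: one independent array per "inline" 1..n
    expected = np.asarray([np.full(2, v, dtype=float) for v in range(1, n + 1)])
    # Naive result: list() keeps references to the two reused buffers
    naive = np.asarray(list(two_buffer_lines(range(1, n + 1))))
    return np.array_equal(expected, naive)

print(matches_naive(2))  # True: two lines land in two distinct buffers
print(matches_naive(3))  # False: line 3 overwrote the buffer holding line 1
```

matches_naive(2) corresponds to test_iline_generator(f, 3) (inlines 1 and 2), and matches_naive(3) to test_iline_generator(f, 4).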

Got it. Close?

Yeah. segyio knows that the yielded value changes on every step, so as an optimisation it reuses a pair of underlying arrays and allocates only twice, however long the loop. This is a large performance gain in many common scenarios.
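The reuse is harmless as long as each value is consumed inside the loop body. A minimal sketch, again with a hypothetical two-buffer generator standing in for segyio's (an assumption, not its implementation), showing a per-line reduction and that only two array objects are ever yielded:

```python
import numpy as np

def two_buffer_lines(n, width=4):
    # Hypothetical generator: allocates two buffers once,
    # then fills them alternately on every step
    buffers = [np.empty(width), np.empty(width)]
    for k in range(n):
        buf = buffers[k % 2]
        buf[:] = k
        yield buf

# Safe pattern: reduce each line to a scalar before the buffer is reused
means = [float(line.mean()) for line in two_buffer_lines(1000)]

# However many steps the loop takes, only two distinct arrays were yielded
distinct = {id(line) for line in two_buffer_lines(1000)}
print(len(distinct))  # 2
```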

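For what it's worth, this is how generators behave for non-segyio objects too: any generator that yields the same mutable object has this effect. Since this use is so common, segyio provides segyio.tools.collect(), which copies each yielded value and stacks the copies into one array, rather than a plain asarray(list(generator...)) that would keep the reused buffers.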
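A minimal sketch of the copy-then-stack idea behind collect(). collect_copies and reused_buffer are hypothetical helpers so the example runs without segyio or a SEG-Y file; the commented lines at the end show the intended segyio call, assuming the small.sgy test file from the issue:

```python
import numpy as np

def collect_copies(itr):
    # Hypothetical helper: copy every yielded array before the
    # generator gets a chance to overwrite it, then stack the copies
    return np.stack([np.copy(x) for x in itr])

def reused_buffer(n):
    # Hypothetical generator reusing a single buffer, the worst case for list()
    buf = np.zeros(3)
    for k in range(n):
        buf[:] = k
        yield buf

cube = collect_copies(reused_buffer(4))
print(cube[:, 0])  # [0. 1. 2. 3.]

# With segyio itself (assuming the small.sgy test data):
#   with segyio.open('test-data/small.sgy') as f:
#       lines = segyio.tools.collect(f.iline[:4])
```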