fbaptiste / python-deepdive

Python Deep Dive Course - Accompanying Materials

The expression ((i * j for j in range(start, stop + 1)) for i in range(start, stop + 1)) will result in wrong values!

Rid1FZ opened this issue · comments

In Part 2 > Section 06 - Generators > 05-Generator Expressions.ipynb at cell number 34, the following expression is used

((i * j for j in range(start, stop + 1)) for i in range(start, stop + 1))

as an equivalent to the following expression

[[i * j for j in range(start, stop + 1)] for i in range(start, stop + 1)]

But they do not always generate the same result. The first expression is equivalent to the following function:

def mult_list():
    global start, stop
    for i in range(start, stop + 1):
        def inner():
            global start, stop
            nonlocal i
            for j in range(start, stop + 1):
                yield i * j

        yield inner()

We can see that the inner function is a closure, and i is a free variable. So the value of i that the inner generators see when they are finally iterated is 10, the last value of the loop, which results in the following:

[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

But the list comprehension [[i * j for j in range(start, stop + 1)] for i in range(start, stop + 1)] results in the following:

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
 [6, 12, 18, 24, 30, 36, 42, 48, 54, 60],
 [7, 14, 21, 28, 35, 42, 49, 56, 63, 70],
 [8, 16, 24, 32, 40, 48, 56, 64, 72, 80],
 [9, 18, 27, 36, 45, 54, 63, 72, 81, 90],
 [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]

I was wondering if there is any fix to this!!!

Hi @Rid1-fz-06, Good point.

Whether they generate equivalent results depends on how you consume the items from the nested generator.

As you mentioned, this:

((i * j for j in range(start, stop + 1)) for i in range(start, stop + 1))

is equivalent to the following (I removed the unnecessary global and nonlocal declarations):

def mult_list():
    for i in range(start, stop + 1):
        def inner():
            for j in range(start, stop + 1):
                yield i * j
        yield inner()

Now if you create all the inner generators first and only then iterate through them, i is already bound to the last value produced by the range object by the time any of them runs:

all_gens = [gen for gen in mult_list()]
for g in all_gens:
    print(list(g))

output:

[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

But if you iterate through each generator right after it is created, you get the same result as your nested list comprehension:

for gen in mult_list():
    print(list(gen))

output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
[4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
[5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
[6, 12, 18, 24, 30, 36, 42, 48, 54, 60]
[7, 14, 21, 28, 35, 42, 49, 56, 63, 70]
[8, 16, 24, 32, 40, 48, 56, 64, 72, 80]
[9, 18, 27, 36, 45, 54, 63, 72, 81, 90]
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

In both cases (nested list comprehension and nested generator expression), i "is" a free variable, but as you can see, if you consume each inner generator right after it is created, i has the desired value at that time.
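To make the timing difference concrete, here is a minimal sketch with a smaller range. The function name mult_gens and its parameters are my own, not from the notebook; it simply wraps the same nested generator expression so that start and stop are passed in explicitly.

```python
def mult_gens(start, stop):
    # Same nested generator expression as in the notebook,
    # wrapped in a (hypothetical) function for convenience.
    return (
        (i * j for j in range(start, stop + 1))
        for i in range(start, stop + 1)
    )

# Case 1: consume each inner generator as soon as the outer one yields it.
# i still holds the value for the current row when i * j is evaluated.
eager = [list(gen) for gen in mult_gens(1, 3)]

# Case 2: collect every inner generator first, consume them afterwards.
# The outer loop has finished, so i is already 3 for every inner generator.
collected = list(mult_gens(1, 3))
late = [list(gen) for gen in collected]

print(eager)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
print(late)   # [[3, 6, 9], [3, 6, 9], [3, 6, 9]]
```

The only difference between the two cases is when list(gen) runs relative to the outer iteration; the generator expression itself is identical.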

Thanks @Rid1-fz-06 for that observation, and thanks @amirsoroush for that explanation!

Indeed, if you do not iterate each sub generator as it is produced, you can get into that situation.

There is no clean way around that since you are going to deal with closures and cannot control "breaking" that free variable.

If you really need the ability to first generate all the sub generators, and then iterate through them, this way:

all_gens = [gen for gen in mult_list()]
for g in all_gens:
    print(list(g))

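then you are going to have to work a bit harder. Something like this will work: generate all the values first into a single generator, then use another generator to "split" the results back into the required rows of data. Note that the rows only come out right if the sub generators are consumed in order and each one fully, since they all pull from the same underlying generator: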

from itertools import product

start = 1
stop = 10
r1 = (
    i * j
    for i, j in product(range(start, stop + 1), range(start, stop + 1))
)

r2 = (
    (next(r1) for _ in range(stop - start + 1))
    for __ in range(stop - start + 1)
)

all_gens = [gen for gen in r2]
for g in all_gens:
    print(list(g))
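For comparison, another possible workaround (not from the thread, so treat it as a sketch rather than a recommendation) is to create each sub generator inside a small helper function. Because i is then a parameter of the helper, every sub generator closes over its own binding instead of sharing the outer loop variable. The names make_row and mult_rows are hypothetical.

```python
def make_row(i, start, stop):
    # i is a parameter here, so each call gets its own binding of i.
    return (i * j for j in range(start, stop + 1))

def mult_rows(start, stop):
    # make_row(i, ...) is called while the outer generator is being iterated,
    # which captures the current value of i for that row.
    return (make_row(i, start, stop) for i in range(start, stop + 1))

all_gens = list(mult_rows(1, 3))

# The rows are independent of each other, so they can even be consumed
# out of order, here from last to first.
rows = [list(g) for g in reversed(all_gens)]
print(rows)  # [[3, 6, 9], [2, 4, 6], [1, 2, 3]]
```

The cost is an extra function, and the expression is no longer a single nested generator expression, so whether this counts as "clean" is a judgment call.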

I would point out though, that using

all_gens = [gen for gen in r2]
for g in all_gens:
    print(list(g))

is kind of missing the point of generators, which is to create iterables that are lazily evaluated. You can make a list of the sub generators, but why? It's not like you can go through each one repeatedly, so unless you are planning on extracting just a subset of the sub generators, I see no reason to do it (and in that case, just have your main generator generate just the subset you want in the first place).
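A minimal sketch of that last suggestion, assuming the rows and columns you want can be described by ranges. The function mult_subset and its parameters rows and cols are hypothetical names used only for illustration.

```python
def mult_subset(rows, cols):
    # The outer generator yields one lazy sub generator per requested row only.
    return ((i * j for j in cols) for i in rows)

# Only rows 2 and 3 of a table with columns 1..5.
# Each row is consumed as soon as it is produced, so i has the right value.
result = [list(row) for row in mult_subset(range(2, 4), range(1, 6))]
print(result)  # [[2, 4, 6, 8, 10], [3, 6, 9, 12, 15]]
```

Nothing outside the requested rows is ever computed, and no list of pending sub generators needs to be kept around.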