[3.13] Python compile recursion error due to huge expression

Question

williamwen42 opened this issue 2 months ago · comments

https://github.com/sympy/sympy/blob/ff21452ba4ab73952bdf90674d9f8ae9c93a006f/sympy/polys/numberfields/resolvent_lookup.py contains a huge expression that causes a Python compile recursion error in a CPython 3.13 debug build when we try to import or AST-parse the file. This error doesn't happen on a release build of 3.13, since the compile recursion limit is higher there.

Some possible workaround ideas I have for this are:

Make importing this file optional
Break down the really long expression into multiple expressions
Have another version of this file for Python debug builds

Repro:

import ast

filename = "/data/users/williamwen/py313-debug-env/lib/python3.13/site-packages/sympy/polys/numberfields/resolvent_lookup.py"
with open(filename, "r") as f:
    contents = f.read()

result = ast.parse(contents)

Log:

Traceback (most recent call last):
  File "/data/users/williamwen/pytorch2/playground4.py", line 7, in <module>
    result = ast.parse(contents)
  File "/data/users/williamwen/installs/python3.13/debug/install/lib/python3.13/ast.py", line 54, in parse
    return compile(source, filename, mode, flags,
                   _feature_version=feature_version, optimize=optimize)
RecursionError: maximum recursion depth exceeded during compilation

Import repro command: python /path/to/sympy/polys/numberfields/resolvent_lookup.py

William Wen · Answer 1 · Tue Jun 11 2024 05:52:40 GMT+0800 (China Standard Time)

cc @albanD

Aaron Meurer · Answer 2 · Tue Jun 11 2024 06:15:47 GMT+0800 (China Standard Time)

If we can lazy import this file that might be the best solution, although I'm not clear if that sort of thing would work since this error would happen at compile time.

I wonder if there is a simple trick we can use to make the expression smaller without changing it, like factoring out a term.

Oscar Benjamin · Answer 3 · Tue Jun 11 2024 06:39:52 GMT+0800 (China Standard Time)

See also nedbat/coveragepy#1774

I consider this to be a bug in the ast module. It should be able to parse valid Python code.

Oscar Benjamin · Answer 4 · Tue Jun 11 2024 06:41:01 GMT+0800 (China Standard Time)

If we can lazy import this file that might be the best solution

It's already sort of lazily imported because at import time it just creates a load of lambda functions rather than actually creating the expressions.

Aaron Meurer · Answer 5 · Tue Jun 11 2024 06:47:42 GMT+0800 (China Standard Time)

It sounds like this normally works, but it's an issue in a debug Python which has a lower recursion limit. I didn't know that that was the case. Depending on how low the recursion limit is, it could prevent other SymPy operations from working as well, since many SymPy functions are recursive.

Given that this is happening in other situations too, though, if a simple fix is possible, like my suggestion to factor some terms in the expression, we should do that.

I suppose we could also split the expression like

...
    lambda s1, s2, s3, s4, s5, s6: _expr1(s1, s2, s3, s4, s5, s6) + _expr2(s1, s2, s3, s4, s5, s6)
...


def _expr1(s1, s2, s3, s4, s5, s6):
    # first half of the expression
    ...

def _expr2(s1, s2, s3, s4, s5, s6):
    # second half of the expression
    ...

Oscar Benjamin · Answer 6 · Tue Jun 11 2024 18:23:41 GMT+0800 (China Standard Time)

It sounds like this normally works, but it's an issue in a debug Python which has a lower recursion limit. I didn't know that that was the case. Depending on how low the recursion limit is, it could prevent other SymPy operations from working as well, since many SymPy functions are recursive.

Maybe. The ast representation suffers particularly badly from being recursive because it does not flatten arithmetic expressions e.g. a*b*c is represented as (a*b)*c:

In [16]: import ast
    ...: contents = "x = " + " * ".join(["1"] * 3000)
    ...: result = ast.parse(contents)
---------------------------------------------------------------------------
RecursionError

The ast module should have a way to parse that does not have this limitation.
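
To make the left-nesting concrete, here is a small sketch that counts how deep the BinOp nodes go for a chain of multiplications. The helper name binop_depth is made up for this example, and it walks the tree with a loop rather than recursion so that the measurement itself cannot hit a recursion limit. The chains are kept short so that parsing succeeds on any build.

```python
import ast


def binop_depth(source):
    """Count nested BinOp nodes along the left spine of an expression."""
    node = ast.parse(source, mode="eval").body
    depth = 0
    while isinstance(node, ast.BinOp):
        depth += 1
        node = node.left
    return depth


# a*b*c parses as (a*b)*c: the left operand is itself a BinOp.
tree = ast.parse("a*b*c", mode="eval").body
print(isinstance(tree.left, ast.BinOp), isinstance(tree.right, ast.Name))

# A chain of n operands nests n - 1 levels deep.
for n in (3, 10, 100):
    print(n, binop_depth(" * ".join(["x"] * n)))
```

The depth grows linearly with the number of operands, so a sufficiently long flat-looking product or sum is a deeply nested tree as far as the parser and compiler are concerned.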

if a simple fix is possible, like my suggestion to factor some terms in the expression, we should do that.

There will always be some setting of the recursion limit that fails. Instead we should implement things in such a way that the recursion limit does not matter.

albanD · Answer 7 · Tue Jun 11 2024 21:06:44 GMT+0800 (China Standard Time)

There will always be some setting of the recursion limit that fails.

Note that in this case, even increasing the limit via sys.setrecursionlimit(1_000_000_000) leads to the same error, which suggests that something might be off here, given that the default value (even on release CPython versions) is 1_000.

Oscar Benjamin · Answer 8 · Tue Jun 11 2024 21:25:57 GMT+0800 (China Standard Time)

I don't have a debug build for testing but with a release build I can set the recursion limit very low and it still succeeds:

In [1]: import sys

In [2]: sys.setrecursionlimit(100)

In [3]: import ast
   ...:
   ...: filename = "sympy/polys/numberfields/resolvent_lookup.py"
   ...: with open(filename, "r") as f:
   ...:     contents = f.read()
   ...:
   ...: result = ast.parse(contents)

The problem must be something else, then, but it is still a bug in the ast module. I don't think that we should change the code in sympy to work around this bug: the bug should be fixed in the proper place.

William Wen · Answer 9 · Wed Jun 12 2024 01:27:42 GMT+0800 (China Standard Time)

The limit is here: https://github.com/python/cpython/blob/51bcb67405cceee1f18067fb2ae510dec47191bc/Include/cpython/pystate.h#L199

Aaron Meurer · Answer 10 · Wed Jun 12 2024 03:59:55 GMT+0800 (China Standard Time)

Well, pragmatically, if multiple people are hitting this and it's easy to work around, we should just do that. Of course, I agree it should be fixed upstream too.

Ehren Metcalfe · Answer 11 · Thu Jun 13 2024 00:21:58 GMT+0800 (China Standard Time)

One gotcha: on Windows you can only change the limit effectively for newly spawned threads: https://stackoverflow.com/questions/2917210/what-is-the-hard-recursion-limit-for-linux-mac-and-windows/2918118#2918118

Oscar Benjamin · Answer 12 · Thu Jun 13 2024 01:49:49 GMT+0800 (China Standard Time)

We could use a delayed import:

diff --git a/sympy/polys/numberfields/galois_resolvents.py b/sympy/polys/numberfields/galois_resolvents.py
index f51781585a..5d73b56870 100644
--- a/sympy/polys/numberfields/galois_resolvents.py
+++ b/sympy/polys/numberfields/galois_resolvents.py
@@ -25,7 +25,6 @@
 from sympy.core.symbol import symbols, Dummy
 from sympy.polys.densetools import dup_eval
 from sympy.polys.domains import ZZ
-from sympy.polys.numberfields.resolvent_lookup import resolvent_coeff_lambdas
 from sympy.polys.orderings import lex
 from sympy.polys.polyroots import preprocess_roots
 from sympy.polys.polytools import Poly
@@ -659,6 +658,7 @@ def get_resolvent_by_lookup(T, number):
     dup
 
     """
+    from sympy.polys.numberfields.resolvent_lookup import resolvent_coeff_lambdas
     degree = T.degree()
     L = resolvent_coeff_lambdas[(degree, number)]
     T_coeffs = T.rep.to_list()[1:]

The bug should be fixed properly though because the expressions are not unreasonable.
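
For illustration, the delayed-import pattern from the diff above can be sketched in isolation. The module name heavy_lookup_demo, its TABLE contents, and the lookup function are all hypothetical stand-ins written to a temporary directory; only the pattern of importing inside the function mirrors the diff.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a module that is costly (or impossible) to compile.
MODULE_NAME = "heavy_lookup_demo"

tmpdir = tempfile.mkdtemp()
Path(tmpdir, MODULE_NAME + ".py").write_text(
    "TABLE = {(4, 0): lambda s1, s2: s1 + s2}\n"
)
sys.path.insert(0, tmpdir)


def lookup(key, *args):
    # Delayed import: the module is compiled and executed on the first call,
    # not when the file defining lookup() is imported.
    import heavy_lookup_demo
    return heavy_lookup_demo.TABLE[key](*args)


loaded_before = MODULE_NAME in sys.modules
result = lookup((4, 0), 2, 3)
loaded_after = MODULE_NAME in sys.modules
print(loaded_before, result, loaded_after)
```

With this arrangement, a compile-time failure in the heavy module would only surface for callers that actually reach lookup(), rather than for everyone who imports the package.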