ipython / ipython

Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

Home Page: https://ipython.readthedocs.org


Bug in Tokenizer/Automatic Parenthesization for Python 3.12

zacharyrs opened this issue · comments

Hey there!

I've discovered a bug in the tokenizer's handling of automatic parenthesization involving the forward-slash (`/`) operator.

Specifically, the following will result in an error when run in IPython 8.25.0 on Python 3.12.3:

1| from pathlib import Path
2| 
3| (
4|     Path(".")
5|     / f")"
6|     / "a a a a a a a a a"
7| )

Interestingly, the issue is mitigated if the f-string on line 5:

  • is removed or replaced by a plain string
  • has any character after the closing parenthesis
  • starts with an opening parenthesis (note that if anything precedes it, including a space, it still fails)

From a little digging, the tokenizer starts a new line (in tokens_by_line) when it encounters the newline at the end of the f-string.
It looks like the closing parenthesis in the f-string becomes an FSTRING_MIDDLE token, which decrements parenlev.
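A minimal sketch of the mechanism (my own repro, assuming Python 3.12+, where PEP 701 changed f-string tokenization): the literal `)` inside the f-string surfaces as an FSTRING_MIDDLE token whose `.string` is `')'`, so any check that only compares `token.string` against closing brackets will mistake it for a real closer.

```python
import io
import tokenize

# Tokenize an f-string whose body is a literal closing parenthesis and flag
# which tokens a naive string comparison would treat as a bracket closer.
src = 'f")"\n'
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    looks_like_closer = tok.string in {")", "]", "}"}
    print(tokenize.tok_name[tok.type], repr(tok.string), looks_like_closer)
```

On Python 3.12 this prints an FSTRING_MIDDLE row flagged True; on 3.11 the whole literal is a single STRING token and nothing matches.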

See here

def make_tokens_by_line(lines: List[str]):
    """Tokenize a series of lines and group tokens by line.

    The tokens for a multiline Python string or expression are grouped as one
    line. All lines except the last line should keep their line ending ('\\n',
    '\\r\\n') for this to work properly. Use `.splitlines(keepends=True)`
    for example when passing a block of text to this function.
    """
    # NL tokens are used inside multiline expressions, but also after blank
    # lines or comments. This is intentional - see https://bugs.python.org/issue17061
    # We want to group the former case together but split the latter, so we
    # track parentheses level, similar to the internals of tokenize.

    # reexported from token on 3.7+
    NEWLINE, NL = tokenize.NEWLINE, tokenize.NL  # type: ignore
    tokens_by_line: List[List[Any]] = [[]]
    if len(lines) > 1 and not lines[0].endswith(("\n", "\r", "\r\n", "\x0b", "\x0c")):
        warnings.warn(
            "`make_tokens_by_line` received a list of lines which do not have lineending markers ('\\n', '\\r', '\\r\\n', '\\x0b', '\\x0c'), behavior will be unspecified",
            stacklevel=2,
        )
    parenlev = 0
    try:
        for token in tokenutil.generate_tokens_catch_errors(
            iter(lines).__next__, extra_errors_to_catch=["expected EOF"]
        ):
            tokens_by_line[-1].append(token)
            if (token.type == NEWLINE) \
                    or ((token.type == NL) and (parenlev <= 0)):
                tokens_by_line.append([])
            elif token.string in {'(', '[', '{'}:
                parenlev += 1
            elif token.string in {')', ']', '}'}:
                if parenlev > 0:
                    parenlev -= 1
    except tokenize.TokenError:
        # Input ended in a multiline string or expression. That's OK for us.
        pass

    if not tokens_by_line[-1]:
        tokens_by_line.pop()

    return tokens_by_line
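One possible direction for a fix (a sketch under my own assumptions, not the maintainers' solution): only let OP tokens adjust parenlev, so bracket characters appearing inside f-string text can never change the level. The helper name below is hypothetical.

```python
import io
import tokenize

def bracket_levels(src: str):
    """Track parenthesis nesting per token, counting only real OP tokens
    so that bracket characters inside f-string bodies are ignored."""
    parenlev = 0
    levels = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        # Restrict the bookkeeping to OP tokens: FSTRING_MIDDLE (and plain
        # STRING on older Pythons) carrying ')' no longer decrements the level.
        if tok.type == tokenize.OP and tok.string in {"(", "[", "{"}:
            parenlev += 1
        elif tok.type == tokenize.OP and tok.string in {")", "]", "}"}:
            if parenlev > 0:
                parenlev -= 1
        levels.append((tok.string, parenlev))
    return levels

# The level stays balanced even with f")" inside the parentheses:
print(bracket_levels('(\n    f")"\n)\n'))
```

Because the type check excludes the f-string tokens, this behaves the same on 3.11 and 3.12: the level rises to 1 at the opening paren and returns to 0 only at the real closing paren.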

This issue does not occur on an older version of Python (e.g., 3.11.x), even when running the latest version of IPython.

Thanks for the report, I'll see what I can do.