guidance-ai / guidance

A guidance language for controlling large language models.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

dedent=True doesn't work in python 3.12

mmoskal opened this issue · comments

The bug

The dedend=True argument to @guidance decorator seems to work in 3.10 and 3.11 but not in 3.12.

To Reproduce

The test below passes in 3.11 but not 3.12 on macOS x86.

import guidance

@guidance(stateless=True, dedent=True)
def character_maker2(lm):
    lm += f"""\
    {{
        "name": "{guidance.gen('name', stop='"', max_tokens=1)}",
    }}"""
    return lm

def test():
    lm = guidance.models._mock.Mock()
    lm += character_maker2()
    assert str(lm).startswith("{")

test()

System info (please complete the following information):

  • OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): macOS x86
  • Guidance Version (guidance.__version__): 0.1.15

I've noticed the dedent kwarg but didn't previously know what it was doing... Essentially we are introspecting on the source code and rewriting it as if the user had written textwrap.dedent on their block strings?

Do you know if this behavior is documented anywhere? Is it important to maintain? @Harsha-Nori any opinions/thoughts?

Python 3.12 had major changes to f-string behavior, so it doesn't surprise me that dedent is acting up now. Thanks for catching this!

@hudson-ai I don't know that we ever documented it, but the idea is to follow the same semantics as textwrap.dedent (ignore leading spaces) so that you can do natural in-line f-strings without having to undo all the indentation if you're in an already nested python block. If we don't have this, the f-string syntax for writing guidance programs gets quite ugly since every line after the first needs to be on the 0-indent level, eg:

def prog(lm):
    if blah:
        lm += f"""first line

second line
third line"""
    return lm

Understood. That is indeed nice to have...

Just messing around a little with this (on Windows.... same behaviour), and the call to dedent happens here:

source = textwrap.dedent(inspect.getsource(f))

The result on Python 3.11 is

@guidance(stateless=True, dedent=True)
def character_maker2(lm):
    lm += f"""\
    {{
        "name": "{guidance.gen('name', stop='"', max_tokens=1)}",
    }}"""
    return lm

and on Python 3.12

@guidance(stateless=True, dedent=True)
def character_maker2(lm):
    lm += f"""\
    {{
        "name": "{guidance.gen('name', stop='"', max_tokens=1)}",
    }}"""
    return lm

which looks rather similar.

And continued prodding, this is definitely to do with something changing in how f-strings are handled, rather than dedent() per se.

In Python 3.12, the next portion of strip_multiline_string_indents() builds an AST. Judicious application of print(ast.dump(old_ast, indent=4)) yields:

Module(
    body=[
        FunctionDef(
            name='character_maker2',
            args=arguments(
                posonlyargs=[],
                args=[
                    arg(arg='lm')],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                AugAssign(
                    target=Name(id='lm', ctx=Store()),
                    op=Add(),
                    value=JoinedStr(
                        values=[
                            Constant(value='    {\n        "name": "'),
                            FormattedValue(
                                value=Call(
                                    func=Attribute(
                                        value=Name(id='guidance', ctx=Load()),
                                        attr='gen',
                                        ctx=Load()),
                                    args=[
                                        Constant(value='name')],
                                    keywords=[
                                        keyword(
                                            arg='stop',
                                            value=Constant(value='"')),
                                        keyword(
                                            arg='max_tokens',
                                            value=Constant(value=1))]),
                                conversion=-1),
                            Constant(value='",\n    }')])),
                Return(
                    value=Name(id='lm', ctx=Load()))],
            decorator_list=[],
            type_params=[])],
    type_ignores=[])

Note in particular the line Constant(value=' {\n "name": "') Shifting back to Python 3.11, this line becomes Constant(value='{\n "name": "'), which lacks the leading spaces.