Add an infilling DSL
rlouf opened this issue · comments
Workflows with LLMs often involve recursively calling models, where at each step we concatenate the result of the previous call to the prompt. Consider the following example:
```python
@text.prompt
def determine_goal(question):
    """{{question}}

    In order to solve this problem, we will analyze each of the options and determine
    """


@text.prompt
def solve(memory):
    """{{memory}}. Let's begin."""


complete = models.text_completion.openai(model_name)

prompt = determine_goal(question)
answer = complete(prompt, stop_at=["."])
prompt = solve(prompt + answer)
answer = complete(prompt, stop_at=["."])
completed = prompt + answer
```
Having to do the concatenation manually can quickly become cumbersome. Worse, we lose the KV cache across generations. I thus suggest adding a thin infilling DSL to outlines with the following API:
```python
import outlines.text as text


@text.infilling
def answer(question, goal, answer):
    """{{ question }}
    In order to solve this problem, we will analyze all of the options and determine [[ goal ]].
    Let's begin. [[ answer ]]."""


model = models.transformers("gpt2")
continuation = generate.continuation(model)

result = answer("Where are Apple's headquarters located?", continuation, continuation)
print(result["answer"])
```
A few quick points:
- We first evaluate the template with Jinja2, then decompose the rendered template into a succession of strings and model calls, and loop over them;
- Calling the function returns a dictionary indexed by the placeholder names;
- We infer the `stop_at` kwarg from the template;
- This needs to be vectorized (for a Tree of Thoughts implementation, for instance).
I'll tie in one other must-have feature, imo: #305
This is where each infilled generation is accessible via a key at the end of generation, like in guidance:
```python
prompt = '''
Here's one sentence: {{gen "SENTENCE_1"}}
Now, another sentence: {{gen "SENTENCE_2"}}
'''
out = guidance(prompt)()

print(out['SENTENCE_1'])
# <prints what the LLM generated>
print(out['SENTENCE_2'])
# <prints the other sentence>
```
As a separate question, could we optionally return, and reuse a kv-cache? This would make your hand-rolled-concatenation example mostly work, already, aside from the nice syntactic sugar you suggest above.
> As a separate question, could we optionally return, and reuse a kv-cache? This would make your hand-rolled-concatenation example mostly work, already, aside from the nice syntactic sugar you suggest above.
We opened #190 to discuss this. Does the interface there make sense? This is indeed an important feature that we'll re-prioritise.
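To illustrate the kv-cache point in this thread, here is a toy sketch (not the transformers API; `ToyCachedModel` and its character counting are invented for illustration). Without a reusable cache, every call re-processes the full concatenated prompt; with one, each call only processes the newly appended text:

```python
class ToyCachedModel:
    """Stand-in model that records how much input it actually has to process."""

    def __init__(self):
        self.cache = ""           # stands in for past_key_values over a prefix
        self.chars_processed = 0  # stands in for forward-pass compute

    def __call__(self, prompt: str) -> str:
        if prompt.startswith(self.cache):
            new_text = prompt[len(self.cache):]  # cache hit: process only the suffix
        else:
            new_text = prompt                    # cache miss: reprocess everything
        self.chars_processed += len(new_text)
        self.cache = prompt
        return " generated."
```

With this toy, two chained calls (the second on the concatenated prompt) process each character exactly once, which is the saving the proposed DSL would preserve automatically.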