Add an infilling DSL
rlouf opened this issue · comments
Workflows with LLMs often involve recursively calling models, where at each step we concatenate the result of the previous call to the prompt. Consider the following example:
```python
@text.prompt
def determine_goal(question):
    """{{question}}

    In order to solve this problem, we will analyze each of the options and determine
    """


@text.prompt
def solve(memory):
    """{{memory}}. Let's begin."""


complete = models.text_completion.openai(model_name)

prompt = determine_goal(question)
answer = complete(prompt, stop_at=["."])
prompt = solve(prompt + answer)
answer = complete(prompt, stop_at=["."])
completed = prompt + answer
```
Having to do the concatenation manually can quickly become cumbersome. Worse, we lose the KV cache across generations. I thus suggest adding a thin infilling DSL to outlines with the following API:
```python
import outlines.text as text


@text.infilling
def answer(question, goal, answer):
    """{{ question }}
    In order to solve this problem, we will analyze all of the options and determine [[ goal ]].
    Let's begin. [[ answer ]]."""


model = models.transformers("gpt2")
continuation = generate.continuation(model)

result = answer("Where are Apple's headquarters located?", continuation, continuation)
print(result["answer"])
```
A few quick points:
- We first evaluate the template with Jinja2, then decompose the rendered template into a succession of strings and model calls, and loop over them;
- Calling the function returns a dictionary indexed by the placeholder names;
- We infer the `stop_at` kwarg from the template;
- This needs to be vectorized (for a Tree of Thoughts implementation, for instance).
I'll tie in one other must-have feature, imo: #305
This is where each infilled generation is accessible via a key at the end of generation, like in guidance:
```python
prompt = '''
Here's one sentence: {{gen "SENTENCE_1"}}
Now, another sentence: {{gen "SENTENCE_2"}}
'''
out = guidance(prompt)()

print(out['SENTENCE_1'])
# <prints what the LLM generated>
print(out['SENTENCE_2'])
# <prints the other sentence>
```
As a separate question, could we optionally return, and reuse a kv-cache? This would make your hand-rolled-concatenation example mostly work, already, aside from the nice syntactic sugar you suggest above.
> As a separate question, could we optionally return, and reuse a kv-cache? This would make your hand-rolled-concatenation example mostly work, already, aside from the nice syntactic sugar you suggest above.
We opened #190 to discuss this. Does the interface there make sense? This is indeed an important feature that we'll re-prioritise.
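To illustrate the kv-cache point in this thread, here is a toy sketch (not the transformers API; `ToyCachedModel` and its character counting are invented for illustration). Without a reusable cache, every call re-processes the full concatenated prompt; with one, each call only processes the newly appended text:

```python
class ToyCachedModel:
    """Stand-in model that records how much input it actually has to process."""

    def __init__(self):
        self.cache = ""           # stands in for past_key_values over a prefix
        self.chars_processed = 0  # stands in for forward-pass compute

    def __call__(self, prompt: str) -> str:
        if prompt.startswith(self.cache):
            new_text = prompt[len(self.cache):]  # cache hit: process only the suffix
        else:
            new_text = prompt                    # cache miss: reprocess everything
        self.chars_processed += len(new_text)
        self.cache = prompt
        return " generated."
```

With this toy, two chained calls (the second on the concatenated prompt) process each character exactly once, which is the saving the proposed DSL would preserve automatically.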