salesforce / CodeGen

CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

Documentation output

tom-doerr opened this issue

Thank you so much for sharing your work including weights and making it easy to run!

What does the text output of the model mean? I'm getting the completion of my prompt in the first box and then another completion in the second box. Is the second box just the truncated completion of the first box including the prompt?

Example:

(venv) ➜  CodeGen git:(main) ✗ python3 -m jaxformer.hf.sample --model codegen-2B-mono --context "def print_in_quotes(text):" --device cpu
loading parameters
loading parameters took 16.33s
loading tokenizer
loading tokenizer took 3.43s
sampling
====================================================================================================

    print('"{}"'.format(text))


def print_in_quotes_with_quotes(text):
    print("\"{}\"".format(text))


def print_in_quotes_with_quotes_and_spaces(text):
    print("\"{}\"".format(text))


def print_in_quotes_with_quotes_and_spaces_and_new_lines(text):
    print("\"{}\"".format(text))


def print_in_quotes_
====================================================================================================
def print_in_quotes(text):
    print('"{}"'.format(text))
====================================================================================================
sampling took 78.96s
done.

The first "box" is the raw completion generated from the context, without any truncation.

The second "box" is the context concatenated with the truncated completion.

See the corresponding line in jaxformer/hf/sample.py:

print(args.context+truncation)
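To make the distinction concrete, here is a minimal sketch of what a truncation step like this might look like. The helper name truncate_completion and its stop patterns (a new top-level def, a top-level comment, the <|endoftext|> token, or two consecutive blank lines) are assumptions for illustration, not the repository's actual implementation. The input reuses the start of the raw completion from the log above.

```python
import re

# Hypothetical stop rules (assumed, not taken from jaxformer): a new top-level
# def, a top-level comment, the end-of-text token, or two consecutive blank lines.
STOP_PATTERNS = [
    re.compile(r"^def ", re.MULTILINE),
    re.compile(r"^#", re.MULTILINE),
    re.compile(re.escape("<|endoftext|>")),
    re.compile(r"\n\n\n"),
]


def truncate_completion(completion: str) -> str:
    """Cut the raw completion at the earliest match of any stop pattern."""
    positions = [m.start() for p in STOP_PATTERNS if (m := p.search(completion))]
    return completion[: min(positions)] if positions else completion


# The context passed via --context and the start of the raw completion
# (the "first box") from the log in the question.
context = "def print_in_quotes(text):"
completion = (
    "\n    print('\"{}\"'.format(text))\n\n\n"
    "def print_in_quotes_with_quotes(text):\n"
    "    print(\"\\\"{}\\\"\".format(text))\n"
)

truncation = truncate_completion(completion)
# The "second box": the context followed by the truncated completion.
print(context + truncation)
```

Under these assumed rules, the two consecutive blank lines after the first print statement are the earliest stop point. The printed result therefore matches the two-line second box in the log.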