bigscience-workshop / promptsource

Toolkit for creating, sharing and using natural language prompts.

Possible to access other examples when prompting?

rbawden opened this issue · comments

Hi there!

I would like to know whether, when prompting, it is possible to access the other examples in the dataset, for example via a super() object that can be indexed into.

The use case: I would like to construct a few-shot example directly in promptsource using a random example from elsewhere in the dataset. However, I would like the random example to use different attributes from the current example. This differs from the current way few-shot examples are created in eval-harness.

Illustration:
I am looking at the gsarti/flores-101 dataset, for which each example is multi-parallel, with one attribute per language, e.g.

{
"id": "int32",
"sentence_afr": "string",
"sentence_amh": "string",
"sentence_ara": "string",
"sentence_eng": "string",
...
}

I would like to construct examples such that:

Arabic: {{ sentence_ara }} = English: ||| {{ sentence_eng }}

is the main template for the example, but the example used first (as 1-shot) is as follows:

French: {{ sentence_fra }} = English: ||| {{ sentence_eng }}

but where sentence_fra and sentence_eng in the second instance come from a different example.
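Outside promptsource, the desired behaviour could be sketched in plain Python: pair each query with a shot drawn from a different example, using different language attributes for the shot than for the query. The helper name `build_one_shot` and the placeholder row values are assumptions; the field names follow the flores-101 schema.

```python
import random

def build_one_shot(current, shot):
    """Build a 1-shot prompt where the shot uses different language
    attributes (French -> English) than the query (Arabic -> English)."""
    shot_text = f"French: {shot['sentence_fra']} = English: {shot['sentence_eng']}"
    query_text = f"Arabic: {current['sentence_ara']} = English:"
    return shot_text + "\n" + query_text, current["sentence_eng"]

# Toy stand-ins for two flores-101 rows (placeholder values).
dataset = [
    {"sentence_ara": "mrhban", "sentence_fra": "Bonjour", "sentence_eng": "Hello"},
    {"sentence_ara": "shkran", "sentence_fra": "Merci", "sentence_eng": "Thank you"},
]

current = dataset[0]
# Draw the shot from elsewhere in the dataset (never the current example).
shot = random.Random(0).choice([ex for ex in dataset if ex is not current])
prompt, target = build_one_shot(current, shot)
```

The point of the sketch is that the shot's attributes come from a different row than the query's, which is exactly what a single promptsource template cannot express today.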

Is this possible, or is it something for eval-harness?

Hi @rbawden,
Unfortunately, there is no explicit support for the case you are describing.
I think the easiest thing you can do is separately prompt the dataset and glue the instances together yourself. Something along the lines of:

prompted_shot_dataset = dataset_to_select_shots_from.map(prompt)

for example in dataset_to_eval:
    indexes = ... # list of indices of the shots to prepend
    model_input = " ".join(prompted_shot_dataset[indexes]["input"]) + " " + example["input"]
    target = example["target"]
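Filled out with plain Python lists (in place of a real `datasets.Dataset`), the map-then-glue idea might look like the sketch below. The `apply_prompt` function, the `glue` helper, and all field values are assumptions standing in for an actual promptsource template.

```python
def apply_prompt(example):
    # Stand-in for applying a promptsource template: returns input/target strings.
    return {
        "input": f"French: {example['sentence_fra']} = English: {example['sentence_eng']}",
        "target": example["sentence_eng"],
    }

# Analogue of dataset_to_select_shots_from.map(prompt), on a plain list.
shot_pool = [
    {"sentence_fra": "Merci", "sentence_eng": "Thank you"},
    {"sentence_fra": "Bonjour", "sentence_eng": "Hello"},
]
prompted_shots = [apply_prompt(ex) for ex in shot_pool]

def glue(prompted_shots, indexes, example_input):
    # Prepend the selected prompted shots to the current example's input.
    shots = " ".join(prompted_shots[i]["input"] for i in indexes)
    return shots + " " + example_input

full_input = glue(prompted_shots, [0], "Arabic: shkran = English:")
```

Because the shots are prompted separately from the evaluation examples, the shot template and the query template are free to use different attributes, which sidesteps the limitation described in the question.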

I haven't touched eval-harness in a while, so I won't be able to advise on that side.

Ok, thanks @VictorSanh, I'll look into this!