google-research / FLAN

Where can I obtain a generated dataset that includes an options column

nanyyyyyy opened this issue

Where can I obtain a generated dataset that includes an options column, which can be used for rank evaluation purposes? Thank you.

@nanyyyyyy You would need to regenerate the data and pass an options column through for the relevant datasets. This could cost a fair amount of compute, though. Alternatively, you could isolate the datasets that have options and use a regex to extract them from the generated text.
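For the regex route, here is a minimal sketch. It assumes the rendered inputs list their choices under an `OPTIONS:` header with one `- choice` line per option; that layout is an assumption about the template wording, so check it against your data and adjust the pattern for other variants. The name `extract_options` is hypothetical and not part of the FLAN codebase.

```python
import re
from typing import List

# Assumed layout: an "OPTIONS:" header followed by one "- choice" line per option.
_OPTIONS_BLOCK = re.compile(r"OPTIONS:\n((?:- .*(?:\n|$))+)")


def extract_options(rendered_input: str) -> List[str]:
    """Return the answer options found in a rendered input, or [] if none."""
    match = _OPTIONS_BLOCK.search(rendered_input)
    if match is None:
        return []
    # Drop the leading "- " marker from each option line.
    return [
        line[2:].strip()
        for line in match.group(1).splitlines()
        if line.startswith("- ")
    ]
```

Examples that return an empty list either have no options or use a different layout, so it is worth counting those before trusting the extracted column.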

Sorry, this data was intended primarily for training so we didn't pass that information along. Hope this helps though!

Can you explain a bit about this? I want to include options and the exact template for generating each instance in the dataset. What are the detailed steps to achieve this?

I haven't figured it out, sorry.

@nanyyyyyy @gao-xiao-bai So to generate all the templates and options alongside each example you would need to edit the preprocessors used for every task.

One in particular is the formatter (here), which applies the pattern (or "template") to each example. You could create a function like this one to store the pattern as a field, and make sure it's passed all the way through to the final generated examples by adding it to the list of passthrough fields here.

To get the answer options you would do the same thing: pass the "options" key through in each example, for the datasets that use the format_options preprocessor (see here).
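As a rough illustration of the idea (not the actual FLAN preprocessors), the sketch below uses plain Python dictionaries in place of the real dataset examples. The names `apply_pattern`, `keep_passthrough`, and `PASSTHROUGH_FIELDS` are hypothetical stand-ins for the formatter and the passthrough list mentioned above.

```python
from typing import Dict, List

# Hypothetical stand-in for the list of fields kept in the final examples.
PASSTHROUGH_FIELDS: List[str] = ["inputs", "targets", "pattern", "options"]


def apply_pattern(example: Dict[str, object], pattern: str) -> Dict[str, object]:
    """Fill the pattern with the example's fields and record the pattern itself."""
    formatted = dict(example)
    formatted["inputs"] = pattern.format(**example)
    formatted["pattern"] = pattern  # stored so it can survive to the output
    return formatted


def keep_passthrough(
    example: Dict[str, object], fields: List[str] = PASSTHROUGH_FIELDS
) -> Dict[str, object]:
    """Drop every field that is not explicitly listed as passthrough."""
    return {key: example[key] for key in fields if key in example}


example = {
    "premise": "A dog runs.",
    "hypothesis": "An animal moves.",
    "options": ["yes", "no"],
    "targets": "yes",
}
pattern = "{premise}\nDoes this imply: {hypothesis}?"
final = keep_passthrough(apply_pattern(example, pattern))
```

Here `final` keeps `inputs`, `targets`, `pattern`, and `options`, while the raw `premise` and `hypothesis` fields are dropped. In the real pipeline the same two changes would have to be made in each task's preprocessors.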

This is super helpful. Thanks a lot!

Thank you for your response.

@nanyyyyyy @gao-xiao-bai were you guys able to figure this out?

@shayne-longpre I must say it's a little odd not to include the options, since the FLAN paper's evaluations are based on rank classification over the options, so they seem like a key thing to include. The data is appreciated nonetheless.