benbogin / unobserved-local-structures

Code and data for the paper Unobserved Local Structures Make Compositional Generalization Hard.

COVR-10

COVR is a synthetic semantic parsing dataset used to evaluate the compositional generalization of sequence-to-sequence models. COVR-10 contains 10 compositional splits; the test set of each split contains a particular kind of unseen program.

The splits

| # | Acc.¹ (FT) BART/T5 | Acc.² (ICL) GPT-3 | 2-ULSs (unobserved local structures³) | Example (utterance) | Example (program) |
|---|---|---|---|---|---|
| 8 | 0.34 | 0.51 | eq+triangle, eq+brown, eq+gray, eq+round, eq+query_attr[color], eq+black, eq+white, eq+query_attr[shape], eq+square | Both the color of cat that is chasing black triangle mouse that is playing with ... | and (🟠eq (🔵query_attr [color] (with_relation (find (cat), chasing, with_relation (... |
| 25 | 0.59 | 0.23 | and+some, none+filter, filter+scene, some+filter, most+filter, exists+filter, all+filter | None of square square cat are playing with dog that is looking at white animal... | 🟠none (🔵filter (square, filter (square, find (cat))), with_relation (scene (), pla... |
| 34 | 0.35 | 0.38 | all+with_relation, with_relation+scene, exists+with_relation, none+with_relation, most+with_relation, some+with_relation | Either the number of white animal that is looking at square brown animal that is... | or (eq (count (🔵with_relation (filter (white, find (animal)), looking at, ...), 4... |
| 43 | 0.2 | 0.11 | and+some, and+most, or+all, and+all, or+none, and+none, or+most, or+some | Both the color of cat is equal to brown and some of cat are brown ... | 🟠and (eq (query_attr [color] (find (cat)), brown), 🔵some (find (cat), filter (brow... |
| 48 | 0 | 0.85 | <s>+query_attr[shape], <s>+query_attr[color] | What is the shape of square cat that is looking at black brown animal that is lo... | 🟤query_attr [shape] (with_relation (filter (square, find (cat)), looking at, with... |
| 51 | 0.64 | 0.35 | | Either the color of mouse that is playing with mouse that is chasing triangle br... | or (eq (query_attr [color] (with_relation (find (mouse), playing with, with_rela... |
| 99 | 0 | 0.89 | <s>+count | What is the number of gray animal that is chasing gray mouse that is playing wit... | 🟤count (with_relation (filter (gray, find (animal)), chasing, with_relation (filt... |
| 100 | 0.02 | 0.18 | and+exists, exists+find, or+exists | Both the shape of cat is equal to white and there is triangle black cat ... | 🟠and (eq (query_attr [shape] (find (cat)), white), 🔵exists (filter (triangle, filt... |
| 110 | 0.18 | 0.33 | with_relation+filter | Either the number of animal is equal to the number of round dog that is chasing ... | or (eq (count (find (animal)), count (🟠with_relation (🔵filter (round, find (dog)),... |
| 115 | 0.28 | 0.05 | all+with_relation, with_relation+scene, none+with_relation, most+with_relation, some+with_relation | Either all of cat that is chasing triangle triangle cat that is playing with mou... | or (🟠all (🔵with_relation (find (cat), chasing, with_relation (filter (triangle, fi... |

🟠 and 🔵 represent an unseen pair of symbols in a given example. 🟤 represents a symbol that was unseen as a first token in the output sequence.
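To make the notion of an unseen pair of symbols concrete, here is a minimal sketch that extracts parent-child symbol pairs from a program and reports the pairs in a test program that never occur in training. It is not the code from this repository. It assumes programs are written as in the table above (`symbol (arg, arg, ...)`), treats `query_attr [color]` as the single symbol `query_attr[color]`, and adds a `<s>` parent for the first symbol. The function names and the example programs are made up for illustration.

```python
import re
from typing import List, Set, Tuple

Node = Tuple[str, list]  # (symbol, list of child nodes)
START = "<s>"            # parent of the root symbol, as in "<s>+count"


def tokenize(program: str) -> List[str]:
    """Split a program string into symbols, parentheses, and commas."""
    tokens = []
    for piece in re.split(r"([(),])", program):
        piece = piece.strip()
        if piece:
            # Assumption: "query_attr [color]" is one symbol, "query_attr[color]".
            tokens.append(re.sub(r"\s+\[", "[", piece))
    return tokens


def parse(tokens: List[str], pos: int = 0) -> Tuple[Node, int]:
    """Recursive-descent parse of `symbol` or `symbol (child, child, ...)`."""
    symbol = tokens[pos]
    pos += 1
    children = []
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == ",":
                pos += 1
                continue
            child, pos = parse(tokens, pos)
            children.append(child)
        pos += 1  # skip the closing parenthesis
    return (symbol, children), pos


def local_structures(program: str) -> Set[Tuple[str, str]]:
    """Return all (parent, child) symbol pairs of a program, plus (<s>, root)."""
    root, _ = parse(tokenize(program))
    pairs = {(START, root[0])}
    stack = [root]
    while stack:
        symbol, children = stack.pop()
        for child in children:
            pairs.add((symbol, child[0]))
            stack.append(child)
    return pairs


# Made-up programs in the style of the table, not taken from the dataset files.
train_programs = [
    "some (find (cat), filter (brown, find (cat)))",
    "and (eq (query_attr [color] (find (cat)), brown), exists (find (cat)))",
]
test_program = (
    "and (eq (query_attr [color] (find (cat)), brown), "
    "some (find (cat), filter (brown, find (cat))))"
)

seen = set().union(*(local_structures(p) for p in train_programs))
unobserved = local_structures(test_program) - seen
print(sorted(unobserved))
```

In this toy example every symbol of the test program appears in training, but the pair `and+some` does not, which is the situation the 🟠/🔵 markers highlight.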

Splits are created from the synchronous context-free grammar (SCFG) rules that generated this dataset: for each split, we hold out sets of rules so that they are never seen together during training.

  • For details on this splitting method, see our paper (Appendix B.2).
  • You can see the set of unseen grammar rules for each split, along with training and test examples, by clicking on Details for any desired split.
  • See the list of all grammar splits, which includes splits that were not selected for COVR-10. This list only includes grammar splits and not n-LS splits.
  • Download COVR-10
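The idea of holding out rules that are not seen together can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from Appendix B.2: it assumes each example is annotated with the set of grammar rules that produced it, and that a split is defined by one or more held-out rule combinations. The rule names (`r_and`, `r_some`, ...) and the function `split_by_held_out_rules` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

# (example id, names of the grammar rules used to generate the example)
Example = Tuple[str, FrozenSet[str]]


def split_by_held_out_rules(
    examples: Iterable[Example],
    held_out: List[FrozenSet[str]],
) -> Tuple[List[str], List[str]]:
    """Send an example to test if it uses all rules of any held-out combination.

    Every rule can still appear in training on its own; only the held-out
    combinations are kept out of the training set.
    """
    train, test = [], []
    for example_id, rules in examples:
        if any(combo <= rules for combo in held_out):
            test.append(example_id)
        else:
            train.append(example_id)
    return train, test


# Hypothetical rule annotations, for illustration only.
examples = [
    ("a", frozenset({"r_and", "r_eq"})),
    ("b", frozenset({"r_and", "r_some"})),
    ("c", frozenset({"r_some", "r_filter"})),
]
train_ids, test_ids = split_by_held_out_rules(
    examples, held_out=[frozenset({"r_and", "r_some"})]
)
print(train_ids, test_ids)
```

Here `r_and` and `r_some` each occur in training, but only the test set contains an example that uses both.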

¹ Average exact match accuracy for BART-Base, BART-Large, T5-Base and T5-Large, each fine-tuned (FT) separately on each split (see implementation details in the paper).

² Exact match accuracy of GPT-3 (engine text-davinci-002), using the OpenAI API. For each split we evaluated on a subset of 100 test examples. We use in-context learning (ICL): for each test instance, we randomly sample 10 examples from the training set and add their source and target to the prompt. Click on the GPT-3 accuracy to see samples of prompts and outputs.

³ Unobserved local structures of size 2 (2-LS), considering only parent-child relations.
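The in-context learning setup and the exact match metric described in the notes above can be sketched as below. This is an illustration only: the prompt template (`source:` / `target:` labels, blank lines between demonstrations) and the whitespace stripping in the metric are assumptions, and the function names are hypothetical. The sketch makes no API call; the returned prompt would be sent to the model separately.

```python
import random
from typing import Optional, Sequence, Tuple

Example = Tuple[str, str]  # (source utterance, target program)


def build_icl_prompt(
    train: Sequence[Example],
    test_source: str,
    k: int = 10,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample k training examples at random and prepend them to the test source."""
    rng = rng or random.Random()
    demonstrations = rng.sample(list(train), k)
    blocks = [f"source: {src}\ntarget: {tgt}" for src, tgt in demonstrations]
    # The model is expected to continue the text after the final "target:".
    blocks.append(f"source: {test_source}\ntarget:")
    return "\n\n".join(blocks)


def exact_match_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions identical to their target (after stripping whitespace)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return hits / len(targets)


# Toy training set with placeholder strings, for illustration only.
toy_train = [(f"utterance {i}", f"program {i}") for i in range(20)]
prompt = build_icl_prompt(toy_train, "utterance to parse", k=10, rng=random.Random(0))
print(prompt)
```

A fresh sample is drawn for every test instance, so different test examples see different demonstrations.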

Download datasets and splits used in the paper

| Dataset | Split Method | # Splits | Download Dataset and splits | Comments |
|---|---|---|---|---|
| COVR-10 | Grammar | 10 | covr10.zip | |
| COVR | Grammar / n-LS | 124 / 22 | covr.zip | |
| Overnight | Template | 5 (per domain) | overnight.zip | |
| Schema2QA | Template | 5 | s2q.zip | Both utterances and targets are normalized for better evaluation, and are anonymized to resolve column ambiguity |
| Atis | Template | 5 | atis.zip | Normalized variables for better evaluation |

Experiments

Code to run experiments.

Code to compute easiness.

Languages

Python 98.4%, Jsonnet 1.6%