anselmrothe / question_dataset

Human question data set

Data from Experiment 1 of Rothe, Lake, & Gureckis (2016), Asking and evaluating natural language questions.


df.allBySubj.csv contains all 605 questions, one per row, together with subject, context, and program-representation columns. The first three rows are:


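| trial| subj|bin                |type       |paras | para_1| para_2| para_3|full                            |text                                  |
|-----:|----:|:------------------|:----------|:-----|------:|------:|------:|:-------------------------------|:-------------------------------------|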
|     1|    7|shipsize(red)      |shipsize   |2     |      2|     NA|     NA|How many tiles is the red ship? |How many blocks is the red ship?      |
|     2|    7|horizontal(purple) |horizontal |3     |      3|     NA|     NA|Is the purple ship horizontal?  |Is the purple ship laying horizontal? |
|     3|    7|shipsize(red)      |shipsize   |2     |      2|     NA|     NA|How many tiles is the red ship? |How many blocks is the red ship?      |

Each row is one natural-language question that a subject asked about a partly revealed game board.
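A minimal sketch of reading the file with Python's standard csv module and counting questions per context. It assumes df.allBySubj.csv is comma-separated with a header row matching the table columns (the real file may differ, for example by carrying an extra R row-index column) and that missing values are written as NA, as in the example rows. The function name load_questions and the inline SAMPLE string, a copy of the three example rows above, are illustrative only.

```python
import csv
import io
from collections import Counter

# Inline copy of the three example rows from the table; in practice you would
# open df.allBySubj.csv instead. Assumes a comma-separated file whose header
# matches the table columns (an assumption, not confirmed by the README).
SAMPLE = """trial,subj,bin,type,paras,para_1,para_2,para_3,full,text
1,7,shipsize(red),shipsize,2,2,NA,NA,How many tiles is the red ship?,How many blocks is the red ship?
2,7,horizontal(purple),horizontal,3,3,NA,NA,Is the purple ship horizontal?,Is the purple ship laying horizontal?
3,7,shipsize(red),shipsize,2,2,NA,NA,How many tiles is the red ship?,How many blocks is the red ship?
"""

# Only the ID columns are converted; parameter columns are left as strings.
INT_COLUMNS = ("trial", "subj")


def load_questions(fileobj):
    """Read question rows, turning R-style "NA" into None and ID columns into ints."""
    rows = []
    for row in csv.DictReader(fileobj):
        clean = {k: (None if v == "NA" else v) for k, v in row.items()}
        for col in INT_COLUMNS:
            if clean.get(col) is not None:
                clean[col] = int(clean[col])
        rows.append(clean)
    return rows


questions = load_questions(io.StringIO(SAMPLE))
# Each row is one question about one context (trial), so counting by trial
# gives the number of questions collected for each partly revealed board.
per_context = Counter(q["trial"] for q in questions)
print(len(questions), dict(per_context))  # 3 {1: 1, 2: 1, 3: 1}
```

On the full file, len(questions) should match the 605 questions stated above if the assumed layout holds.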


  • trial = context ID, i.e., the partly revealed game board the question was asked about (see below)
  • subj = subject ID
  • bin = program representation of the question
  • type = program representation: function
  • paras = program representation: parameters
  • para_1 = program representation: parameter 1
  • para_2 = program representation: parameter 2
  • para_3 = program representation: parameter 3
  • full = a standardized example question for the program in bin
  • text = literal question generated by the subject (also available in → questions_clean/)
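Since bin combines the function (type) with its parameters, it can be split back into those parts. A minimal sketch, assuming every bin has the flat form name(arg1, arg2, ...) seen in the example rows; if the data contain nested programs, a real parser would be needed. The helper names parse_bin and check_row are hypothetical.

```python
import re

# Matches a flat program string: a function name followed by a parenthesized
# argument list. Assumes the name(args) form seen in the example rows.
BIN_PATTERN = re.compile(r"^\s*([A-Za-z_]\w*)\s*\((.*)\)\s*$")


def parse_bin(bin_str):
    """Split a program string such as 'shipsize(red)' into (function, [args])."""
    match = BIN_PATTERN.match(bin_str)
    if match is None:
        raise ValueError(f"unrecognized program: {bin_str!r}")
    name, arg_text = match.groups()
    args = [a.strip() for a in arg_text.split(",") if a.strip()]
    return name, args


def check_row(row):
    """Check that the function name in the 'bin' column agrees with 'type'."""
    name, _ = parse_bin(row["bin"])
    return name == row["type"]


print(parse_bin("shipsize(red)"))       # ('shipsize', ['red'])
print(parse_bin("horizontal(purple)"))  # ('horizontal', ['purple'])
print(check_row({"bin": "shipsize(red)", "type": "shipsize"}))  # True
```

Running check_row over every row is a cheap consistency test between the bin and type columns.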

In addition, → questions_raw/ contains every question that was generated, including those discarded as ambiguous or invalid.


Contexts 1-18

These partly revealed game boards are stored in → contexts/ and use the following coding:

  • H = Hidden
  • W = Water
  • B = Blue ship
  • R = Red ship
  • P = Purple ship
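A minimal sketch of decoding a board written in this letter coding. The README does not describe the file layout, so this assumes one board row per line with one letter per cell, optionally separated by spaces, tabs, or commas; the real files in contexts/ may differ. The helper names parse_board and ship_tiles and the toy board are hypothetical, and the toy board is not one of the 18 real contexts.

```python
from collections import Counter

# Coding letters as listed in the README.
CODES = {"H": "hidden", "W": "water", "B": "blue", "R": "red", "P": "purple"}
SEPARATORS = {" ", "\t", ","}


def parse_board(text):
    """Decode a text grid of coding letters into rows of labels.

    Assumes one board row per line and one letter per cell, with cells
    optionally separated by spaces, tabs, or commas (an assumed layout).
    """
    board = []
    for line in text.strip().splitlines():
        cells = [ch for ch in line if ch not in SEPARATORS]
        if not cells:
            continue  # skip blank lines
        try:
            board.append([CODES[ch] for ch in cells])
        except KeyError as err:
            raise ValueError(f"unknown cell code {err.args[0]!r}") from None
    return board


def ship_tiles(board, color):
    """Return the (row, col) coordinates of revealed tiles of one ship color."""
    return [(r, c) for r, row in enumerate(board)
            for c, label in enumerate(row) if label == color]


# A made-up 3x3 toy board for illustration only.
TOY = """
H H W
R R H
W P H
"""

board = parse_board(TOY)
counts = Counter(label for row in board for label in row)
print(counts["hidden"], ship_tiles(board, "red"))  # 4 [(1, 0), (1, 1)]
```

Note that ship_tiles only lists revealed tiles; hidden tiles may contain further parts of the same ship, which is exactly what questions such as shipsize(red) ask about.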

Example: Trial 13


