taolei87 / rcnn

Recurrent & convolutional neural network modules

Dataset layout?

hughperkins opened this issue

Some questions on data:

  • Is it fair to say that each review in the reviewsx... files, for beer reviews, is laid out as follows?
[look] [smell] [feel] [taste] [overall]        [input words ...]

(I inferred this ordering from the review beginning 'deep brown color with a thin tan head that quickly dissipated', by comparing the page at https://www.beeradvocate.com/beer/profile/144/30806/?ba=Will_Turner with the numbers in the dataset.)

  • Why are the datasets broken down into 'aspect1', 'aspect2', etc.?
    • Is each of these the result of the decorrelation described in section 5.1, 'Dataset', for that specific aspect?
    • Can I assume that aspect1 is the first aspect as laid out inside the files, i.e. [look]?
    • Is this also true for 2 and 3, i.e.:
      • aspect2 is [smell], and
      • aspect3 is [feel]?
  • Which word vectors are you using? They look 200-dimensional. Are they perhaps GloVe 200d, from https://nlp.stanford.edu/projects/glove/, i.e. http://nlp.stanford.edu/data/glove.6B.zip ?
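
To make the layout guess in the first question concrete, here is a minimal Python sketch of how one review line would be read if that guess is right. Everything in it is an assumption rather than confirmed behaviour of the repo: the aspect order, the whitespace-separated fields, the hypothetical helper names `parse_review_line` and `aspect_name`, and the example scores, which are invented.

```python
from typing import Dict, List, NamedTuple

# ASSUMPTION: aspect order as guessed in this issue, not confirmed by the repo.
ASPECTS = ["look", "smell", "feel", "taste", "overall"]


class Review(NamedTuple):
    scores: Dict[str, float]  # one score per aspect name
    words: List[str]          # the review text, already tokenised


def parse_review_line(line: str) -> Review:
    """Split one line into five leading scores and the remaining words."""
    tokens = line.split()  # handles both tabs and spaces
    if len(tokens) < len(ASPECTS):
        raise ValueError("expected at least five score fields")
    n = len(ASPECTS)
    scores = {name: float(tok) for name, tok in zip(ASPECTS, tokens[:n])}
    return Review(scores, tokens[n:])


def aspect_name(aspect_number: int) -> str:
    """Map a 1-based 'aspectN' suffix to an aspect name (assumed mapping)."""
    return ASPECTS[aspect_number - 1]


# Invented scores; only the words come from the review quoted in this issue.
example = "0.8 0.6 0.7 0.9 0.8\tdeep brown color with a thin tan head"
review = parse_review_line(example)
```

Under these assumptions, `review.scores["look"]` holds the first field and `aspect_name(2)` gives `"smell"`, which is exactly the mapping the sub-questions ask about.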

Edit: oh right, and annotations.json: is this a kind of 'ground truth' for which parts of the text should ideally be used for each aspect? Is it correct that it isn't needed for training or dev validation, and is only used for the 'precision' figures in table 2?
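
If that reading is right, the precision figure would be the fraction of selected words that fall inside the annotated text for an aspect. A minimal Python sketch of that calculation follows. The half-open `(start, end)` token spans and the function name `rationale_precision` are hypothetical and chosen for illustration; they are not the actual schema of annotations.json.

```python
from typing import Iterable, List, Tuple


def rationale_precision(selected: Iterable[int],
                        spans: List[Tuple[int, int]]) -> float:
    """Fraction of selected token indices lying inside any annotated span.

    ASSUMPTION: spans are half-open [start, end) token ranges; the real
    annotations.json may store them differently.
    """
    selected = list(selected)
    if not selected:
        return 0.0  # nothing selected, so define precision as 0
    hits = sum(1 for i in selected if any(s <= i < e for s, e in spans))
    return hits / len(selected)


# Tokens 0, 1, 2 and 7 are selected; the annotation covers tokens 0..3,
# so three of the four selected tokens are hits.
print(rationale_precision([0, 1, 2, 7], [(0, 4)]))  # prints 0.75
```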

Edit 2: OK, annotations.json presumably corresponds to this part:

[screenshot, 2017-08-13, 6:03:44 pm]

[screenshot, 2017-08-13, 6:03:54 pm]