Dataset layout?

hughperkins opened this issue 7 years ago · comments

Some questions on data:

Is it a fair impression that each of the reviewsx... files, for beer reviews, is laid out as follows?

[look] [smell] [feel] [taste] [overall]        [input words ...]

(I obtained this sequence by taking the numbers in the dataset for the 'deep brown color with a thin tan head that quickly dissipated' review and comparing them with the page at https://www.beeradvocate.com/beer/profile/144/30806/?ba=Will_Turner .)
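To make the guessed layout concrete, here is a minimal Python sketch that parses one line under that assumption: five leading scores in the order [look] [smell] [feel] [taste] [overall], followed by the review words. The layout itself is only the hypothesis from this question, not something confirmed by the authors. The separator (any whitespace), the name `parse_review_line`, and the scores in the example line are all made up for illustration.

```python
from typing import Dict, List, NamedTuple

# Assumed aspect order, taken from the layout guessed in the question.
ASPECTS = ["look", "smell", "feel", "taste", "overall"]


class Review(NamedTuple):
    scores: Dict[str, float]
    words: List[str]


def parse_review_line(line: str) -> Review:
    """Split one line into five aspect scores plus the review words.

    Hypothetical helper: assumes whitespace-separated tokens with the
    five scores first, which is a guess about the file format.
    """
    tokens = line.split()
    if len(tokens) < len(ASPECTS):
        raise ValueError("expected at least %d score fields" % len(ASPECTS))
    scores = {name: float(tok) for name, tok in zip(ASPECTS, tokens)}
    return Review(scores=scores, words=tokens[len(ASPECTS):])


# Placeholder scores, not the real values from the dataset.
example = "0.6 0.8 0.7 0.9 0.8\tdeep brown color with a thin tan head that quickly dissipated"
review = parse_review_line(example)
print(review.scores["look"], review.words[:3])
```

If the guess is right, comparing `review.scores` with the ratings shown on the corresponding beeradvocate page should confirm the aspect order.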

Why are the datasets broken down into 'aspect1', 'aspect2', etc?
- Is it a fair impression that each of these is the result of the decorrelation described in section 5.1, 'Dataset', for that specific aspect?
- Can I assume that aspect1 is the first aspect, as laid out inside the files, i.e. [look]?
- Is this also true for 2 and 3, i.e.:
  - aspect2 is [smell]?, and
  - aspect3 is [feel]?

Which word vectors are you using? It looks like you are using something 200-dimensional? Maybe GloVe 200, from https://nlp.stanford.edu/projects/glove/, i.e. http://nlp.stanford.edu/data/glove.6B.zip ?
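
One way to check the dimensionality guess is to load the embedding file and count the values per word. The sketch below assumes the plain-text GloVe format (one word per line, followed by space-separated floats); whether the embedding file used in this repository follows that format is an assumption. The names `load_vectors` and `vector_dim` are hypothetical, and the toy lines are invented.

```python
from typing import Dict, Iterable, List


def load_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse lines of the form '<word> <float> <float> ...'."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def vector_dim(vectors: Dict[str, List[float]]) -> int:
    """Return the common vector length, or fail if lengths differ."""
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError("inconsistent or empty vectors: %s" % sorted(dims))
    return dims.pop()


# Toy 3-dimensional input, just to show the format.
sample = ["the 0.1 -0.2 0.3", "beer 0.4 0.5 -0.6"]
toy = load_vectors(sample)
print(vector_dim(toy))
```

Passing an open file handle for `glove.6B.200d.txt` (one of the files inside glove.6B.zip) in place of `sample` should give a dimension of 200; running the same check on the repository's own embedding file would show whether the sizes match.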

Hugh Perkins · Answer 1 · Sun Aug 13 2017 17:15:16 GMT+0800 (China Standard Time)

Edit: oh right, and annotations.json: is this a kind of 'ground truth' for which bits of text should ideally be used for each aspect? So it isn't needed for training or dev/validation, and is just used for the 'precision' part of table 2? Is this a fair impression?

Edit 2: OK, so annotations.json presumably corresponds to this bit: