jaanli / food2vec

:hamburger:

Line breaks for FastText training

RichardSieg opened this issue · comments

Hi! Thanks for the nice package 👍 I'd like to train a model with fastText myself on your processed data. One thing I noticed in your files is that the input is separated by line breaks. I've been playing around with fastText a bit lately, and I found that fastText ignores line breaks for its context window, so words at the end of one line can end up in the context of words at the start of the next. I guess that's not the behavior you want in your case, right? To avoid this, I inserted dummy characters before and after every line. Maybe this will give you better results.
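
A minimal sketch of that padding step, assuming a whitespace-tokenized file with one item per line and a context window of `ws` tokens (fastText's `-ws` option, default 5). The padding token `__PAD__` and the choice of `ws` dummy tokens on each side are assumptions for illustration, not something the package or fastText prescribes.

```python
PAD = "__PAD__"  # hypothetical dummy token; any string absent from the real vocabulary works


def pad_lines(lines, ws=5, pad=PAD):
    """Surround each non-empty line with `ws` dummy tokens on both sides.

    If the trainer treats the file as one continuous token stream, this keeps
    real words from adjacent lines at least 2*ws + 1 positions apart, i.e.
    outside a context window of size ws.
    """
    padding = " ".join([pad] * ws)
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # drop blank lines
        yield f"{padding} {' '.join(tokens)} {padding}"


def pad_file(src_path, dst_path, ws=5):
    """Write a padded copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for padded in pad_lines(src, ws=ws):
            dst.write(padded + "\n")
```

Padding only one side with `ws` tokens would already be enough; padding both sides just leaves extra margin. Note that the dummy token will also receive its own embedding.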

Also: is the name of the dish anywhere in your processed data? For example, I was looking for cheeseburger and couldn't find it.

Thanks! Are you using my fork of fastText? (https://github.com/altosaar/fastText/)

The example is here: https://github.com/altosaar/fastText/blob/master/sentence-context-example.sh

The command starts with ./fasttext sentence_context, and the implementation is here: https://github.com/altosaar/fastText/blob/master/src/fasttext.cc#L163 (it should respect line breaks).
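
To illustrate what "respecting line breaks" means for the context window, here is a minimal Python sketch (not the fork's C++ code) that builds skip-gram (center, context) pairs with the window clipped to each line, next to a version that flattens the file into one stream. The function names and the fixed window size are illustrative assumptions; fastText itself also samples the effective window size per word between 1 and `ws`.

```python
from typing import Iterable, List, Tuple


def skipgram_pairs(lines: Iterable[str], ws: int = 5) -> List[Tuple[str, str]]:
    """Collect (center, context) pairs; the window never crosses a line break."""
    pairs = []
    for line in lines:
        tokens = line.split()  # each line is treated as its own sentence
        for i, center in enumerate(tokens):
            lo = max(0, i - ws)
            hi = min(len(tokens), i + ws + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, tokens[j]))
    return pairs


def skipgram_pairs_flat(lines: Iterable[str], ws: int = 5) -> List[Tuple[str, str]]:
    """Same, but with all lines joined into one stream (line breaks ignored)."""
    return skipgram_pairs([" ".join(lines)], ws)
```

With two lines "a b" and "c d" and `ws=2`, the per-line version only pairs words within the same line, while the flattened version also produces cross-line pairs such as `("b", "c")`.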

Let me know if that works!