Content of Adressa News Dataset
deekshakoul opened this issue · comments
Hi All,
The Adressa News Dataset has its news content (main body) in Norwegian, not in English, as also mentioned on their website. How did the authors handle this? Did they translate the content of each article or use it as is?
Hi @deekshakoul. I used the Norwegian content as is. As you can see in this script, for Adressa I used word embeddings pre-trained on Norwegian text (w2v_skipgram_no_lemma_aviskorpus_nowac_nbdigital/model.txt), available at http://vectors.nlpl.eu/repository/
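
For readers who want to try something similar, here is a minimal sketch of reading such an embeddings file and keeping only the words that occur in the articles. It assumes model.txt is in the plain word2vec text format (a header line with the vocabulary size and dimension, then one token and its vector per line). The function name `load_word2vec_text` and the sample words are hypothetical, chosen for illustration; this is not the code from the script mentioned above.

```python
import io


def load_word2vec_text(lines, wanted=None):
    """Parse embeddings in word2vec text format (assumed layout).

    lines  : iterable of text lines; the first is the "vocab_size dim" header
    wanted : optional set of tokens to keep (e.g. the dataset vocabulary)
    Returns (dim, {token: [float, ...]}).
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])  # header[0] is the vocabulary size, unused here
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed rows
        token = parts[0]
        if wanted is not None and token not in wanted:
            continue  # keep memory low by dropping unused words
        vectors[token] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory stand-in for model.txt (made-up values, not real embeddings).
sample = io.StringIO("3 2\nnyheter 0.1 0.2\navis 0.3 0.4\nsport 0.5 0.6\n")
dim, vectors = load_word2vec_text(sample, wanted={"avis", "sport", "fotball"})
print(dim, sorted(vectors))
```

With the real file you would pass an open file handle instead, for example `open(path, encoding="utf-8")`. Words missing from the pre-trained vocabulary ("fotball" in the sample) simply get no vector, so the caller has to decide how to initialize them.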