computationalstylistics / 100_english_novels

A benchmark corpus of 100 English novels, covering the 19th and the beginning of the 20th century

100 English Novels ver. 1.4

A benchmark corpus of 100 English novels, covering the 19th century and the beginning of the 20th. It contains novels by 33 authors (one third female, two thirds male), plus one anonymous (well, not so anonymous...) novel entitled "Clara Vaughan".

The corpus is intended for benchmarking stylometric methods. See https://computationalstylistics.github.io/resources/ for further details.

Additionally, the folder 'word_embedding_models' contains two word embedding models trained on the benchmark novels. Both were produced with the GloVe algorithm via the 'text2vec' library for R: one represents words as 50-dimensional vectors, the other as 100-dimensional vectors.
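A typical use of such embeddings is comparing words by the cosine similarity of their vectors. The sketch below is a minimal illustration, not part of the corpus tooling. It assumes the vectors have been exported to the common GloVe plain-text layout (one word per line followed by its space-separated components). This README does not state the actual file format, so check the files in 'word_embedding_models' first. The helper names parse_glove_line, load_vectors and cosine_similarity are hypothetical.

```python
import math


def parse_glove_line(line):
    """Parse one line in the (assumed) GloVe text layout: a word, then its vector components."""
    parts = line.split()
    return parts[0], [float(x) for x in parts[1:]]


def load_vectors(path, dim):
    """Load vectors from a GloVe-style text file, keeping only rows of the expected dimensionality (e.g. 50 or 100)."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            word, vec = parse_glove_line(line)
            if len(vec) == dim:
                vectors[word] = vec
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy 3-dimensional vectors for illustration only, not taken from the actual models.
    _, love = parse_glove_line("love 1.0 0.0 1.0")
    _, heart = parse_glove_line("heart 1.0 1.0 0.0")
    print(cosine_similarity(love, heart))
```

With a real export you would call load_vectors with dim set to 50 or 100, depending on which model you loaded, and then compare any two words present in the vocabulary.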
