KBNLresearch / DBNL-canonicity

KB Researcher-in-Residence (RiR) project to collect a corpus of Dutch novels 1800-2000 and investigate canonicity

Investigating Canonicity with a Corpus of Dutch Novels 1800-2000

What determines canonicity? Is this purely subjective, or can we partly attribute it to textual features?

project aims

  • Collect a corpus of Dutch novels 1800-2000
  • Investigate canonicity with distant reading
  • Release an open access dataset of textual features and metadata
  • Create an online demo
  • World domination

This Researcher-in-Residence project ran from April to October 2021. The results are described in two blog posts:

  1. On the dataset: https://lab.kb.nl/about-us/blog/dataset-dutch-novels-1800-2000
  2. On machine learning and canonicity: https://lab.kb.nl/about-us/blog/machine-learning-canonicity-dutch-novels-1800-2000

this repository

The code in this repository is mostly intended for documentation purposes. Rebuilding the dataset and fully reproducing all results requires data from a variety of sources; please get in touch if you are interested.

License: GNU General Public License v3.0

Languages: Python 99.5%, Makefile 0.5%