zhangco1079 / ELTeC

Umbrella repository that describes the collections contained in any given release of ELTeC


ELTeC

General remarks

This is the umbrella repository that references the corpora contained in any given release of ELTeC (European Literary Text Collection), a collection of novels in multiple European languages created by the COST Action Distant Reading for European Literary History (CA16204). Note that this umbrella repository only references the collections but does not contain the actual text files.

ELTeC aims to provide multiple corpora of 100 novels published between 1840 and 1920 in their original language. There are corpora for multiple European languages, but the novels included in them are not translations of each other. Each corpus follows the same criteria for corpus composition, but these criteria allow for some flexibility. The level of compliance with the corpus composition criteria is summarized in the E5C score. The novels are encoded in a common manner according to a specific subset of the Guidelines of the Text Encoding Initiative that is documented in a three-tiered schema. More detailed information on ELTeC can be found in the following places:

The concept DOI, covering all releases of this umbrella repository and always resolving to the latest release, is the following: https://doi.org/10.5281/zenodo.3462435.
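As a small illustration of the difference between the concept DOI and the version-specific DOIs listed in the release notes, the following sketch turns either kind into a resolver URL. The DOIs are the ones quoted in this README; the helper names (`doi_url`, `citation_doi`) and the dictionary layout are assumptions made for this example and are not part of ELTeC or Zenodo.

```python
from typing import Optional

# Illustrative only: these names are assumptions, not an ELTeC or Zenodo API.
DOI_RESOLVER = "https://doi.org/"

# DOIs quoted in this README.
CONCEPT_DOI = "10.5281/zenodo.3462435"  # always resolves to the latest release
VERSION_DOIS = {
    "v1.1.0": "10.5281/zenodo.4662444",
    "v1.0.0": "10.5281/zenodo.4274954",
}


def doi_url(doi: str) -> str:
    """Return a resolver URL for a bare DOI or an existing doi.org link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DOI_RESOLVER + doi


def citation_doi(version: Optional[str] = None) -> str:
    """Pick the URL to cite: a pinned release if given, else the concept DOI."""
    if version is None:
        return doi_url(CONCEPT_DOI)
    return doi_url(VERSION_DOIS[version])
```

Citing a version DOI pins a reference to exactly one release, which is usually what a reproducible study needs; the concept DOI is more suitable for pointing readers to the collection in general.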

Contacts

For issues related to this release or to individual collections contained in it, please use the relevant repository's issue tracker on GitHub.

For more general questions, please contact the Working Group lead, Martina Scholger (martina.scholger@uni-graz.at).

Release notes

v1.1.0 (April 11, 2021; DOI: https://doi.org/10.5281/zenodo.4662444)

ELTeC is a work in progress and the current release reflects this. This release contains 14 corpora in total, with at least 50 novels each. Collections in this release include: Czech, German, English, French, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovenian, Spanish, Serbian, Swedish and Ukrainian. Eight corpora are complete, containing 100 novels each. Three of the complete corpora also provide versions with linguistic annotation. See the section "Collections included" below for details on each collection.

v1.0.0 (November 15, 2020; DOI: https://doi.org/10.5281/zenodo.4274954)

ELTeC is a work in progress and the current release reflects this. We aim for 100 novels per language collection, but the current release also includes smaller collections of at least 50 novels. Collections in this release include: German, English, French, Hungarian, Portuguese, Romanian, Slovenian, Spanish, Serbian and Swedish (see details below). Additional collections are in preparation on GitHub and will be included in future releases. In some cases, improvements are still to be made to the encoding of the texts, the quality of the transcriptions, or the level of conformance to the sampling criteria.

Generally speaking, the texts included in the current release are encoded at either level 0 or level 1 of the ELTeC scheme of encoding levels. Please also take note of the README files included with each collection's release linked below. They describe the current state of each collection and provide further information, e.g. on contributors and text sources, as well as a citation suggestion.

v0.5.0 (November 2019)

ELTeC is a work in progress and the current release reflects this. We aim for 100 novels per language collection, but the current release also includes smaller collections of at least 20 novels. In some cases, improvements are still to be made to the encoding of the texts, the quality of the transcriptions, or the level of conformance to the sampling criteria.

Generally speaking, the texts included in the current release are encoded at either level 0 or level 1 of the ELTeC scheme of encoding levels. Please also take note of the README files included with each collection's release linked below. They describe the current state of each collection.

Citation suggestion

If you use any ELTeC collection(s) in your teaching or research, please reference ELTeC in a manner consistent with academic best practices. Each collection provides its own citation suggestion, but if you would like to reference ELTeC as a whole, please use the following reference:

  • European Literary Text Collection (ELTeC), version 1.1.0, April 2021, edited by Carolin Odebrecht, Lou Burnard and Christof Schöch. COST Action Distant Reading for European Literary History (CA16204). DOI: https://doi.org/10.5281/zenodo.4662444.
@collection{odebrecht_ELTeC_2021,
  maintitle = {European Literary Text Collection ({ELTeC})},
  editor = {Odebrecht, Carolin and Burnard, Lou and Schöch, Christof},
  version = {v1.1.0},
  year = {2021},
  month = {4},
  publisher = {COST Action Distant Reading for European Literary History},
  url = {https://github.com/COST-ELTeC/ELTeC},
  doi = {10.5281/zenodo.4662444},
}

In addition, or alternatively, you may cite one of the reference publications about ELTeC:

  • Lou Burnard, Christof Schöch, Carolin Odebrecht (2021): "In Search of Comity: TEI for Distant Reading", in: Journal of the Text Encoding Initiative 14. DOI: https://doi.org/10.4000/jtei.3500.
  • Christof Schöch, Roxana Patraș, Diana Santos, Tomaž Erjavec (2021): "Creating the European Literary Text Collection (ELTeC): Challenges and Perspectives", in: Modern Languages Open 1/25. DOI: https://doi.org/10.3828/mlo.v0i0.364.

Collections included

The following is a list of the language-specific collections of novels included in the current release, in alphabetical order. For each collection, we indicate the version number and date of this particular release of the collection, a brief description of the release content, the collection editors, the URL where it can be found on GitHub, and its DOI. In addition, we indicate the concept DOI covering all releases of the collection.

v1.1.0 (April 5, 2021, DOI: https://doi.org/10.5281/zenodo.4662444)

Schema repository
