oconnoat / DHBenelux15

# DH Benelux 2015: A method for cleaning 19th century text with examples from Transactions of the Royal Irish Academy 1800-1899

MONDAY 8 JUNE 2015 15.30 – 16.30 Parallel Paper Sessions C

Link to Abstract

Talk outline:

  1. Introduction to the Journal
    1. History of the RIA
    2. Irish Context
    3. Statistics about the corpus
  2. Getting from the texts to .txts
    1. JSTOR & OCR
    2. Complications: Typographical
    3. Complications: Linguistic
    4. Complications: Layout
  3. Topic Modelling the Journals
    1. Cleanup Process
      1. Stop Words
      2. Spell checking
  4. Results: the Topics and what we learned about the RIA
  5. Results: a Process for cleaning text someone else has OCR'd
  6. Conclusions
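
The cleanup steps named in the outline (stop words, spell checking) can be sketched roughly as below. This is a minimal illustration, not the process used in the talk: the word lists `STOP_WORDS` and `KNOWN_WORDS`, the function names, and the sample sentence are all hypothetical, and a real run would load a full stop-word list and a dictionary suited to 19th century English.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}
KNOWN_WORDS = {"academy", "royal", "irish", "transactions", "paper", "read"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def clean_tokens(text, stop_words=STOP_WORDS, known_words=KNOWN_WORDS):
    """Drop stop words, then separate dictionary words from suspect tokens.

    Returns a (kept, suspect) pair: `kept` holds tokens found in the
    dictionary, `suspect` holds tokens that may be OCR errors and need
    review or correction before topic modelling.
    """
    kept, suspect = [], []
    for token in tokenize(text):
        if token in stop_words:
            continue  # stop words carry little topical signal
        if token in known_words:
            kept.append(token)
        else:
            suspect.append(token)
    return kept, suspect


# "Irifh" imitates a common OCR misreading of the long s as an f.
kept, suspect = clean_tokens("The Transactions of the Royal Irifh Academy")
print(kept)
print(suspect)
```

Keeping the suspect tokens in a separate list, rather than discarding them, leaves room for a later correction pass or manual review.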
