# DH Benelux 2015: A method for cleaning 19th-century text with examples from Transactions of the Royal Irish Academy 1800-1899
MONDAY 8 JUNE 2015 15.30 – 16.30 Parallel Paper Sessions C
- Introduction to the Journal
- History of the RIA
- Irish Context
- Statistics about the corpus
- Getting from the texts to .txts
- JSTOR & OCR
- Complications: Typographical
- Complications: Linguistic
- Complications: Layout
- Topic Modelling the Journals
- Cleanup Process
- Stop Words
- Spell checking
- Cleanup Process
- Results: the Topics and what we learned about the RIA
- Results: a Process for cleaning text someone else has OCR'd
- Conclusions