Tracking changes in word use through comparison of reference and web corpora.
Step 1 extractPlainText.sh
: Extract texts from British National Corpus (BNC) XML files, with date of publication (where available), text class (written/spoken, academic/discussion/fiction/etc.), and document title. Requires the XSL files justTheWords.xsl (BNC distribution) and metadata.xsl (self-authored).
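
As a rough illustration of what this step involves, the sketch below applies both stylesheets to every XML file under an input directory. It is not the contents of extractPlainText.sh: the use of xsltproc as the XSLT processor, the function name extract_bnc, the output file naming (.txt and .meta), and the assumption that both stylesheets sit in the current directory are all hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions (not taken from extractPlainText.sh):
#   - xsltproc is the XSLT processor
#   - justTheWords.xsl and metadata.xsl are in the current directory
#   - outputs are named <id>.txt and <id>.meta

# extract_bnc IN_DIR OUT_DIR
# For each *.xml file under IN_DIR, write the plain text to
# OUT_DIR/<id>.txt and the metadata (date, text class, title)
# to OUT_DIR/<id>.meta, where <id> is the file name without .xml.
extract_bnc() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    find "$in_dir" -type f -name '*.xml' | while IFS= read -r xml; do
        id=$(basename "$xml" .xml)
        xsltproc justTheWords.xsl "$xml" > "$out_dir/$id.txt"
        xsltproc metadata.xsl "$xml" > "$out_dir/$id.meta"
    done
}
```

A call such as `extract_bnc path/to/bnc/Texts out` would then produce one text file and one metadata file per BNC document; both paths here are placeholders.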
Andrew Caines, apc38 at cam.ac.uk, November 2017