adonoho / HCAuthorship

Supplementary material for the paper ``Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship''

HCAuthorship

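This repository contains the code and the datasets (or dataset descriptions) used to obtain the results reported in the paper ``Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship''.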

Content:

  • AuthAttLib -- a library that facilitates the use of the Higher Criticism (HC) based similarity measure in authorship attribution challenges. See the project https://github.com/alonkipnis/AuthorshipAttribution for more details.
  • AuthorshipChallenge -- data and code (IPython notebook) for using the HC-based similarity in the ``PAN 2018 Cross-domain authorship attribution'' challenge.
  • Federalists -- data and code (IPython notebook) for using HC to attribute authorship in the Federalist Papers.
  • Gutenberg -- code for attributing authorship using HC on a collection of more than 11,000 titles from Project Gutenberg. Also included are the list of titles and authors in this collection and the file containing the results of the attribution procedure.
  • var_analysis -- code (R notebook) and data for an analysis of the variation of word frequencies within a corpus and the degree to which this variation affects the HC calculation.
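
For readers new to the idea, the following is a minimal sketch of how an HC score between two word-frequency tables can be computed. It is not the AuthAttLib API: the function names `word_pvalues` and `higher_criticism` and the parameter `gamma` are hypothetical, the per-word test uses a normal approximation chosen for brevity, and the HC formula is one common variant of the statistic. The code in this repository may differ in all of these choices.

```python
import math
from collections import Counter


def word_pvalues(counts1, counts2):
    """Per-word two-sided p-values for two word-count tables.

    Assumption of this sketch: a normal approximation to a binomial
    test of whether each word's count splits between the two documents
    in proportion to the documents' total lengths.
    """
    n1 = sum(counts1.values())
    n2 = sum(counts2.values())
    share = n1 / (n1 + n2)  # expected share of document 1 under the null
    pvals = []
    for word in set(counts1) | set(counts2):
        x = counts1.get(word, 0)
        n = x + counts2.get(word, 0)  # total occurrences of this word
        z = (x - n * share) / math.sqrt(n * share * (1 - share))
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided normal tail
    return pvals


def higher_criticism(pvals, gamma=0.5):
    """One common variant of the HC statistic.

    Maximizes sqrt(n) * (i/n - p_(i)) / sqrt(i/n * (1 - i/n))
    over the smallest gamma fraction of the sorted p-values.
    """
    n = len(pvals)
    if n < 2:
        raise ValueError("need at least two p-values")
    sorted_p = sorted(pvals)
    k = min(max(1, int(gamma * n)), n - 1)  # keep i/n strictly below 1
    best = -math.inf
    for i in range(1, k + 1):
        u = i / n
        score = math.sqrt(n) * (u - sorted_p[i - 1]) / math.sqrt(u * (1 - u))
        best = max(best, score)
    return best


# Toy word counts, made up purely for illustration.
doc_a = Counter({"a": 50, "b": 30, "c": 20, "d": 10})
doc_b = Counter({"a": 10, "b": 30, "c": 20, "d": 50})

hc_same = higher_criticism(word_pvalues(doc_a, doc_a))
hc_diff = higher_criticism(word_pvalues(doc_a, doc_b))
```

In this toy example a document compared with itself yields a lower score than two documents whose word frequencies differ, which is the property an HC-based similarity measure relies on.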
