thomas-keller / plos_corpus

Parsing the PLoS corpus dump from fall 2016 (Python + R)

PLoS Parsing

A mix of Python scripts (first stage: fix problems in the XML, parse it, and calculate initial statistics) and R scripts (later statistics and plotting).
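As a rough illustration of what the first stage involves, here is a minimal sketch using only the Python standard library. It is not the code from this repository: the function names `parse_article` and `summarize` are hypothetical, and it assumes the articles use JATS-style tags such as `<article-title>` and `<body>`. Malformed files are counted as failures rather than aborting the run.

```python
import xml.etree.ElementTree as ET


def parse_article(xml_text):
    """Return basic fields for one article, or None if the XML is malformed.

    Hypothetical helper; assumes JATS-style <article-title> and <body> tags.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    title_el = root.find(".//article-title")
    body_el = root.find(".//body")
    # itertext() flattens nested markup such as <italic> inside the title.
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""
    body_text = " ".join(body_el.itertext()) if body_el is not None else ""
    return {"title": title, "n_words": len(body_text.split())}


def summarize(xml_texts):
    """Parse many articles and report how many parsed, failed, and total words."""
    parsed = [parse_article(text) for text in xml_texts]
    ok = [p for p in parsed if p is not None]
    return {
        "n_parsed": len(ok),
        "n_failed": len(parsed) - len(ok),
        "total_words": sum(p["n_words"] for p in ok),
    }


# Tiny made-up article used only to exercise the sketch.
SAMPLE = (
    "<article><front><article-meta><title-group>"
    "<article-title>A <italic>test</italic> article</article-title>"
    "</title-group></article-meta></front>"
    "<body><sec><p>One two three.</p><p>Four five.</p></sec></body></article>"
)

if __name__ == "__main__":
    print(summarize([SAMPLE, "<broken"]))
```

A per-article table like this could then be written to CSV and handed to the R scripts for the later statistics and plotting.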

Comments and help are welcome. This is all at a very early stage, and a fair number of articles are still missing or fail to parse. Everything here could be made more elaborate.

Languages

Python 100.0%