Minerva

Minerva is an open framework for context-based citation recommendation experiments.

If you want to run an experiment in natural language processing or information retrieval using a corpus of scientific papers you may need to do one, more or all of these things:

Annotate citations in the running text
Parse text from the References/Bibliography section
Find the document for a particular reference inside your corpus
Split sentences in the text (often less trivial than you expect)
Deal with the XML schema of the corpus
Extract position-relevant text from the document:
- Select sentences around a citation token
- The full paragraph containing a reference to a Figure
- The Abstract
- All sentences in which a particular reference is cited

If you are unlucky and need to use a corpus that was not already converted to a machine-readable representation (e.g. XML), you may also need to:

Fetch/download a number of files (normally PDFs)
Convert these PDF files into a structured representation
Clean up the output from this

Minerva aims to make all of this as easy as possible, by providing built-in solutions for many of these tasks and wrappers around existing tools that deal with many other tasks.

Installing Minerva

to be continued...

About

A framework for context-based citation recommendation experiments

GNU General Public License v3.0

Languages

Language:Python 99.6%Language:CSS 0.2%Language:Batchfile 0.1%Language:HTML 0.1%Language:JavaScript 0.0%Language:Shell 0.0%