bmkessler / autohl

Text extraction based auto-highlighting

autohl

This project aims to build an auto-highlighter based on text extraction.

Currently, autohl.py processes a text file and attempts to split it into sentences. The sentences are compared by the overlap of their stemmed words and then ranked with the PageRank algorithm, following TextRank (Rada Mihalcea and Paul Tarau, 2004, "TextRank: Bringing Order into Texts", Department of Computer Science, University of North Texas). The output is written to an HTML file with a slider that sets the level of highlighting, expressed as the percentage of sentences highlighted.
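The ranking step described above can be sketched roughly as follows. This is a minimal illustration and not the code in autohl.py: the function names (`split_sentences`, `crude_stem`, `rank_sentences`, `highlighted`) are hypothetical, the regex sentence splitter and the suffix-stripping stemmer are simplified stand-ins for whatever the script actually uses, and the similarity follows the overlap measure from the TextRank paper applied to sets of stems.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def crude_stem(word):
    """Stand-in for a real stemmer (assumption): strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_set(sentence):
    """Lowercase the sentence and return the set of its stemmed words."""
    return {crude_stem(w) for w in re.findall(r"[a-z]+", sentence.lower())}


def similarity(a, b):
    """Shared stems divided by the sum of the logs of the two set sizes."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence similarity graph (power iteration)."""
    stems = [stem_set(s) for s in sentences]
    n = len(sentences)
    weights = [
        [0.0 if i == j else similarity(stems[i], stems[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def highlighted(sentences, fraction):
    """Indices of the top `fraction` of sentences by score, in document order."""
    scores = rank_sentences(sentences)
    keep = math.ceil(fraction * len(sentences))
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])
```

In a sketch like this, the slider value in the HTML output would map to `fraction`: moving the slider to 50% would highlight the sentences whose indices are returned by `highlighted(sentences, 0.5)`.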

The next iteration of this project will be a port to pure JavaScript so that it can run as a Chrome extension.

About

License: GNU General Public License v2.0


Languages

Language: Python 100.0%