oduwsdl / hypercane

A toolkit for developing algorithms that sample mementos from a web archive collection.

Home Page: https://oduwsdl.github.io/hypercane

Allow users to specify their own boilerplate removal method

shawnmjones opened this issue

Hypercane uses boilerpipe's ArticleExtractor because it works best with sumgrams. In some scenarios, however, this boilerplate removal method produces no output. Allowing users to specify their own method, and perhaps an ordered list of preferred methods, would go a long way toward addressing this issue.
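
One way the ordered list of preferred methods might behave is sketched below: try each method in turn and keep the first one that returns non-empty text. This is only an illustration, not Hypercane's code. The names `extract_with_fallback`, `strip_tags`, and `always_empty` are hypothetical, and the sketch assumes each method is a plain callable that takes HTML and returns text. `always_empty` stands in for an extractor that produces nothing for a given page.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional, Tuple

# Assumption: every boilerplate removal method is a callable str -> str.
Extractor = Callable[[str], str]


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def strip_tags(html: str) -> str:
    """Hypothetical last-resort method: drop all markup, keep the text."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.text()


def always_empty(html: str) -> str:
    """Hypothetical stand-in for a method that yields nothing for this page."""
    return ""


def extract_with_fallback(
    html: str, methods: Iterable[Tuple[str, Extractor]]
) -> Tuple[Optional[str], str]:
    """Try each (name, method) pair in order; return the first non-empty result.

    Returns (None, "") when every method fails or produces no text.
    """
    for name, method in methods:
        try:
            text = method(html)
        except Exception:
            # A failing method is treated like an empty result: move on.
            continue
        if text and text.strip():
            return name, text
    return None, ""
```

A caller would pass the methods in order of preference, for example `extract_with_fallback(html, [("strict", always_empty), ("strip_tags", strip_tags)])`. Returning the name of the method that succeeded lets the caller record which method produced the text for each memento.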