There are 2 repositories under the extract-website topic.
A fork of https://bitbucket.org/fivefilters/php-readability
A PHP library to extract article text from web pages
Converts a website to JSON using jQuery selectors
Free PHP library to extract the main content from an article or news post, including images and HTML