shawcsn / WebExtract

WebExtract

A fast and accurate algorithm for extracting the main content, such as the body of a news article or blog post, from diverse Chinese web pages. The method relies on distinctive features of useful text to locate the main content of a Chinese page automatically.
It was written in 2009 as a homework project.
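
The README does not say which text features the algorithm uses. As an illustration only, the sketch below implements one widely used heuristic for this task, text-line density: the body of a news or blog page tends to form a cluster of long, consecutive lines, while navigation bars and footers consist of short ones. This is an assumption, not necessarily the method used in this repository. `extract_main_text`, `min_chars` and `max_gap` are hypothetical names, and the default thresholds are arbitrary.

```python
import html
import re


def extract_main_text(page: str, min_chars: int = 20, max_gap: int = 2) -> str:
    """Hypothetical text-density sketch, not this repository's code.

    Returns the cluster of long text lines that carries the most characters.
    """
    # 1. Remove script/style bodies and comments, turn block-level tags into
    #    line breaks, strip the remaining tags and decode HTML entities.
    page = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", page)
    page = re.sub(r"(?s)<!--.*?-->", "", page)
    page = re.sub(r"(?i)</?(?:p|div|br|li|tr|h[1-6])\b[^>]*>", "\n", page)
    text = html.unescape(re.sub(r"<[^>]*>", "", page))
    lines = [line.strip() for line in text.split("\n")]
    lines = [line for line in lines if line]  # drop empty lines

    # 2. A line is "dense" if it has at least min_chars non-space characters.
    sizes = [len(re.sub(r"\s", "", line)) for line in lines]
    dense = [i for i, n in enumerate(sizes) if n >= min_chars]
    if not dense:
        return ""

    # 3. Group dense lines that are separated by at most max_gap short lines
    #    (e.g. a subheading or an image caption inside the article).
    clusters, current = [], [dense[0]]
    for i in dense[1:]:
        if i - current[-1] <= max_gap + 1:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)

    # 4. Keep the cluster whose span of lines contains the most characters.
    best = max(clusters, key=lambda c: sum(sizes[c[0]:c[-1] + 1]))
    return "\n".join(lines[best[0]:best[-1] + 1])
```

Because Chinese text has no spaces between words, counting non-space characters per line is a reasonable proxy for "amount of prose"; a fuller implementation might also penalize lines dominated by link text or reward full-width punctuation such as `，` and `。`.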