The Content Extraction via Text Density (CETD) program provides algorithms to detect and remove the additional content (e.g. ads, navigation menus, copyright notices) around the main content of a web page.

See http://disnet.cs.bit.edu.cn/

License

For RapidXml, refer to http://rapidxml.sourceforge.net/
Everything else is under the GPL version 3; read it at http://www.gnu.org/licenses/gpl.txt

Install

SWIG binding for Go:

    $ go get github.com/bluele/cetd
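
To give a rough feel for the text-density idea named in the description, here is a minimal sketch in Go. It does not use this package's API. It assumes that an element's density is the number of text characters under it divided by the number of tags under it (with a tag count of 0 treated as 1), and it only handles well-formed XML/XHTML through the standard encoding/xml package. The names node, density and textDensities are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

// node holds the counts accumulated for one element and its descendants.
// Hypothetical type, for illustration only.
type node struct {
	name  string
	chars int // text characters under this element
	tags  int // descendant tags under this element
}

// density returns chars/tags, treating a tag count of 0 as 1
// (an assumption of this sketch).
func (n node) density() float64 {
	t := n.tags
	if t == 0 {
		t = 1
	}
	return float64(n.chars) / float64(t)
}

// textDensities walks a well-formed XML/XHTML fragment and returns one
// node per element, in the order the elements are closed.
func textDensities(doc string) ([]node, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var stack, done []node
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return done, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			stack = append(stack, node{name: t.Name.Local})
		case xml.CharData:
			if len(stack) > 0 {
				text := strings.TrimSpace(string(t))
				stack[len(stack)-1].chars += utf8.RuneCountInString(text)
			}
		case xml.EndElement:
			if len(stack) == 0 {
				continue
			}
			n := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			done = append(done, n)
			// Fold this element's counts into its parent.
			if len(stack) > 0 {
				p := &stack[len(stack)-1]
				p.chars += n.chars
				p.tags += n.tags + 1
			}
		}
	}
}

func main() {
	page := `<div><ul><li><a>Home</a></li><li><a>News</a></li></ul>` +
		`<p>Main article text here.</p></div>`
	nodes, err := textDensities(page)
	if err != nil {
		panic(err)
	}
	for _, n := range nodes {
		fmt.Printf("%-4s chars=%d tags=%d density=%.2f\n",
			n.name, n.chars, n.tags, n.density())
	}
}
```

In this sample the link-heavy list gets a much lower density than the paragraph, which is the kind of signal a density-based extractor can use to separate navigation from main content.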