kimihito / go-readability

Go package that cleans an HTML page for better readability.

Go-Readability

Go-Readability is a Go package that strips clutter such as buttons, ads and background images from an HTML page, and changes the page's text size, contrast and layout for better readability.

This package is a fork of readability by ying32, which was inspired by readability for node.js and readability for python. I also added some functions from Readability by Mozilla.

Why fork?

There are several reasons why I created a new fork instead of sending a PR to the original repository:

  • It seems GitHub is hard to access from China, which is why ying32 is not really active on his repository.
  • Most of the comments and documentation in the original repository are in Chinese, which unfortunately I am not yet able to understand.

Example

package main

import (
	"fmt"
	nurl "net/url"
	"time"

	"github.com/RadhiFadlillah/go-readability"
)

func main() {
	// Create URL
	url := "https://www.nytimes.com/2018/01/21/technology/inside-amazon-go-a-store-of-the-future.html"
	parsedURL, err := nurl.Parse(url)
	if err != nil {
		panic(err)
	}

	// Fetch readable content
	article, err := readability.FromURL(parsedURL, 5*time.Second)
	if err != nil {
		panic(err)
	}

	// Show results
	fmt.Println(article.Meta.Title)
	fmt.Println(article.Meta.Excerpt)
	fmt.Println(article.Meta.Author)
	fmt.Println(article.Content)
}
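
It can help to validate a URL before handing it to readability.FromURL, since nurl.Parse accepts relative paths and non-HTTP schemes without returning an error. The following is a minimal sketch using only the standard library. The helper parseArticleURL is hypothetical and not part of go-readability, and the rule it enforces (absolute http or https URLs only) is an assumption about what a caller would want.

```go
package main

import (
	"errors"
	"fmt"
	nurl "net/url"
)

// parseArticleURL is a hypothetical helper (not part of go-readability).
// It parses rawURL and rejects anything that is not an absolute
// http or https URL with a host.
func parseArticleURL(rawURL string) (*nurl.URL, error) {
	u, err := nurl.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("parse %q: %w", rawURL, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("URL scheme must be http or https")
	}
	if u.Host == "" {
		return nil, errors.New("URL has no host")
	}
	return u, nil
}

func main() {
	candidates := []string{
		"https://www.nytimes.com/2018/01/21/technology/inside-amazon-go-a-store-of-the-future.html",
		"ftp://example.com/file",
		"/relative/path",
	}
	for _, raw := range candidates {
		u, err := parseArticleURL(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		// At this point u could be passed to readability.FromURL.
		fmt.Println("ok:", u.Host)
	}
}
```

Rejecting bad input up front keeps the error message specific to the URL instead of surfacing later as a fetch failure.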

License: MIT License


Languages

HTML 63.0%, Go 37.0%