m1 / smap

smap is a site-mapping engine written in Go.

Installation

Use go get to install the latest version:

go get github.com/m1/smap

Then import it into your projects using the following:

import (
	"github.com/m1/smap"
)

Usage

smap can be used as a library, for example:

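c, err := client.New(&client.Config{
	MaxWorkers:      50,
	IgnoreRobotsTxt: false,
	UserAgent:       "user-agent 1.1",
})
if err != nil {
	log.Fatal(err) // log is the standard library package
}
u, err := url.Parse("http://example.com")
if err != nil {
	log.Fatal(err)
}
siteMap, err := c.Crawl(u)
if err != nil {
	log.Fatal(err)
}
for _, v := range siteMap {
	// Path is the exported field (assumes v.URL is a *url.URL).
	println(v.URL.Path, len(v.Links), len(v.LinkedFrom))
}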

CLI usage

smap can also be used from the command line. Install the binary with: go get github.com/m1/smap/cmd/smap

To use:

➜  ~ smap --help                    
smap is a site-mapping engine written in Go.

Usage:
  smap [url] [flags]

Flags:
  -h, --help                help for smap
      --json                json output
      --robots              Ignores robots.txt
  -u, --user-agent string   User agent to use for the crawler
  -v, --verbose             verbose printing
  -w, --workers int         How many workers to use (default 50)

For example:

➜  smap go build && ./smap http://google.com --json --verbose --workers=50 --user-agent="test-test" | jq
   {
     "/": {
       "path": "/",
       "redirects_to": null,
       "links": [
         "/advanced_search",
         "/language_tools",
         "/intl/en/ads/",
         "/services/",
         "/intl/en/policies/privacy/",
         "/intl/en/policies/terms/"
       ],
       "linked_from": [
         "/intl/en/ads/",
         "/services/",
         "/advanced_search",
         "/language_tools"
       ],
       "is_redirect": false
     }...