Use go get to fetch the latest version:
go get github.com/m1/smap
Then import it into your projects using the following:
import (
	"github.com/m1/smap"
)
smap can be used as a library, for example:
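// Assumes "log" and "net/url" are imported alongside smap's client package.
c, err := client.New(&client.Config{
	MaxWorkers:      50,
	IgnoreRobotsTxt: false,
	UserAgent:       "user-agent 1.1",
})
if err != nil {
	log.Fatal(err)
}
u, _ := url.Parse("http://example.com") // constant URL, known to parse
siteMap, err := c.Crawl(u)
if err != nil {
	log.Fatal(err)
}
for _, v := range siteMap {
	// Path is the exported field on url.URL; this assumes v.URL is a *url.URL.
	println(v.URL.Path, len(v.Links), len(v.LinkedFrom))
}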
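As a rough illustration of what a worker cap such as MaxWorkers generally means, the sketch below runs a fetch function over a list of paths with a bounded number of goroutines. This README does not show smap's internals, so nothing here is smap's implementation: crawlAll and the fake fetch function are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// crawlAll calls fetch once per URL using at most maxWorkers goroutines.
// fetch is a hypothetical stand-in for the real HTTP work; here it returns
// the number of links found on a page.
func crawlAll(urls []string, maxWorkers int, fetch func(string) int) map[string]int {
	if maxWorkers < 1 {
		maxWorkers = 1 // avoid blocking forever when no worker would start
	}
	jobs := make(chan string)
	results := make(map[string]int, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < maxWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				n := fetch(u)
				mu.Lock() // the results map is shared between workers
				results[u] = n
				mu.Unlock()
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // lets the workers exit their range loops
	wg.Wait()
	return results
}

func main() {
	urls := []string{"/", "/services/", "/advanced_search"}
	// Fake fetch: pretend the link count equals the path length.
	got := crawlAll(urls, 2, func(u string) int { return len(u) })
	fmt.Println(len(got), got["/"])
}
```

A larger cap finishes sooner on big sites but puts more concurrent load on the target server, which is the trade-off a worker setting exposes.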
smap can also be used from the command line. Install it with:
go get github.com/m1/smap/cmd/smap
To use:
➜ ~ smap --help
smap is a site-mapping engine written in Go.

Usage:
  smap [url] [flags]

Flags:
  -h, --help                help for smap
      --json                json output
      --robots              Ignores robots.txt
  -u, --user-agent string   User agent to use for the crawler
  -v, --verbose             verbose printing
  -w, --workers int         How many workers to use (default 50)
For example:
➜ smap go build && ./smap http://google.com --json --verbose --workers=50 --user-agent="test-test" | jq
{
  "/": {
    "path": "/",
    "redirects_to": null,
    "links": [
      "/advanced_search",
      "/language_tools",
      "/intl/en/ads/",
      "/services/",
      "/intl/en/policies/privacy/",
      "/intl/en/policies/terms/"
    ],
    "linked_from": [
      "/intl/en/ads/",
      "/services/",
      "/advanced_search",
      "/language_tools"
    ],
    "is_redirect": false
  }...
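The JSON output can also be read back into a Go program. The following is a minimal sketch, assuming the top-level object is keyed by path and each entry carries the fields shown above. The Page type, parseSiteMap, and sortedPaths are hypothetical names for this example and are not part of smap. redirects_to is kept as raw JSON because only null appears in the sample, so its non-null shape is unknown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Page mirrors the fields visible in the sample output.
// The type itself is an assumption made for this example.
type Page struct {
	Path        string          `json:"path"`
	RedirectsTo json.RawMessage `json:"redirects_to"` // shape unknown when not null
	Links       []string        `json:"links"`
	LinkedFrom  []string        `json:"linked_from"`
	IsRedirect  bool            `json:"is_redirect"`
}

// parseSiteMap decodes the top-level object, which is keyed by path.
func parseSiteMap(data []byte) (map[string]Page, error) {
	var pages map[string]Page
	if err := json.Unmarshal(data, &pages); err != nil {
		return nil, err
	}
	return pages, nil
}

// sortedPaths returns the map keys in a stable order for printing.
func sortedPaths(pages map[string]Page) []string {
	paths := make([]string, 0, len(pages))
	for p := range pages {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}

func main() {
	// A shortened, hand-written sample in the same shape as the output.
	sample := []byte(`{
		"/": {"path": "/", "redirects_to": null,
			"links": ["/services/"], "linked_from": ["/services/"],
			"is_redirect": false},
		"/services/": {"path": "/services/", "redirects_to": null,
			"links": ["/"], "linked_from": ["/"],
			"is_redirect": false}
	}`)

	pages, err := parseSiteMap(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range sortedPaths(pages) {
		page := pages[p]
		fmt.Println(p, len(page.Links), len(page.LinkedFrom))
	}
}
```

In practice the bytes would come from a file or a pipe rather than a literal, for example by reading os.Stdin after piping smap's output into the program.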