Tiny Restful API for parsing h1, h2, h3 and links for given url.

Endpoints

GET 'v1/indexes'

POST 'v1/indexes', url: 'https://www.google.com'

Index Model

default: <Index id: nil, url: "", content: {"h1"=>"", "h2"=>"", "h3"=>"", "links"=>[]}, status: "pending", created_at: nil, updated_at: nil>

indexed: <Index id: 583916550, url: "https://www.wikipedia.com", content: {"h1"=>"Heading 1", "h2"=>"Heading 2", "h3"=>"Heading 3", "links"=>["https://www.google.com", "https://www.wikipedia.com"]}, status: "indexed", created_at: "2017-01-26 06:58:56", updated_at: "2017-01-26 06:58:56">

Flow

Posting indexes url, will create Index record on the database with given url.
Index model will trigger IndexerJob to parse URL's elements, save to content json and update the status.
Indexer Job will parse html with ParserService.

BatuhanW / ruby-scraper

Endpoints

Index Model

Flow

About

Languages