yasszu / finagle-web-crawler

Web Crawler of Google Developers Blog with Finagle

Web Crawler with Finagle

Scrapes the Google blogs exposed by the developers, developers_jp, and android endpoints listed below.

Getting Started

Create the crawler database in MySQL:

CREATE DATABASE IF NOT EXISTS `crawler` 
DEFAULT CHARACTER SET = utf8mb4
COLLATE utf8mb4_unicode_ci;

Run the application

$ sbt 'run-main app.Server -db.host localhost'
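The -db.host flag tells the server where MySQL is running. As a minimal sketch, assuming MySQL's default port 3306 and the crawler database created above, a host value could be turned into connection strings as follows. The DbConfig name is hypothetical; the project may build its connection differently.

```scala
// Hypothetical helper: builds connection strings for the `crawler` database
// from the value passed with `-db.host`. Port 3306 is MySQL's default and is
// an assumption here; the project may configure it differently.
final case class DbConfig(host: String, port: Int = 3306, database: String = "crawler") {
  // "host:port" form, as used by address-based clients
  def destination: String = s"$host:$port"

  // JDBC URL form, as used by MySQL Connector/J
  def jdbcUrl: String = s"jdbc:mysql://$host:$port/$database"
}

object DbConfigExample {
  def main(args: Array[String]): Unit = {
    val config = DbConfig(host = "localhost")
    println(config.destination) // localhost:3306
    println(config.jdbcUrl)     // jdbc:mysql://localhost:3306/crawler
  }
}
```

Under Docker Compose, the host would normally be the database service name instead of localhost. The container name finagle-web-crawler_db_1 suggests a service called db, but check docker-compose.yml to be sure.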

Run with Docker

$ sbt docker:publishLocal
$ docker-compose build
$ docker-compose up

Open a shell in the MySQL container

$ docker exec -it finagle-web-crawler_db_1 bash

Refs

Deploy fat JAR

  • Create a JAR file
$ sbt assembly

  • Run the JAR
$ java -jar target/scala-2.12/finagle-web-crawler-assembly-1.0-SNAPSHOT.jar -db.host='localhost'

Feed

GET feed/googleblog/developers

  • Example
$ curl -X GET 'http://localhost:8080/feed/googleblog/developers'

GET feed/googleblog/developers_jp

  • Example
$ curl -X GET 'http://localhost:8080/feed/googleblog/developers_jp'
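The feed endpoints can also be read from Scala instead of curl. The sketch below assumes the server listens on localhost:8080, as in the curl examples. FeedClient is a hypothetical name, and because the feed format is not documented here, the body is returned as plain text.

```scala
import scala.io.Source

// Minimal client sketch for the feed endpoints. Assumes the server listens on
// localhost:8080 (as in the curl examples); `FeedClient` is a hypothetical name.
object FeedClient {
  val BaseUrl = "http://localhost:8080"

  // Blog keys taken from the documented feed paths.
  val Blogs: Seq[String] = Seq("developers", "developers_jp")

  def feedUrl(blog: String, base: String = BaseUrl): String =
    s"$base/feed/googleblog/$blog"

  // Downloads the response body as UTF-8 text and always closes the source.
  // The feed format is not documented here, so the body is returned as-is.
  def fetch(url: String): String = {
    val source = Source.fromURL(url, "UTF-8")
    try source.mkString
    finally source.close()
  }

  def main(args: Array[String]): Unit =
    Blogs.foreach { blog =>
      val url = feedUrl(blog)
      println(s"$url: ${fetch(url).length} characters")
    }
}
```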

API

GET api/googleblog/developers

  • Example
$ curl -X GET 'http://localhost:8080/api/googleblog/developers?count=5&page=0'

GET api/googleblog/developers_jp

  • Example
$ curl -X GET 'http://localhost:8080/api/googleblog/developers_jp?count=5&page=0'

GET api/developers/android

  • Example
$ curl -X GET 'http://localhost:8080/api/developers/android?count=5&page=0'
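The API endpoints page through results with count and page. The examples use page=0, which suggests zero-based pages. Under that assumption, page p with count items would skip p * count rows. The sketch below shows one way to parse and validate these parameters. The Paging name, the default count, and the upper limit are all assumptions, not the server's documented behavior.

```scala
import scala.util.Try

// Hypothetical paging parameters for the api/* endpoints.
// Assumes zero-based pages, as suggested by `page=0` in the examples.
final case class Paging(count: Int, page: Int) {
  require(count > 0, "count must be positive")
  require(page >= 0, "page must not be negative")

  // Number of rows to skip, e.g. for a SQL `LIMIT count OFFSET offset` query.
  def offset: Int = count * page
}

object Paging {
  // Assumed default and cap; the real server may use different values.
  val DefaultCount = 10
  val MaxCount = 100

  // Reads `count` and `page` from query parameters. Missing or non-integer
  // values fall back to defaults; out-of-range values are clamped.
  def fromQuery(params: Map[String, String]): Paging = {
    def intParam(name: String): Option[Int] =
      params.get(name).flatMap(v => Try(v.trim.toInt).toOption)

    val count = intParam("count").getOrElse(DefaultCount).max(1).min(MaxCount)
    val page  = intParam("page").getOrElse(0).max(0)
    Paging(count, page)
  }
}
```

With count=5&page=2, this sketch skips 10 rows and returns rows 11 to 15.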

Run scraping manually

GET scrape/googleblog/developers

  • Example
$ curl -X GET 'http://localhost:8080/scrape/googleblog/developers'

GET scrape/googleblog/developers_jp

  • Example
$ curl -X GET 'http://localhost:8080/scrape/googleblog/developers_jp'

GET scrape/googleblog/android

  • Example
$ curl -X GET 'http://localhost:8080/scrape/googleblog/android'
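To refresh every blog in one go, the three scrape endpoints above can be called one after another. The following is a minimal sketch using only the JDK's HttpURLConnection. It assumes the server runs on localhost:8080. ScrapeAll is a hypothetical name, and the timeouts are guesses, since a scrape may take a while to finish.

```scala
import java.net.{HttpURLConnection, URI}

// Sketch of a script that triggers every documented scrape endpoint in turn.
// Assumes the server runs on localhost:8080; `ScrapeAll` is a hypothetical name.
object ScrapeAll {
  val BaseUrl = "http://localhost:8080"

  // Paths copied from the documented scrape endpoints.
  val Paths: Seq[String] = Seq(
    "scrape/googleblog/developers",
    "scrape/googleblog/developers_jp",
    "scrape/googleblog/android"
  )

  def urls(base: String = BaseUrl): Seq[String] = Paths.map(p => s"$base/$p")

  // Sends a GET request and returns the HTTP status code.
  // The read timeout is generous because scraping may be slow (an assumption).
  def trigger(url: String): Int = {
    val conn = URI.create(url).toURL.openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("GET")
      conn.setConnectTimeout(5000)
      conn.setReadTimeout(60000)
      conn.getResponseCode
    } finally conn.disconnect()
  }

  def main(args: Array[String]): Unit =
    urls().foreach(url => println(s"$url -> ${trigger(url)}"))
}
```

The requests run sequentially, so the crawler is not asked to scrape every blog at the same moment.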

About

License: Apache License 2.0


Languages

Language: Scala 100.0%