aneksamun / nytimes-articles-server

The New York Times articles scraper with GraphQL interface

The New York Times articles server

The service scrapes all news headlines from nytimes.com and exposes them through a GraphQL API.
The articles can be queried using the following GraphQL schema:

type News {
  title: String,
  link: String,
}

type Query {
  news: [News!]!
}
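
As an illustration of querying this schema, here is a minimal client sketch that uses only the JDK HTTP client (Java 11+). The `NewsQueryExample` object and the `/graphql` endpoint path are assumptions made for this example and are not taken from the project; check the server routes for the actual path.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object NewsQueryExample {
  // GraphQL query selecting both fields of the News type from the schema above.
  val newsQuery: String = "{ news { title link } }"

  // Wraps a single-line GraphQL query in the conventional JSON body {"query": "..."}.
  // Only backslashes and double quotes are escaped, which is enough for this query.
  def requestBody(query: String): String = {
    val escaped = query.replace("\\", "\\\\").replace("\"", "\\\"")
    s"""{"query":"$escaped"}"""
  }

  // Sends the query and returns the raw JSON response as a string.
  // Assumption: the default endpoint URL is hypothetical.
  def fetchNews(endpoint: String = "http://localhost:8080/graphql"): String = {
    val request = HttpRequest.newBuilder(URI.create(endpoint))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(requestBody(newsQuery)))
      .build()
    HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
      .body()
  }
}
```

With the service running locally, `println(NewsQueryExample.fetchNews())` would print the JSON returned for the `news` field.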

Once the service has started, it begins scraping headlines and redirects the user to the GraphQL Playground page.
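
The repeated scraping can be pictured with the following sketch, which runs a task immediately and then at a fixed interval. It uses a plain JDK scheduler for brevity; the project itself is built on cats-effect, so this is not its implementation, and `ScrapeSchedulerExample` and `start` are hypothetical names.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.concurrent.duration._

object ScrapeSchedulerExample {
  // Runs `scrape` once right away and then again after every `interval`.
  // Returns the executor so that the caller can shut it down.
  def start(interval: FiniteDuration)(scrape: () => Unit): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task: Runnable = () => {
      // Catch failures so that one failed scrape does not cancel later runs.
      try scrape()
      catch { case e: Exception => println(s"scrape failed: ${e.getMessage}") }
    }
    scheduler.scheduleAtFixedRate(task, 0L, interval.toMillis, TimeUnit.MILLISECONDS)
    scheduler
  }
}
```

For example, `ScrapeSchedulerExample.start(4.hours)(() => println("scraping headlines"))` mirrors the default interval of four hours.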

Configuration

The service can be customised by changing the following settings:

Setting                  | Description                                   | Default value
SERVER_HTTP_PORT         | Server port                                   | 8080
DB_NAME                  | Database name                                 | wardrobe
DB_HOST                  | Database server host                          | localhost
DB_PORT                  | Database server port                          | 5432
DB_USER                  | Database user                                 | user
DB_PASSWORD              | Database user password                        | 1234
DB_WHETHER_CREATE_SCHEMA | Whether to create the database schema on startup | true
NY_TIMES_URL             | The URL of the New York Times website         | https://www.nytimes.com/
SCRAPE_REPEAT_INTERVAL   | Scrape repeating interval                     | every 4 hours
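
To show how these settings and their defaults fit together, the sketch below reads them from environment variables using only the Scala 2.13 standard library. The project loads its configuration with pureconfig, so this is an approximation rather than its actual code. The names `ServiceSettings` and `DbSettings` are hypothetical, and the sketch assumes that `SCRAPE_REPEAT_INTERVAL` is given in a form that `scala.concurrent.duration.Duration` can parse, such as `4 hours`.

```scala
import scala.concurrent.duration._
import scala.util.Try

final case class DbSettings(
  name: String, host: String, port: Int,
  user: String, password: String, createSchema: Boolean)

final case class ServiceSettings(
  httpPort: Int, db: DbSettings, nyTimesUrl: String, scrapeInterval: FiniteDuration)

object ServiceSettings {
  // Reads settings from `env`, falling back to the documented defaults.
  // Values that are missing or cannot be parsed silently fall back to the default.
  def load(env: Map[String, String] = sys.env): ServiceSettings = {
    def str(key: String, default: String): String = env.getOrElse(key, default)
    def int(key: String, default: Int): Int =
      env.get(key).flatMap(_.trim.toIntOption).getOrElse(default)
    def bool(key: String, default: Boolean): Boolean =
      env.get(key).flatMap(_.trim.toBooleanOption).getOrElse(default)
    def duration(key: String, default: FiniteDuration): FiniteDuration =
      env.get(key)
        .flatMap(v => Try(Duration(v)).toOption)
        .collect { case d: FiniteDuration => d }
        .getOrElse(default)

    ServiceSettings(
      httpPort = int("SERVER_HTTP_PORT", 8080),
      db = DbSettings(
        name = str("DB_NAME", "wardrobe"),
        host = str("DB_HOST", "localhost"),
        port = int("DB_PORT", 5432),
        user = str("DB_USER", "user"),
        password = str("DB_PASSWORD", "1234"),
        createSchema = bool("DB_WHETHER_CREATE_SCHEMA", true)),
      nyTimesUrl = str("NY_TIMES_URL", "https://www.nytimes.com/"),
      scrapeInterval = duration("SCRAPE_REPEAT_INTERVAL", 4.hours))
  }
}
```

Passing the environment as a `Map` keeps the function easy to test: `ServiceSettings.load(Map("SERVER_HTTP_PORT" -> "9000"))` overrides only the port and leaves every other setting at its default.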

How to build?

  • Clone the project
  • Build the project: sbt compile
  • Run the tests: sbt test

Technology stack

  • scala 2.13.6 as the main application programming language
  • http4s typeful, functional, streaming HTTP for Scala
  • sangria a GraphQL implementation for Scala
  • scala-scraper a Scala library for scraping content from HTML pages
  • quill compile-time language integrated queries for Scala
  • cats to write more functional and less boilerplate code
  • cats-effect The Haskell IO monad for Scala
  • pureconfig for loading configuration files
  • refined for type constraints avoiding unnecessary testing and boilerplate
  • circe a JSON library for Scala
  • scalatest and ScalaCheck for unit and property-based testing
  • testcontainers to run the services the system depends on during integration testing
