The New York Times articles server

The service scrapes all news headlines from nytimes.com and exposes them through a GraphQL API.
The articles can be queried using the following GraphQL schema:

type News {
  title: String,
  link: String,
}

type Query {
  news: [News!]!
}
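
The schema above can be exercised with a single query that asks for every headline's title and link. The following is a minimal sketch using only the JDK HTTP client (Java 11+); the `/graphql` path and the `NewsQueryExample` object name are assumptions made for illustration, since the README does not state where the endpoint is mounted.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object NewsQueryExample {
  // Matches the schema above: every headline with its title and link.
  val query: String = "{ news { title link } }"

  // GraphQL over HTTP conventionally wraps the query in a JSON object.
  // This query contains no quotes or backslashes, so no escaping is needed.
  val body: String = s"""{"query":"$query"}"""

  // Assumption: the server runs on the default SERVER_HTTP_PORT (8080)
  // and serves GraphQL at /graphql. Adjust the path to the real one.
  val endpoint: URI = URI.create("http://localhost:8080/graphql")

  // Builds the POST request without sending it.
  def request: HttpRequest =
    HttpRequest.newBuilder(endpoint)
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()

  // Sends the request to a running service and prints the raw JSON response.
  def main(args: Array[String]): Unit = {
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}
```

Because `title` and `link` are nullable in the schema, a client should be prepared for either field to come back as `null`.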

Once the service is started, it begins scraping headlines and redirects the user to the GraphQL Playground page.

Configuration

The service can be customised by changing the following settings:

Setting	Description	Default value
SERVER_HTTP_PORT	Server port	8080
DB_NAME	Database name	wardrobe
DB_HOST	Database server host	localhost
DB_PORT	Database server port	5432
DB_USER	Database user	user
DB_PASSWORD	Database user password	1234
DB_WHETHER_CREATE_SCHEMA	Whether to create the database schema on startup	true
NY_TIMES_URL	The URL of the New York Times website	https://www.nytimes.com/
SCRAPE_REPEAT_INTERVAL	Scrape repeating interval	every 4 hours
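
To illustrate how the settings above and their defaults fit together, here is a plain-Scala sketch that reads them from environment variables. It is not the project's actual loader (the project lists pureconfig for that); the `Settings` case class, its field names, and the assumption that settings arrive as environment variables are all hypothetical. The scrape interval is kept as an unparsed string because the README does not specify its format.

```scala
// Hypothetical container for the settings listed in the table above.
final case class Settings(
  serverHttpPort: Int,
  dbName: String,
  dbHost: String,
  dbPort: Int,
  dbUser: String,
  dbPassword: String,
  createSchema: Boolean,
  nyTimesUrl: String,
  scrapeRepeatInterval: String
)

object Settings {
  // Reads each setting from `env`, falling back to the documented default.
  // Note: malformed numeric or boolean values make toInt/toBoolean throw.
  def fromEnv(env: Map[String, String] = sys.env): Settings =
    Settings(
      serverHttpPort       = env.getOrElse("SERVER_HTTP_PORT", "8080").toInt,
      dbName               = env.getOrElse("DB_NAME", "wardrobe"),
      dbHost               = env.getOrElse("DB_HOST", "localhost"),
      dbPort               = env.getOrElse("DB_PORT", "5432").toInt,
      dbUser               = env.getOrElse("DB_USER", "user"),
      dbPassword           = env.getOrElse("DB_PASSWORD", "1234"),
      createSchema         = env.getOrElse("DB_WHETHER_CREATE_SCHEMA", "true").toBoolean,
      nyTimesUrl           = env.getOrElse("NY_TIMES_URL", "https://www.nytimes.com/"),
      scrapeRepeatInterval = env.getOrElse("SCRAPE_REPEAT_INTERVAL", "every 4 hours")
    )
}
```

Passing an explicit map, for example `Settings.fromEnv(Map("SERVER_HTTP_PORT" -> "9000"))`, overrides only that setting and leaves the rest at their defaults.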

How to build?

Clone the project
Build the project
Run tests

sbt compile
sbt test

Technology stack

scala 2.13.6: the main application programming language
http4s: typeful, functional, streaming HTTP for Scala
sangria: a GraphQL implementation for Scala
scala-scraper: a Scala library for scraping content from HTML pages
quill: compile-time language-integrated queries for Scala
cats: for writing more functional code with less boilerplate
cats-effect: the Haskell IO monad for Scala
pureconfig: for loading configuration files
refined: type constraints that avoid unnecessary testing and boilerplate
circe: a JSON library for Scala
scalatest and ScalaCheck: unit and property-based testing
testcontainers: runs the services the system depends on for integration testing
