alexnederlof / opengraph-cacher

Simple Node.js implementation for fetching structured OpenGraph data and caching it in Elasticsearch

Opengraph Cacher

Serving Opengraph data as a service

What

This project aims to be a simple service for internal use that fetches OpenGraph data in a structured form.

Additionally, it caches the results for a configurable time in an Elasticsearch instance. A response looks like this:

{
	"_url": "https://www.werkenbijderechtspraak.nl/",
	"_scrapedAt": 1487334683528,
	"_cacheResponse": false,
	"data": {
		"locale": [{
			"value": "nl_NL"
		}],
		"type": [{
			"value": "website"
		}],
		"title": [{
			"value": "Werken bij de Rechtspraak"
		}],
		"description": [{
			"value": "Op zoek naar een baan die er toe doet? De Rechtspraak heeft geregeld vacatures voor nieuwe collega's in juridische, staf of ICT functies"
		}],
		"url": [{
			"value": "https://www.werkenbijderechtspraak.nl/"
		}],
		"site_name": [{
			"value": "Werken bij de Rechtspraak"
		}],
		"twitter_card": [{
			"value": "summary"
		}],
		"twitter_description": [{
			"value": "Op zoek naar een baan die er toe doet? De Rechtspraak heeft geregeld vacatures voor nieuwe collega's in juridische, staf of ICT functies"
		}],
		"twitter_title": [{
			"value": "Werken bij de Rechtspraak"
		}],
		"twitter_site": [{
			"value": "@rechtspraakbaan"
		}],
		"twitter_creator": [{
			"value": "@rechtspraakbaan"
		}],
		"image": [{
			"value": {
				"value": "https://d3pxfuwnql1xse.cloudfront.net/30b9376b5b94713347a6c5c37faf2d1deef6cf59?url=https%3A%2F%2Fwww.werkenbijderechtspraak.nl%2Fwp-content%2Fuploads%2F2016%2F08%2Fheader-3.jpg",
				"width": [{
					"value": "1600"
				}],
				"height": [{
					"value": "220"
				}],
				"type": [{
					"value": null
				}]
			}
		}],
		"twitter_image": [{
			"value": {
				"value": "https://d3pxfuwnql1xse.cloudfront.net/30b9376b5b94713347a6c5c37faf2d1deef6cf59?url=https%3A%2F%2Fwww.werkenbijderechtspraak.nl%2Fwp-content%2Fuploads%2F2016%2F08%2Fheader-3.jpg",
				"width": [{
					"value": null
				}],
				"height": [{
					"value": null
				}],
				"alt": [{
					"value": null
				}]
			}
		}]
	}
}
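In the response above, every property is an array of objects with a `value` key, and nested properties such as `image` hold an object as their value. A minimal sketch of reading such a response follows; `firstValue` is a hypothetical helper, not part of the service, and the image URL is a placeholder.

```javascript
// Hypothetical helper: return the first non-null value stored under `key`,
// or `fallback` when the key is missing or all of its values are null.
function firstValue(data, key, fallback = null) {
  const entries = Array.isArray(data[key]) ? data[key] : [];
  for (const entry of entries) {
    if (entry && entry.value !== null && entry.value !== undefined) {
      return entry.value;
    }
  }
  return fallback;
}

// Trimmed copy of the example response (image URL replaced by a placeholder).
const response = {
  _url: 'https://www.werkenbijderechtspraak.nl/',
  _scrapedAt: 1487334683528,
  _cacheResponse: false,
  data: {
    title: [{ value: 'Werken bij de Rechtspraak' }],
    image: [{
      value: {
        value: 'https://example.com/header.jpg',
        width: [{ value: '1600' }],
        height: [{ value: '220' }],
        type: [{ value: null }]
      }
    }]
  }
};

const title = firstValue(response.data, 'title');
const image = firstValue(response.data, 'image'); // nested object
const imageWidth = image ? firstValue(image, 'width') : null; // a string, not a number
const imageType = image ? firstValue(image, 'type', 'unknown') : 'unknown';

console.log(title, imageWidth, imageType);
```

Note that numeric properties such as `width` arrive as strings and that absent properties are reported as `null`, so callers should convert and supply defaults themselves.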

Elasticsearch

The service automatically creates the index the first time a save is performed. No special mappings are required for the service.

Camo images

To ensure that clients on an https page can request images that are only served over http, the camo service can be used. If the environment variables CAMO_HOST and CAMO_KEY are set, image URLs are automatically rewritten to use the defined camo instance.
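The image URLs in the example response have the shape `<host>/<40 hex characters>?url=<encoded original URL>`, which matches camo's query-string format, where the hex part is an HMAC-SHA1 digest of the original URL computed with the shared key. The sketch below illustrates that rewrite under stated assumptions: `camoUrl` is a hypothetical name, the host and key defaults are placeholders, and CAMO_HOST is assumed to include the scheme. It is not taken from the service's code.

```javascript
const crypto = require('crypto');

// Sketch of a camo-style rewrite: <host>/<hmac-sha1 hex digest>?url=<encoded url>.
// `camoUrl` is a hypothetical name, not the service's actual function.
function camoUrl(imageUrl, camoHost, camoKey) {
  if (!camoHost || !camoKey) {
    return imageUrl; // camo not configured: leave the URL untouched
  }
  // The digest lets the camo instance verify that the URL was signed with its key.
  const digest = crypto.createHmac('sha1', camoKey).update(imageUrl).digest('hex');
  const host = camoHost.replace(/\/+$/, ''); // drop trailing slashes
  return `${host}/${digest}?url=${encodeURIComponent(imageUrl)}`;
}

// Placeholder values are used when the environment variables are not set.
const rewritten = camoUrl(
  'http://example.com/header.jpg',
  process.env.CAMO_HOST || 'https://camo.example.com',
  process.env.CAMO_KEY || 'placeholder-key'
);

console.log(rewritten);
```

Because the digest depends on the key, the camo instance must be configured with the same key that is passed in CAMO_KEY.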

Docker

A Docker image is available. Configuration is done using environment variables.

An example is:

docker run \
    -e ES_URL=es.inventid.net:9200 \
    -e ES_INDEX=opengraph \
    -e ES_TYPE=cache \
    -e ES_VERSION=1.7 \
    -e CACHE_IN_DAYS=4 \
    -p 7070:7070 \
    inventid/opengraph-cacher

The CACHE_IN_DAYS variable can be omitted, in which case it falls back to 28 days.
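How the cache window might be applied can be sketched as follows. This is an illustration only: `cacheDays` and `isFresh` are hypothetical names, and comparing against the `_scrapedAt` timestamp (milliseconds since the epoch, as in the example response) is an assumption about how expiry is decided.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Read CACHE_IN_DAYS, falling back to 28 when it is unset or not a positive number.
function cacheDays(env) {
  const parsed = Number.parseInt(env.CACHE_IN_DAYS, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 28;
}

// Hypothetical check: is a cached document still inside the cache window?
function isFresh(doc, env, now = Date.now()) {
  return now - doc._scrapedAt < cacheDays(env) * MS_PER_DAY;
}

const doc = { _scrapedAt: 1487334683528 };

// Three days after scraping, with a four-day window: still fresh.
console.log(isFresh(doc, { CACHE_IN_DAYS: '4' }, doc._scrapedAt + 3 * MS_PER_DAY)); // true

// Twenty-nine days after scraping, with the default window: expired.
console.log(isFresh(doc, {}, doc._scrapedAt + 29 * MS_PER_DAY)); // false
```

Passing the environment and the current time as arguments keeps the check easy to test; in a running service they would be `process.env` and the default `Date.now()`.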

License: MIT License