fundgrube-crawler [WIP]

A quick and dirty crawler to find deals in the Fundgrube of MediaMarkt and Saturn, because their shop systems are so horrible.

Implemented in Go with MongoDB for persistence, since I usually don't use either of them. :)

Use Case

  • Cross-compiled and deployed on a Raspberry Pi.
  • The MongoDB Atlas free tier is used for persistence.
  • Currently started as a script that is configured via env vars. It fetches all postings in a category and searches for matching deals (see the sketch below).
  • Alerts via email when new deals are found.
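
As an illustration of the matching step only, here is a hypothetical sketch in Go. The posting and query types, the matchDeals helper and the example data are made up for this example and are not the crawler's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// posting and query are illustrative stand-ins for the crawler's real types.
type posting struct {
	Name  string
	Price float64
}

type query struct {
	NameRegex *regexp.Regexp
	MaxPrice  float64
}

// matchDeals returns all postings whose name matches the regex and whose
// price does not exceed the configured maximum.
func matchDeals(postings []posting, q query) []posting {
	var deals []posting
	for _, p := range postings {
		if q.NameRegex.MatchString(p.Name) && p.Price <= q.MaxPrice {
			deals = append(deals, p)
		}
	}
	return deals
}

func main() {
	q := query{NameRegex: regexp.MustCompile(`(?i)sony walkman`), MaxPrice: 150}
	deals := matchDeals([]posting{{Name: "SONY Walkman NW-A55", Price: 119}}, q)
	fmt.Println("found", len(deals), "deal(s)") // the email alert would be triggered here
}
```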

Environment Variables

| name | description | default |
| --- | --- | --- |
| MONGODB_URI | - | mongodb://localhost:27017 |
| MONGODB_USERNAME | - | root |
| MONGODB_PASSWORD | - | example |
| MONGODB_DB | - | fundgrube |
| MONGODB_COLLECTION_POSTINGS | - | postings |
| MONGODB_COLLECTION_OPERATIONS | - | operations |
| FIND_ALL | ignore the last run and search in all postings | false |
| LIMIT_OUTLETS | only fetch the first 5 outlets (for development) | false |
| LOG_TO_FILE | log to /tmp/fundgrube.txt instead of stdout | false |
| MOCKED_POSTINGS | mock the response from the API | false |
| SKIP_CRAWLING | skip fetching postings from the API | false |
| FAST_CRAWLING | stop crawling the API when the current page contains no new postings | false |
| LOG_LEVEL | one of trace, debug, info, warn, error, fatal, panic | info |
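
For reference, env-based configuration like the above can be read in Go roughly as sketched below. This is only an illustration; the getenvOrDefault helper and the handful of variables shown are not the crawler's actual config code.

```go
package main

import (
	"fmt"
	"os"
)

// getenvOrDefault is a hypothetical helper: it returns the value of the env
// var key, or def when the variable is unset.
func getenvOrDefault(key, def string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return def
}

func main() {
	mongoURI := getenvOrDefault("MONGODB_URI", "mongodb://localhost:27017")
	logLevel := getenvOrDefault("LOG_LEVEL", "info")
	findAll := os.Getenv("FIND_ALL") == "true" // boolean flags default to false

	fmt.Println("mongo:", mongoURI, "| log level:", logLevel, "| find all:", findAll)
}
```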

API peculiarities

  • The only endpoint known to me is /api/postings, but its response also contains a list of outlets and brands.
  • Requests with a limit >= 100 always return the first page.
  • Requests with an offset > 990 return 422 Unprocessable Entity, so you need to iterate over brands or outlets to see all postings.
  • I assume the API returns postings sorted by descending creation date, so a "fast refresh" can be implemented by stopping the update as soon as a page contains no new postings (see the pagination sketch after this list).
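
Taken together, a paginated crawl with the "fast refresh" shortcut could look roughly like the sketch below. The base URL, the limit/offset query parameters and the JSON field names are assumptions derived from the observations above, not a documented API contract.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// posting only carries the fields this sketch needs; the real response
// contains more (price, outlet, brand, ...). The JSON field name is a guess.
type posting struct {
	PostingID string `json:"posting_id"`
}

// fetchPage requests one page of postings from the (assumed) endpoint.
func fetchPage(client *http.Client, baseURL string, offset, limit int) ([]posting, error) {
	url := fmt.Sprintf("%s/api/postings?limit=%d&offset=%d", baseURL, limit, offset)
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var body struct {
		Postings []posting `json:"postings"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body.Postings, nil
}

// crawl pages through the API within the observed limits: a page size below
// 100 and a maximum offset of 990. With fastRefresh enabled it stops as soon
// as a whole page contains no unseen postings.
func crawl(baseURL string, seen map[string]bool, fastRefresh bool) ([]posting, error) {
	const limit = 90 // must stay below 100, see above
	client := &http.Client{}
	var unseen []posting
	for offset := 0; offset <= 990; offset += limit {
		page, err := fetchPage(client, baseURL, offset, limit)
		if err != nil {
			return unseen, err
		}
		if len(page) == 0 {
			break // no more postings
		}
		newOnPage := 0
		for _, p := range page {
			if !seen[p.PostingID] {
				newOnPage++
				unseen = append(unseen, p)
			}
		}
		if fastRefresh && newOnPage == 0 {
			break // postings are assumed newest-first, so nothing new follows
		}
	}
	return unseen, nil
}

func main() {
	// The base URL is a placeholder; to cover everything one would repeat the
	// crawl per outlet or brand because of the offset cap.
	postings, err := crawl("https://example.invalid", map[string]bool{}, true)
	if err != nil {
		fmt.Println("crawl failed:", err)
	}
	fmt.Println("fetched", len(postings), "new postings")
}
```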

Shell script

#1 inspired me to quickly hack together my initial idea, which solves the same use case with a bash script. Simply run SEARCH_REGEX="sony walkman" ./_sh/fundgrube-crawler.sh to search for matching postings. The script stores previous results in a TSV file in /tmp and will therefore only report newly discovered postings.

MongoDB migrations

cmd/fundgrube-migrate/main.go contains poor man's DB migrations and tooling to clean up obsolete DB entries.

  • Provide a filterString and updateString to perform a migration on the postings collection with MIGRATE=true make migrate.
  • Provide a filterString to delete entries from the postings collection with CLEANUP=true make migrate.

If the respective env var is not provided, a dry run using only the filterString is performed in both cases.
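
For illustration only, a migration step along these lines could be written with the official Go MongoDB driver (v1 API assumed) roughly as follows. The filter/update documents, the connection settings and the MIGRATE dry-run check are placeholders, not the contents of cmd/fundgrube-migrate/main.go.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()

	// In the real tool these would come from filterString / updateString in
	// cmd/fundgrube-migrate/main.go; here they are hard-coded placeholders.
	filter := bson.M{"price": bson.M{"$exists": false}}
	update := bson.M{"$set": bson.M{"price": 0}}

	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)
	coll := client.Database("fundgrube").Collection("postings")

	if os.Getenv("MIGRATE") != "true" {
		// Dry run: only report how many documents the filter would touch.
		n, err := coll.CountDocuments(ctx, filter)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("dry run: filter matches %d documents\n", n)
		return
	}

	res, err := coll.UpdateMany(ctx, filter, update)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("migrated %d documents\n", res.ModifiedCount)
}
```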

License

GNU General Public License v3.0

