hackersandslackers / jsonld-scraper-tutorial

🌎 🖥 Supercharge your scraper to extract quality page metadata by parsing JSON-LD data via Python's extruct library.

Home Page: https://hackersandslackers.com/scrape-metadata-json-ld/

Structured Data Scraping Tutorial

Extruct Tutorial

This repository contains source code for the accompanying tutorial on Hackers and Slackers: https://hackersandslackers.com/scrape-metadata-json-ld/
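The tutorial leaves the heavy lifting to extruct, whose `extruct.extract()` function accepts a `syntaxes` list (for example `["json-ld"]`) and also understands microdata, Open Graph and other formats. To show what the JSON-LD part boils down to, here is a minimal stdlib-only sketch of the core idea: find every `<script type="application/ld+json">` element and parse its body as JSON. It is not extruct's implementation; the `JsonLdCollector` class and `extract_jsonld()` helper are hypothetical names, and the sketch covers JSON-LD only.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Hypothetical helper: collect the raw text of every JSON-LD <script> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None for bare attributes
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return every parsed JSON-LD object in `html`, skipping blocks that are not valid JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    items = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # one malformed block should not sink the whole page
        # A single block may hold one object or a list of objects
        items.extend(data if isinstance(data, list) else [data])
    return items
```

Calling `extract_jsonld(html)` on a downloaded page returns a list of dicts, so filtering on `item.get("@type") == "Article"` is one way to narrow the result to article metadata. Broken blocks are skipped rather than raised so the remaining metadata is still usable.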

Installation

Installation via requirements.txt:

$ git clone https://github.com/hackersandslackers/jsonld-scraper-tutorial.git
$ cd jsonld-scraper-tutorial
$ python3 -m venv myenv
$ source myenv/bin/activate
$ pip3 install -r requirements.txt
$ python3 main.py

Installation via Pipenv:

$ git clone https://github.com/hackersandslackers/jsonld-scraper-tutorial.git
$ cd jsonld-scraper-tutorial
$ pipenv shell
$ pipenv update
$ python3 main.py

Installation via Poetry:

$ git clone https://github.com/hackersandslackers/jsonld-scraper-tutorial.git
$ cd jsonld-scraper-tutorial
$ poetry shell
$ poetry update
$ poetry run python3 main.py

Usage

To change the URL targeted by this script, update the URL variable in config.py.


Hackers and Slackers tutorials are free of charge. If you found this tutorial helpful, a small donation would be greatly appreciated to keep us in business. All proceeds go towards coffee, and all coffee goes towards more content.

About

License: MIT License


Languages

Language: Python 100.0%