asirihewage / simplest-xpath-web-scraper

Simplest web scraper created using Python3 and MongoDB

Home Page: https://w3genesis.com

Simplest xpath web scraper

Simplest web scraper created using Python3

  • extract data using multiple xpaths from multiple urls
  • save response in MongoDB
  • exceptions and error handling
  • only for basic web scraping work on static HTML web pages
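
The error handling listed above might look roughly like the following. This is a minimal sketch using only the standard library, not the code in main.py; the function name `fetch_html` and the 10-second timeout are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (urllib.error.URLError, ValueError, OSError) as exc:
        # URLError covers HTTP and connection failures,
        # ValueError covers malformed URLs, OSError covers timeouts.
        print(f"skipping {url}: {exc}")
        return None
```

Returning `None` instead of raising lets a loop over many URLs continue after one of them fails.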

set up Data.py with an entry for each URL and its XPaths

    {
        "url": "https://www.technology.pitt.edu/blog/zoom10faq",
        "xpaths": [
            {
                "questions": '//div[@class="field-item even"]/h2/text()',
                "answers": '//div[@class="field-item even"]/p/text()',
                "correct_answer": '//div[@class="field-item even"]/p[1]/text()'
            }
        ]
    }
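
To make the structure concrete, the sketch below walks entries shaped like the one above and applies each named XPath to a page. It is an illustration under assumptions, not the project's own code: the README does not say which parser main.py uses, so lxml is assumed here (imported lazily, and it must be installed), and `iter_xpath_jobs`, `extract_fields`, and the placeholder URL are hypothetical.

```python
def iter_xpath_jobs(entries):
    """Yield (url, field_name, xpath) for every XPath in every entry."""
    for entry in entries:
        for group in entry["xpaths"]:
            for field_name, xpath in group.items():
                yield entry["url"], field_name, xpath


def extract_fields(html_text, xpath_group):
    """Apply each XPath in one group to a page; return {field_name: [matches]}."""
    from lxml import html  # third-party; assumed, not confirmed by the README

    tree = html.fromstring(html_text)
    return {
        name: [str(match).strip() for match in tree.xpath(expr)]
        for name, expr in xpath_group.items()
    }


entries = [
    {
        "url": "https://example.com/faq",  # placeholder URL
        "xpaths": [{"questions": "//h2/text()", "answers": "//p/text()"}],
    }
]

for url, field_name, xpath in iter_xpath_jobs(entries):
    print(url, field_name, xpath)
```

Note that XPath positions are 1-based, so `p[1]` selects the first `p` child of its parent.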

set up the MongoDB database connection string

import pymongo

myclient = pymongo.MongoClient("mongodb://host:port/")  # or use your full connection URL
mydb = myclient["database"]
mycol = mydb["collection"]
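
Once fields have been extracted, storing them could look like the sketch below. It assumes one document per URL with a `scraped_at` timestamp; that layout and the names `build_document` and `save_result` are illustrative assumptions, since the README does not describe the stored schema. `insert_one` is the standard pymongo collection method.

```python
from datetime import datetime, timezone


def build_document(url, fields):
    """Build the record to store for one scraped page (assumed layout)."""
    return {"url": url, "fields": fields, "scraped_at": datetime.now(timezone.utc)}


def save_result(collection, url, fields):
    """Insert one scrape result; return the inserted id, or None on failure."""
    try:
        return collection.insert_one(build_document(url, fields)).inserted_id
    except Exception as exc:  # with pymongo this is typically a PyMongoError
        print(f"could not save {url}: {exc}")
        return None
```

With the connection above, a call would be `save_result(mycol, url, fields)`, where `fields` is a dict of extracted values.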

install Python dependencies

pip3 install -r requirements.txt

run

python3 main.py

response

[Image: Simplest xpath web scraper]

Author : Asiri Hewage
