python3 webscrping scraper web data data-mining

Simplest xpath web scraper

Simplest web scraper created using Python3

extract data using multiple XPaths from multiple URLs
save responses in MongoDB
exception and error handling
only for basic web scraping of static HTML pages
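
The error-handling point above can be illustrated with a small fetch helper. This is only a sketch under assumptions: `fetch_page` is a hypothetical name, and it uses the standard library's `urllib` so that it runs without extra packages. The project's own main.py may use a different HTTP client.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return the page body as text, or None if the request fails.

    Hypothetical helper for illustration, not taken from main.py.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Fall back to UTF-8 when the server declares no charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # URLError (and its subclass HTTPError) covers failed requests,
        # OSError covers low-level socket errors such as timeouts,
        # ValueError is raised for malformed URLs such as a missing scheme.
        print(f"skipping {url}: {exc}")
        return None
```

Returning `None` instead of raising lets a loop over many URLs skip a bad page and continue with the rest.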

set up Data.py for each URL with its XPaths

    {
        "url": "https://www.technology.pitt.edu/blog/zoom10faq",
        "xpaths": [
            {
                "questions": '//div[@class="field-item even"]/h2/text()',
                "answers": '//div[@class="field-item even"]/p/text()',
                "correct_answer": '//div[@class="field-item even"]/p[1]/text()'  # XPath positions start at 1
            }
        ]
    }
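
As a rough illustration of how entries in this shape might be consumed, the sketch below walks each entry and yields one (url, field, xpath) job per expression. It assumes Data.py holds a list of such entries. `DATA`, `iter_jobs`, and `zero_indexed` are hypothetical names, and the sample URL and XPaths are placeholders. Because XPath positions are 1-based, a `[0]` predicate never matches anything, so the sketch also flags those.

```python
import re

# Sample entry in the same shape as the Data.py example. The URL and
# XPaths are placeholders, not real scrape targets.
DATA = [
    {
        "url": "https://example.com/faq",
        "xpaths": [
            {
                "questions": "//div[@class='faq']/h2/text()",
                "first_answer": "//div[@class='faq']/p[0]/text()",
            }
        ],
    }
]

# Matches a literal [0] positional predicate anywhere in an expression.
ZERO_INDEX = re.compile(r"\[\s*0\s*\]")


def iter_jobs(entries):
    """Yield (url, field_name, xpath) for every XPath in every entry."""
    for entry in entries:
        for group in entry["xpaths"]:
            for field, xpath in group.items():
                yield entry["url"], field, xpath


def zero_indexed(entries):
    """Return the field names whose XPath uses a [0] position."""
    return [
        field
        for _, field, xpath in iter_jobs(entries)
        if ZERO_INDEX.search(xpath)
    ]


jobs = list(iter_jobs(DATA))
suspect = zero_indexed(DATA)  # fields that would always come back empty
```

Each job's expression can then be evaluated against the fetched page with an XPath-capable parser, for example the `xpath()` method of an lxml element tree.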

set up the MongoDB connection string

    import pymongo

    myclient = pymongo.MongoClient("mongodb://host:port/")  # or use your own connection URL
    mydb = myclient["database"]
    mycol = mydb["collection"]
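
Once the collection is available, storing one scraped result could look like the sketch below. `build_document` and `save_document` are hypothetical helpers, and the document layout (`url`, `fields`, `scraped_at`) is an assumption rather than the format main.py actually writes. The only pymongo call used is `insert_one`, which returns a result object carrying `inserted_id`.

```python
from datetime import datetime, timezone


def build_document(url, fields):
    """Combine a URL and its extracted fields into one document.

    The layout (url, fields, scraped_at) is an assumption for illustration.
    """
    return {
        "url": url,
        "fields": fields,
        "scraped_at": datetime.now(timezone.utc),
    }


def save_document(collection, document):
    """Insert one document and return the id MongoDB assigned to it.

    `collection` is expected to be a pymongo collection such as `mycol`.
    """
    result = collection.insert_one(document)
    return result.inserted_id
```

With the connection from the snippet above, a call might read `save_document(mycol, build_document(url, {"questions": questions}))`. Driver failures surface as subclasses of `pymongo.errors.PyMongoError`, which the caller can catch to log and skip a failed write.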

install Python dependencies

    pip3 install -r requirements.txt

run

    python3 main.py

response

Author : Asiri Hewage

About

Simplest web scraper created using Python3 and MongoDB

https://w3genesis.com
