data-science pyhton3 webcrawler webscraping-data

Web Scraping

Web scraping consists of gathering data available on websites. This can be done manually by a human user or by a bot. A bot can gather data much faster than a human, which is why we focus on bots here. With this kind of bot, it is technically possible to collect all the data of a website in a matter of minutes.

Prerequisites

python 2.7+
requests
beautifulsoup4
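To show how the two libraries above fit together, here is a minimal sketch, assuming Python 3: `requests` downloads a page and `beautifulsoup4` parses it. The helper names `fetch_html` and `extract_links` and the URL `https://example.com` are illustrative placeholders, not taken from the notebooks in this repository.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_links(html, base_url):
    """Return the page title and the absolute URLs of all links in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    # Only <a> tags that carry an href attribute are kept; relative
    # links are resolved against base_url.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    return title, links


if __name__ == "__main__":
    # Placeholder URL, assumed for illustration only.
    page = fetch_html("https://example.com")
    print(extract_links(page, "https://example.com"))
```

Keeping the download and the parsing in separate functions means the parsing step can be tried on a saved HTML string without touching the network.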

Websites used

References

Web Scraping with Python - Ryan Mitchell (PDF)

Copyright

In most jurisdictions, web scraping is legal, but it carries some legal risks:

Copyright infringement: using copyrighted data is subject to certain restrictions.

Violation of the Computer Fraud and Abuse Act (CFAA): this US law, enacted to deter computer hacking, prohibits fetching data by gaining unauthorized access to a page.

Trespass to chattels: a claim can arise if the scraping harms the website's server in any way, for example if the server slows down or stops because of the scraping.
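One common way to reduce the load a bot puts on a server is to honor the site's `robots.txt` and to pause between requests. The sketch below uses only the standard library; the names `build_robot_rules` and `polite_crawl`, the user agent `example-bot`, and the one-second delay are assumptions made for illustration. Following these rules lowers the risk of slowing a server down, but it does not by itself settle any of the legal questions above.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_crawl(urls, rules, fetch, user_agent="example-bot",
                 delay_seconds=1.0, sleep=time.sleep):
    """Fetch each allowed URL, waiting delay_seconds between requests.

    `fetch` is any callable that takes a URL and returns its content,
    so the download logic stays separate from the politeness logic.
    """
    results = {}
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # the site asked bots not to visit this path
        if results:
            sleep(delay_seconds)  # pause before every request except the first
        results[url] = fetch(url)
    return results


if __name__ == "__main__":
    # Hypothetical robots.txt content and URLs, for illustration only.
    rules = build_robot_rules("User-agent: *\nDisallow: /private/\n")
    pages = polite_crawl(
        ["https://example.com/a", "https://example.com/private/x"],
        rules,
        fetch=lambda url: "<html>stub for %s</html>" % url,
    )
    print(sorted(pages))
```

In a real crawl, `fetch` would be a function that downloads the page, and the `robots.txt` text would be downloaded from the target site first.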

About

Web Scraping & Crawling - for beginners
