kanchan88 / webscrap-minio

Proxy-based web scraping project in Python that saves the captured unstructured data in MinIO.

webscrap-minio

A simple web scraping project that uses Python and proxies, and saves the captured unstructured data in MinIO.

What is web scraping?

Web scraping is the automated process of gathering useful information from websites. Take a price comparison website as an example: it can scrape sites like Amazon, Walmart and Flipkart, then show people the best price on offer for the same product.
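The price comparison idea can be sketched in a few lines. This is only an illustration: the `best_offer` helper is hypothetical and the prices are made-up values, not data scraped by this project.

```python
from decimal import Decimal


def best_offer(offers):
    """Return the (store, price) pair with the lowest price."""
    return min(offers.items(), key=lambda item: item[1])


# Made-up prices for the same product, as a scraper might collect them.
offers = {
    "Amazon": Decimal("499.00"),
    "Walmart": Decimal("479.99"),
    "Flipkart": Decimal("489.50"),
}

store, price = best_offer(offers)
print(f"Best price: {store} at {price}")
```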

What to watch out for?

Scraping an ordinary website is usually fine, because most do not block bots. Giants like Amazon or Walmart do block bots. To get around that, we have to use different IPs (proxies), vary the timing between page requests, and use Selenium for real-device-like browsing.
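A minimal sketch of the proxy and timing ideas, using only the Python standard library. It is not the notebook's actual code: the proxy addresses are placeholders from a documentation IP range, the `User-Agent` string and the 2 to 6 second delay range are assumptions, and Selenium is left out.

```python
import itertools
import random
import time
import urllib.request

# Placeholder proxies (documentation IP range); replace with working ones.
PROXIES = ["http://203.0.113.10:8080", "http://203.0.113.11:8080"]


def make_opener(proxy_url):
    """Build a urllib opener that sends HTTP and HTTPS traffic via one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # Assumed header value; many sites reject the default urllib User-Agent.
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; example-scraper)")]
    return opener


def random_delay(low=2.0, high=6.0):
    """Pick a random pause length in seconds so requests are not evenly spaced."""
    return random.uniform(low, high)


def fetch_all(urls, proxies=PROXIES, timeout=10,
              sleep=time.sleep, opener_factory=make_opener):
    """Fetch each URL through the next proxy in the rotation.

    Returns a dict mapping URL to response bytes, or None if the request failed.
    """
    proxy_pool = itertools.cycle(proxies)
    pages = {}
    for url in urls:
        opener = opener_factory(next(proxy_pool))
        try:
            with opener.open(url, timeout=timeout) as response:
                pages[url] = response.read()
        except OSError:
            # URLError and HTTPError are subclasses of OSError.
            pages[url] = None
        sleep(random_delay())  # pause before moving on to the next URL
    return pages
```

Calling `fetch_all(["https://example.com/"])` would send the request through the first proxy in `PROXIES` and then pause for a random interval. The `sleep` and `opener_factory` parameters exist only so the rotation can be exercised without network access.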

What is MinIO?

MinIO is an S3-compatible object store used for unstructured data. It is written in Go and is designed to run well on Kubernetes.
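A minimal sketch of saving scraped data to MinIO, assuming the official `minio` Python package (`pip install minio`). The helper names, the bucket name, and the key layout are assumptions for illustration, not taken from the notebook.

```python
import hashlib
import io
import json
from urllib.parse import urlparse


def object_name_for(url):
    """Build a deterministic object key of the form '<host>/<sha1 of url>.json'."""
    host = urlparse(url).netloc or "unknown-host"
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"{host}/{digest}.json"


def save_to_minio(client, bucket, url, payload):
    """Upload payload as a JSON object and return the object key.

    client is expected to behave like minio.Minio.
    """
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)
    data = json.dumps(payload).encode("utf-8")
    name = object_name_for(url)
    client.put_object(
        bucket_name=bucket,
        object_name=name,
        data=io.BytesIO(data),   # put_object reads from a file-like stream
        length=len(data),
        content_type="application/json",
    )
    return name


# Example usage against a local server (endpoint and credentials are assumptions):
#
#   from minio import Minio
#   client = Minio(endpoint="localhost:9000", access_key="minioadmin",
#                  secret_key="minioadmin", secure=False)
#   save_to_minio(client, "scraped-data", "https://example.com/item/1",
#                 {"title": "Example", "price": "479.99"})
```

Deriving the key from the URL means that scraping the same page again overwrites the earlier object instead of piling up duplicates. Add a timestamp to the key if a history of snapshots is wanted.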

What's in this project?

I have tried to gather some data from URLs using proxies and then save the gathered data in MinIO.


Languages

Jupyter Notebook 99.2%, Python 0.8%