WillCaton2350 / Wikipedia-WebCrawler

A Wikipedia web crawler written in Python with Scrapy. The ETL process extracts specific data from multiple Wikipedia pages, organizes it into a structured format with Scrapy items, and saves it as JSON for further analysis and integration into MySQL Workbench.


Wiki-WebCrawler-Scrapy

A Wikipedia web crawler written in Python with Scrapy. The ETL process involves multiple steps: a Scrapy spider extracts specific data from Wikipedia pages, Scrapy items organize it into a structured format, and the results are saved as JSON for further analysis and loading into MySQL Workbench. The JSON dataset can also serve as a data source for an API, improving data accessibility.
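
As a rough illustration of the extract and transform steps, here is a minimal sketch of a Scrapy item and spider. The field names, spider name, and start URL are hypothetical examples, not the repo's actual definitions:

```python
import scrapy


class WikiPageItem(scrapy.Item):
    # Hypothetical fields standing in for the "specific data" a page yields.
    title = scrapy.Field()
    url = scrapy.Field()
    summary = scrapy.Field()


class WikiSpider(scrapy.Spider):
    name = "wiki"
    start_urls = ["https://en.wikipedia.org/wiki/Web_crawler"]

    def parse(self, response):
        item = WikiPageItem()
        # Wikipedia renders the article title in the #firstHeading element.
        item["title"] = response.css("h1#firstHeading ::text").get()
        item["url"] = response.url
        # Top-level paragraph text of the article body as a rough summary.
        item["summary"] = " ".join(
            response.css("div.mw-parser-output > p::text").getall()
        ).strip()
        yield item
```

With Scrapy's feed exports, a run such as `scrapy crawl wiki -O pages.json` would write the collected items to a JSON file, which corresponds to the save-as-JSON step described above.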
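
The load step could then read that JSON feed into MySQL, the database behind MySQL Workbench. Below is a sketch assuming the hypothetical `pages.json` output and the table/column names from the example above, using the `mysql-connector-python` driver; connection parameters are placeholders:

```python
import json

import mysql.connector  # pip install mysql-connector-python

# Placeholder credentials; adjust for your own MySQL server.
conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="wiki"
)
cur = conn.cursor()
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS pages (
        id INT AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(255),
        url VARCHAR(512),
        summary TEXT
    )
    """
)

with open("pages.json", encoding="utf-8") as f:
    # Scrapy's JSON feed export is a single array of item objects.
    for row in json.load(f):
        cur.execute(
            "INSERT INTO pages (title, url, summary) VALUES (%s, %s, %s)",
            (row.get("title"), row.get("url"), row.get("summary")),
        )

conn.commit()
cur.close()
conn.close()
```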

About


License: MIT


Languages

Language: Python 100.0%