nwihardjo / Startup-Crawler

For scraping startup-database websites

Crawler

Automatically scrapes startup-related data from startup-database websites using scrapy-splash.

Features

  • Extract startups' funding information
  • Scrape startups' profile information (logo, description, founder)
  • Format the data for upload to a MongoDB database
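
The formatting step in the last feature can be illustrated with a minimal sketch. It is not taken from this repository: the field names (`name`, `description`, `logo`, `founder`, `funding`), the helper names, and the assumption that funding appears as a string such as `$1.5M` are all hypothetical.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical suffix table; the real source data may use other notations.
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_funding(raw: str) -> Optional[int]:
    """Convert a string such as '$1.5M' to an integer, or None if it cannot be parsed."""
    match = re.fullmatch(
        r"\$?\s*([\d,]+(?:\.\d+)?)\s*([KMB])?", raw.strip(), flags=re.IGNORECASE
    )
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").upper()
    return int(round(number * _MULTIPLIERS.get(suffix, 1)))


def to_document(row: Dict[str, str]) -> Dict[str, Any]:
    """Turn one scraped row (all strings) into a dict shaped for a document store."""
    founders = [n.strip() for n in row.get("founder", "").split(",") if n.strip()]
    return {
        "name": row.get("name", "").strip(),
        "description": row.get("description", "").strip() or None,
        "logo_url": row.get("logo", "").strip() or None,
        "founders": founders,  # stored as a list rather than a comma-separated string
        "funding": parse_funding(row.get("funding", "")),  # assumed field, may be None
    }
```

A dict produced this way could then be passed to pymongo's `insert_one`, assuming pymongo is the client used for the upload.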

Windows-user dependencies:

  • Anaconda / Miniconda - runs the whole framework in an isolated environment
  • Docker
  • Python 3
  • scrapy-splash
  • scrapy
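
Docker and scrapy-splash usually appear together because Splash runs as a separate rendering service, commonly started with `docker run -p 8050:8050 scrapinghub/splash`. The sketch below shows the Scrapy settings that the scrapy-splash documentation recommends. Whether this project uses exactly these values is an assumption, and the `SPLASH_URL` below assumes a Splash container listening on the local default port.

```python
# settings.py (sketch): values follow the scrapy-splash README, not this repository.

# Assumed address of a locally running Splash container.
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash and keep cookies and compression working.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoid storing duplicate Splash arguments for every request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With settings like these in place, a spider would typically yield `scrapy_splash.SplashRequest` objects instead of plain `scrapy.Request` objects for pages that need JavaScript rendering.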

Getting the crawled data:

All data is stored in a CSV file named after the corresponding database website.
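
As a rough illustration of this naming scheme, the sketch below derives a CSV filename from a site URL and writes rows to it. The helper names, the output directory, and the exact filename convention (host name without `www.`) are assumptions, not the repository's actual code.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def csv_path_for(site_url: str, out_dir: Path) -> Path:
    """Build a path such as out_dir/example.com.csv from a site URL (assumed convention)."""
    host = (urlparse(site_url).netloc or site_url).lower()
    if host.startswith("www."):
        host = host[4:]
    return out_dir / f"{host}.csv"


def write_rows(path: Path, rows: Iterable[Dict[str, str]], fieldnames: List[str]) -> None:
    """Write scraped rows to the CSV file with a header line."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path: Path) -> List[Dict[str, str]]:
    """Load the crawled data back as a list of dicts keyed by column name."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Keeping one file per source site makes it easy to re-run a single crawler without touching the data collected from the others.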

Installation guide: here

Languages

Language:Python 100.0%