ccywch / crawlStackExchangeDataExplorer

Automatically execute sql in Stack Exchange Data Explorer and download the file

crawlStackExchangeDataExplorer

Stack Exchange Data Explorer is a platform for querying data from Stack Exchange sites. This Python script crawls the viewCount of questions on Stack Overflow. Because Stack Overflow had about 10 million questions at the time of writing, and Stack Exchange Data Explorer returns at most 50 thousand results per query, we need a program that repeats the query-and-download steps and collects the data automatically.
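One way to stay under the 50 thousand row limit is to split the crawl into Id ranges, so that each query can return at most one batch of rows. The sketch below is illustrative only and is not taken from viewCount.py: the helper names `id_ranges` and `build_query` are hypothetical, and the table and column names (`Posts`, `Id`, `ViewCount`, `PostTypeId = 1` for questions) are assumptions about the Data Explorer schema that should be checked against the site before use.

```python
def id_ranges(max_id, batch_size=50000):
    """Yield inclusive (start, end) Id ranges that together cover 1..max_id."""
    start = 1
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def build_query(start, end):
    """Return the SQL text for one batch.

    Table and column names are assumptions about the Data Explorer schema.
    """
    return (
        "SELECT Id, ViewCount FROM Posts "
        "WHERE PostTypeId = 1 AND Id BETWEEN {0} AND {1} "
        "ORDER BY Id"
    ).format(start, end)


if __name__ == "__main__":
    # Print the first few batch queries for a hypothetical maximum Id.
    for start, end in list(id_ranges(200000))[:3]:
        print(build_query(start, end))
```

Because each range spans at most `batch_size` Ids, no single query can exceed the result limit, even though some ranges will return fewer rows than others.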

##Functions

This script can

  • automatically execute the SQL in that platform and download the result file;
  • move the file from the default download path to a directory of our choice;
  • aggregate all data into one file.
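The last two steps can be sketched with the standard library alone. This is a minimal illustration written for Python 3, not the code in viewCount.py (which targets Python 2.7): the function names `move_download` and `aggregate_csv` are hypothetical, and it assumes every downloaded file is a CSV with the same header row.

```python
import csv
import glob
import os
import shutil


def move_download(src_path, dest_dir, new_name=None):
    """Move one downloaded file into dest_dir and return its new path.

    new_name lets the caller avoid overwriting an earlier download
    that has the same file name.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    dest_path = os.path.join(dest_dir, new_name or os.path.basename(src_path))
    shutil.move(src_path, dest_path)
    return dest_path


def aggregate_csv(src_dir, out_path):
    """Concatenate every CSV in src_dir into out_path, keeping one header.

    Returns the number of data rows written.
    """
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sorted(glob.glob(os.path.join(src_dir, "*.csv"))):
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue  # never read the output file back in
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Renaming each file as it is moved (for example by batch number) keeps later downloads from overwriting earlier ones, and sorting the file names makes the order of the aggregated rows reproducible.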

##Prerequisites

  • Install Python 2.7.*;
  • Install Selenium for Python;
  • Download ChromeDriver and place it in the same directory as viewCount.py.

##Others

The script fails sometimes, and it may be more stable with PhantomJS. Of course, a better solution to this task is to analyze the requests and responses exchanged with the site and fetch the data that way. But you would then need to deal with the CAPTCHA image, which is rather difficult to recognize.

Of course, you are free to improve this one and enjoy the journey.
