zzhangpurdue / Gloomy-Bear-FinancialCrawler

Gloomy-Bear-Financial-Crawler

#Description

Design a financial information web crawler. After finishing the project, we will have a better understanding of the consumer-producer design pattern and the main functions of Redis. This is the first step of the whole project. In the next stage, we will use NLP methods to analyze the news and find the relationship between the news and stock price movements.
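The consumer-producer pattern mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: `run`, `producer`, `consumer`, and `fake_fetch` are hypothetical names, the ticker symbols are made-up inputs, and `fake_fetch` stands in for a real HTTP request. It is written for Python 3; in Python 2.7 the `queue` module is named `Queue`.

```python
import queue
import threading

SENTINEL = None  # marker that tells a consumer thread to stop


def producer(tasks, symbols):
    """Put one crawl task per ticker symbol on the shared queue."""
    for symbol in symbols:
        tasks.put(symbol)


def consumer(tasks, results, fetch):
    """Take tasks until the sentinel arrives; store what fetch returns."""
    while True:
        symbol = tasks.get()
        if symbol is SENTINEL:
            break
        results.put((symbol, fetch(symbol)))


def run(symbols, fetch, n_consumers=3):
    """Crawl all symbols with n_consumers worker threads."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=consumer, args=(tasks, results, fetch))
        for _ in range(n_consumers)
    ]
    for worker in workers:
        worker.start()
    producer(tasks, symbols)
    # One sentinel per consumer so every thread exits its loop.
    for _ in workers:
        tasks.put(SENTINEL)
    for worker in workers:
        worker.join()
    # All workers have finished, so draining the queue here is safe.
    collected = {}
    while not results.empty():
        symbol, data = results.get()
        collected[symbol] = data
    return collected


def fake_fetch(symbol):
    """Placeholder (assumption) for downloading news for one symbol."""
    return "news for " + symbol
```

Calling `run(["AAPL", "GOOG"], fake_fetch)` returns a dictionary that maps each symbol to the fetched text. The queue decouples the two sides, so the number of consumers can change without touching the producer.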

##Main Task(Basic)

1.Design a multi-threaded web crawler that collects listed companies' historical news and stock data, based on the consumer-producer design pattern.

2.Build a multi-producer, multi-consumer message queue with Redis.
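One common way to build such a message queue is on top of a Redis list, pushing with `LPUSH` and popping with the blocking `BRPOP`. The sketch below assumes that approach; the README does not say which Redis data structure the project uses. `RedisQueue` and the key name `crawler:tasks` are hypothetical, and `client` is expected to be an object that offers `lpush` and `brpop`, such as a redis-py client.

```python
import json


class RedisQueue(object):
    """FIFO task queue stored in a Redis list (hypothetical helper)."""

    def __init__(self, client, key="crawler:tasks"):
        self.client = client  # e.g. a redis-py client (assumption)
        self.key = key        # name of the Redis list holding the tasks

    def put(self, task):
        """Serialize the task and push it onto the head of the list."""
        self.client.lpush(self.key, json.dumps(task))

    def get(self, timeout=5):
        """Pop the oldest task from the tail of the list.

        Blocks for up to `timeout` seconds and returns None
        if no task arrived in that time.
        """
        item = self.client.brpop(self.key, timeout=timeout)
        if item is None:
            return None
        _key, raw = item  # BRPOP returns a (list name, value) pair
        return json.loads(raw)
```

With redis-py installed and a server running, a client could be created with `redis.Redis(host="localhost", port=6379)` and passed to `RedisQueue`. Because the list lives on the Redis server, any number of producer and consumer processes can share the same key, including processes on different machines.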

##Optional Task(Bonus)

1.Use HBase to manage data

2.Use NLP methods to analyze the news

3.Deploy crawlers on different computers and replicate data using MySQL multi-master replication

#Plan

[2016/02/01 - 2016/02/07] Project Selection, Plan Discussion, Proposal Draft Writing, System Design, and Resource Discovery

[2016/02/08 - 2016/02/15] Project Implementation, Build Message Queue

[2016/02/16 - 2016/02/23] Design Multi-threaded Crawler

[2016/02/24 - 2016/02/31] Document Writing and Video Presentation Making

#Language & Framework

1.Python 2.7

2.Redis

3.MySQL/MongoDB

4.Scrapy

#Resources

AppStore - Crawler

[1]Video of introduction of AppStore https://www.dropbox.com/s/2e9t9kzjs1giop5/20151222AppStore%20Introduction.mov?dl=0

[2]PDF of introduction of AppStore https://www.dropbox.com/s/bja7rfnm42vwtkj/20151222AppStore%20Introduction.pdf?dl=0

[3]Video of crawler https://www.dropbox.com/s/ncgxsqkb1w8sgxr/20151223AppStore%20Crawler.mov?dl=0

[4]PDF for crawler https://www.dropbox.com/s/bja7rfnm42vwtkj/20151222AppStore%20Introduction.pdf?dl=0

[5]Video of crawler homework https://www.dropbox.com/s/49lvwnatbx6bh6v/20160103_AppStore%20CrawlerAdvanced.mov?dl=0

[6]PDF for crawler homework https://www.dropbox.com/s/0ojejis71ebds3s/20160103_Appstore%20CrawlerAdvanced.pdf?dl=0

Languages

Python 82.7%, Jupyter Notebook 17.3%