Day | Contents | Remarks |
---|---|---|
day 001 | download file, file I/O | |
day 002 | csv file handling | |
day 003 | xml file handling | |
day 004 | API | POKE |
day 005 | API + JSON | |
day 006 | Headers | |
day 007 | | |
day 008 | Static Webpage Crawling | |
day 009 | download images | |
day 010 | Packages: PyQuery/grab | |
day 011 | Regular Expression | |
day 012 | Ex. ETtoday | |
day 013 | Ex. PTT | |
day 014 | Ex. Yahoo! movie | |
day 015 | Ex. Bank of Taiwan | |
day 016 | Ex. Wiki | recursive crawling |
day 017 | | |
day 018 | about "headers"... | |
day 019 | Ex. ETtoday | Selenium + BeautifulSoup |
day 020 | API operation | |
day 021 | Ex. ETtoday | Dynamic Web Pages |
day 022 | Ex. Air Quality Website | |
day 023 | Ex. ETtoday.net | Get external website content |
day 024 | Ex. 104 HR | |
day 025 | Scrapy Intro. | no HW |
day 026 | Scrapy: Request | |
day 027 | Scrapy: XPath + Item Pipeline | |
day 028 | Scrapy: API | |
day 029 | Scrapy: multiple web pages | |
day 030 | some challenges | |
day 031 | headers | |
day 032 | captcha | |
day 033 | login | |
day 034 | proxy IP | |
day 035 | multithreading | |
day 036 | asynchronous | |
day 037 | scheduled | |