I checked their archive page and noticed that the daily archive page URLs are built from the day, month, and year plus one extra delta component, which can also be computed (I took the calculation from their JS). So the spider simply generates all the daily archive page URLs and then crawls them for articles.
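The URL generation can be sketched roughly like this. Note that the base URL and the delta formula below are placeholders, not the site's real ones; the actual delta calculation comes from the site's own JS:

```python
from datetime import date, timedelta

# Placeholder for the real delta calculation taken from the site's JS.
# Here it is just an arbitrary deterministic function of the date.
def archive_delta(d: date) -> int:
    return d.toordinal() % 1000

def archive_urls(start: date, end: date, base="https://example.com/archive"):
    """Yield one archive-page URL per day between start and end (inclusive)."""
    d = start
    while d <= end:
        yield f"{base}/{d.year}/{d.month:02d}/{d.day:02d}/{archive_delta(d)}"
        d += timedelta(days=1)

urls = list(archive_urls(date(2024, 1, 1), date(2024, 1, 3)))
print(len(urls))  # one URL per day in the range -> 3
```

The generated URLs would then be used as the spider's start URLs.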
- clone the repo
- create a virtualenv and install the dependencies: `pip install -r requirements.txt`
- run the spider and export the scraped articles to JSON:

  `scrapy crawl article -o outputfile.json`