An IPython notebook that identifies collocates (words that occur near a chosen word) for documents in a multi-file directory. The complete version is available here: http://mediagestalt.com/thesis/Collocations.html
The data is from the Canadian House of Commons Parliamentary debates, published as Hansard. It can be downloaded as a zip file here: https://dataverse.library.ualberta.ca/dvn/dv/hansard
The data includes transcripts for the years 2006 to 2015 inclusive (Parliaments 39-41).
This repo can also be viewed in IPython notebook format at: http://nbviewer.ipython.org/github/mediagestalt/Collocations. Download the directory and explore the data in your own way.
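
For readers who want a feel for what finding collocates across a directory involves before opening the notebook, here is a minimal sketch using only the Python standard library. It is not the notebook's code. The function names (`tokenize`, `window_collocates`, `collocates_by_file`), the five-token window, the regex tokenizer, and the `*.txt` file pattern are all assumptions made for illustration; adjust them to match the files you actually download.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def window_collocates(tokens, node, window=5):
    """Count the words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


def collocates_by_file(directory, node, window=5, pattern="*.txt"):
    """Map each matching file name in `directory` to its collocate counts.

    The "*.txt" pattern is hypothetical; the real data may use another extension.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        results[path.name] = window_collocates(tokenize(text), node, window)
    return results
```

A call such as `collocates_by_file("data", "privacy")` (both arguments are placeholder values) would return one `Counter` per file, and `.most_common(10)` on any of them lists the ten most frequent neighbours of the chosen word. Raw counts like these favour very common words, so a stop-word filter or an association measure is usually applied as a further step.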