Francois Dang Ngoc's repositories
scrapy-proxynova
Use Scrapy with a list of proxies generated from proxynova.com
scrapy-mongodb-queue
Use Scrapy with MongoDB to store the request queues (FIFO or LIFO)
scrapy-mongodb-pipeline
MongoDB pipeline for Scrapy. It allows updating existing entries (setting new values or adding elements to an array) when item values are spread over multiple pages
scrapy-simple-http-queue
Scrapy plugin that uses simple-http-queue as the URL queue, allowing distributed crawling
scrapy-redis
Redis-based components for Scrapy that allow distributed crawling. Includes a small update to make it work with Scrapy 0.16+ and adds the QUEUE_TYPE and DUPE_FILTER options
scrapy-source-ip
Simple Scrapy downloader implemented as described in http://web.archive.org/web/20120316092048/http://dev.scrapy.org/ticket/153
amazon-athena-user-guide
The open source version of the Amazon Athena documentation. To submit feedback & requests for changes, submit issues in this repository, or make proposed changes & submit a pull request.
mongo-hadoop
MongoDB adapter for Hadoop. Includes a small mongo-hadoop Pig patch that allows the use of MongoDB fields that start with an underscore by prefixing them with u_ (e.g., u__id instead of _id).
simple-http-queue
Simple HTTP queue (FIFO and LIFO) implemented using Python, SQLite3 and Tornado. It supports multiple queues.
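To illustrate how one SQLite3 table can back several named queues with either FIFO or LIFO ordering, here is a minimal sketch. It is not the simple-http-queue code: the class name `SQLiteQueue`, the `items` table, and the `push`/`pop` methods are all hypothetical, and the HTTP layer (Tornado) is left out entirely.

```python
import sqlite3


class SQLiteQueue:
    """Hypothetical multi-queue store; an illustration, not the project's API."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # One table holds every queue; the autoincrementing id records insertion order.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "queue TEXT NOT NULL, "
            "body TEXT NOT NULL)"
        )

    def push(self, queue, body):
        """Append an item to the named queue."""
        with self.db:
            self.db.execute(
                "INSERT INTO items (queue, body) VALUES (?, ?)", (queue, body)
            )

    def pop(self, queue, lifo=False):
        """Remove and return the oldest item (FIFO) or the newest one (LIFO)."""
        order = "DESC" if lifo else "ASC"
        with self.db:
            row = self.db.execute(
                f"SELECT id, body FROM items WHERE queue = ? ORDER BY id {order} LIMIT 1",
                (queue,),
            ).fetchone()
            if row is None:
                return None  # the queue is empty
            self.db.execute("DELETE FROM items WHERE id = ?", (row[0],))
            return row[1]
```

With items "a", "b", "c" pushed to the same queue in that order, `pop(queue)` returns "a" and a following `pop(queue, lifo=True)` returns "c". Keeping the ordering choice in the `pop` call is only one possible design; the ordering could equally be fixed per queue.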