議員投票指南 (Councilor Voter Guide)
- Hackpad: development discussion
- Hackpad: feedback
crawler
Crawlers for each county/city council. The crawler names and their purposes are as follows (a minimal spider sketch follows this list):
- councilors: data on current councilors
- councilors_terms: data on councilors of past terms (does not necessarily include the current term)
- bills: bill data
- meeting_minutes: meeting-minutes data (attendance and votes)
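Each of these crawlers is a Scrapy spider. As a rough sketch of the shape of such a spider (assuming Scrapy ≥ 1.0, where spiders may yield plain dicts; the URL, selectors and field names below are placeholders, not taken from the real spiders):

```python
# Illustrative sketch only: URL, selectors and field names are placeholders,
# not copied from the actual tcc spiders.
import scrapy


class CouncilorsSpider(scrapy.Spider):
    name = 'councilors'
    start_urls = ['http://example.gov.tw/councilors']  # placeholder URL

    def parse(self, response):
        # Each table row on the listing page becomes one councilor record.
        for row in response.xpath('//table//tr'):
            name = row.xpath('./td[1]/text()').extract()
            district = row.xpath('./td[2]/text()').extract()
            yield {
                'name': name[0].strip() if name else None,
                'district': district[0].strip() if district else None,
            }
```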
data
The raw per-county/city JSON produced by the crawlers above.
- util/prettyjson.py: produces an indented, human-readable version of the JSON (see the sketch after this list)
- README
- pretty_format: holds the human-readable per-county/city JSON produced above
- hashlist_meeting_minutes-v141001.json: links map storing details of the binaries fetched by the meeting_minutes crawler
- candidates_2014.xlsx: councilor candidates announced by the Central Election Commission (中選會)
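util/prettyjson.py is described above as producing indented JSON; a minimal sketch of that kind of helper (Python 3, not the actual script) might be:

```python
# Minimal sketch of an "indent the JSON" helper, not the actual
# util/prettyjson.py: read a raw JSON file and rewrite it with indentation.
import json
import sys


def pretty_print(src_path, dst_path):
    with open(src_path, encoding='utf-8') as src:
        data = json.load(src)
    with open(dst_path, 'w', encoding='utf-8') as dst:
        json.dump(data, dst, ensure_ascii=False, indent=2, sort_keys=True)


if __name__ == '__main__':
    pretty_print(sys.argv[1], sys.argv[2])
```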
parser
Normalizes the JSON under data above and loads it into the database (if you only need the complete database, skip straight to Restore DB). A rough sketch of the load flow follows this list.
- councilors/councilors.py: handles current and past-term councilor data
- councilors/candidates.py: handles candidate data
- bills/bills.py: handles bill data
- votes/: attendance and vote data, processed separately for each county/city and each term
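As a rough sketch of what these loaders do, assuming a PostgreSQL target reached through psycopg2 and a hypothetical councilors table (the real scripts' schema and column names will differ):

```python
# Rough sketch of the parser flow; the DSN, table and column names are
# hypothetical, not the project's actual schema.
import json

import psycopg2


def load_councilors(json_path, dsn='dbname=voter_guide'):
    with open(json_path) as f:
        councilors = json.load(f)

    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    for c in councilors:
        cur.execute(
            'INSERT INTO councilors (name, county, election_year) '
            'VALUES (%s, %s, %s)',
            (c.get('name'), c.get('county'), c.get('election_year')),
        )
    conn.commit()
    conn.close()
```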
voter_guide
Web application using Django.
Environment Setup
sudo apt-get install libxml2-dev libxslt1-dev python-dev libffi-dev python-pip
sudo pip install lxml
sudo pip install Scrapy
sudo pip install requests
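To quickly confirm the packages installed correctly, you can run a short import check like the following (just a sanity check, not part of the project):

```python
# Sanity check: the crawler dependencies import and report their versions.
import lxml.etree
import requests
import scrapy

print('lxml', lxml.etree.__version__)
print('requests', requests.__version__)
print('Scrapy', scrapy.__version__)
```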
After installing Scrapy, you can run the following commands to test; the examples below use tcc (臺北市議會, Taipei City Council):
cd crawler/tcc
scrapy crawl bills
scrapy crawl councilors
scrapy crawl councilors_terms
scrapy crawl meeting
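If you prefer to trigger a spider from Python instead of the scrapy CLI, here is a sketch using Scrapy's CrawlerProcess (assuming Scrapy ≥ 1.0, run from inside the project directory, e.g. crawler/tcc):

```python
# Run a spider programmatically; must be executed from inside the Scrapy
# project directory so get_project_settings() can find scrapy.cfg.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl('bills')  # same spider name as `scrapy crawl bills`
process.start()         # blocks until the crawl finishes
```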
If you want to output JSON files:
cd crawler/tcc
scrapy crawl bills -o bills.json -t json
scrapy crawl councilors -o councilors.json -t json
scrapy crawl councilors_terms -o councilors_terms.json -t json
scrapy crawl meeting -o meeting.json -t json
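Once a crawl with -o finishes, the exported feed is a JSON array, so you can sanity-check it from Python, e.g.:

```python
# Quick check that the exported feed is valid JSON and non-empty.
import json

with open('bills.json') as f:
    bills = json.load(f)

print(len(bills), 'items scraped')
print(bills[0] if bills else 'no items')
```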
Install basic tools
sudo apt-get update
sudo apt-get upgrade
sudo reboot
sudo apt-get install git python-pip python-dev python-setuptools postgresql libpq-dev
sudo easy_install virtualenv
The repository is quite big now, so please be patient; don't use options like git clone --depth.
git clone https://github.com/g0v/councilor-voter-guide.git
cd councilor-voter-guide/voter_guide/
(If you don't mind packages being installed into your local environment, just run pip install -r requirements.txt.)
virtualenv --no-site-packages venv
source venv/bin/activate
pip install -r requirements.txt
We use SQLite as the default database. If you want to use another database, set your database engine in local_settings.py.
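For example, a local_settings.py pointing Django at a local PostgreSQL instance might look like the following (database name, user and password are placeholders for your own setup):

```python
# local_settings.py -- example only; database name, user and password are
# placeholders for your own PostgreSQL setup.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'voter_guide',
        'USER': 'postgres',
        'PASSWORD': '',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```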
Create Table
python manage.py syncdb --noinput
This step may take some time; please be patient.
python manage.py loaddata db.json
To regenerate db.json from an existing database, run:
python manage.py dumpdata --exclude auth.permission --exclude contenttypes > db.json
python manage.py runserver
Now you should be able to see the web page at http://localhost:8000
### Prepare Compiler
Some Python packages, such as lxml, are written in C or C++, so a compiler is required. On macOS you can install one with the following command:
xcode-select --install
### Prepare PostgreSQL
You can install the packaged app (Postgres.app); put it in your Applications folder and click it to start.
Then add the following line to your ~/.bash_profile:
export PATH=/Applications/Postgres.app/Contents/Versions/9.3/bin/:$PATH
Change the version number 9.3 if you downloaded a different version of PostgreSQL.
After you add the PATH environment variable, source the file:
source ~/.bash_profile
If you don't add the PATH variable, the installation of psycopg2 will fail.
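A quick way to confirm that psycopg2 built correctly and can reach the server (the connection parameters are placeholders; adjust them to your setup):

```python
# Sanity check that psycopg2 compiled against libpq and can reach PostgreSQL.
# Connection parameters are placeholders; adjust to your setup.
import psycopg2

conn = psycopg2.connect(dbname='postgres', user='postgres', host='localhost')
cur = conn.cursor()
cur.execute('SELECT version()')
print(cur.fetchone()[0])
conn.close()
```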
Web Docker: c3h3/g0v-cvg-web
git clone https://github.com/c3h3/g0v-cvg-pgdata.git && cd g0v-cvg-pgdata && tar xfzv 47821274c242ce68f2d8d18d4bb0d050d6481311.tar.gz
- After that, you will get a pgdata directory.
- Assume pgdata's absolute path is "your_pgdata"
docker run --name pgdb -v your_pgdata:/var/lib/postgresql/data postgres:9.3
If you want to use pgAdmin to connect to your database, you can also forward the port out to the host with the following command:
docker run --name pgdb -p 5432:5432 -v your_pgdata:/var/lib/postgresql/data postgres:9.3
- "your_pgdata" is pgdata's absolute path in previous step.
docker run --name g0v-cvg-web --link pgdb:postgres -p port_on_host:8000 -d c3h3/g0v-cvg-web
- "port_on_host" is the port forwarded out to your host; you can then reach the web app at http://localhost:port_on_host
Crawler Docker: c3h3/g0v-cvg-crawler
docker run --name g0v -p forward_port:6800 -v outside_items:/items -v outside_logs:/logs -d c3h3/g0v-cvg-crawler
- "forward_port" is the host port you want to forward to the container's exposed port 6800
- "outside_items" is the host directory you want to mount into the container as /items
- "outside_logs" is the host directory you want to mount into the container as /logs
docker run --link g0v:g0v -it c3h3/g0v-cvg-crawler /bin/bash
In a running Docker instance linked with g0v (the Scrapy server), you can use the following command to deploy the tcc crawler to the server:
cd /tmp/g0v-cvg/crawler/tcc && python deploy.py
In the same kind of linked instance, you can use the following command to start the tcc bills crawl on the server:
cd /tmp/g0v-cvg/crawler/bin && python crawl_tcc_bills.py
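Assuming the linked g0v container runs Scrapyd (it listens on the default Scrapyd port 6800), crawl_tcc_bills.py presumably schedules jobs through Scrapyd's HTTP API; a sketch of that call (the project name "tcc" is an assumption about how the crawler was deployed):

```python
# Sketch of scheduling a crawl through Scrapyd's HTTP API.  The host name
# "g0v" matches the --link alias above; the project name "tcc" is an
# assumption about how the crawler was deployed.
import requests

resp = requests.post(
    'http://g0v:6800/schedule.json',
    data={'project': 'tcc', 'spider': 'bills'},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```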
CC0 1.0 Universal
This work is published from Taiwan.