JensKrumsieck / bundestag-project

I recently took a machine learning in Python course and wanted to practice what I learned. This repository contains the progress I made in two days of scraping polls from the website of the German Bundestag.

Bundestag Polls

bundestag19

The poll data was scraped using the code in the scraper subfolder and stored as CSV files in the data subfolder. Each file is named [Voting Period]_data.csv and contains one column per poll, with columns named according to the scheme [Period]-[Session]-[Poll].
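As a minimal sketch of this naming scheme, the snippet below loads a tiny in-memory stand-in for one such file and splits the poll column names into their parts. The sample rows, the name and party columns, the vote values Ja and Nein, and the helper split_poll_column are assumptions for illustration and are not taken from the repository.

```python
import io

import pandas as pd

# Hypothetical miniature of a "[Voting Period]_data.csv" file.
# Only the "[Period]-[Session]-[Poll]" column scheme comes from the README;
# the other columns and the cell values are made up for this example.
SAMPLE_CSV = """name,party,19-1-1,19-2-1
A,X,Ja,Nein
B,Y,Nein,
"""


def split_poll_column(column):
    """Split a poll column name into (period, session, poll) integers."""
    period, session, poll = (int(part) for part in column.split("-"))
    return period, session, poll


df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Poll columns are the only ones containing a hyphen.
vote_cols = [c for c in df.columns if "-" in c]
poll_ids = [split_poll_column(c) for c in vote_cols]

print(vote_cols)
print(poll_ids)
```

Empty cells are read as missing values, which is what an imputation step can later fill with a placeholder such as "Abwesend".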

The Jupyter notebook index.ipynb contains an attempt at classification with supervised machine learning using a Pipeline and GridSearchCV. Best values for the 19th Bundestag (score: 76.78%):

{'classifier__knn__n_neighbors': 3, 'classifier__pca__n_components': 4}

Pipeline:

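from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# df is the DataFrame loaded from one of the CSV files in the data subfolder.
# Poll columns follow the [Period]-[Session]-[Poll] scheme, so they contain "-".
vote_cols = [c for c in df.columns if "-" in c]

# Missing votes are filled with 'Abwesend' (absent), then one-hot encoded.
preprocess_vote = Pipeline(steps=[
    ('imputer', SimpleImputer(fill_value='Abwesend', strategy='constant')),
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])

pipeline = Pipeline(steps=[
    # sparse_threshold=0 forces dense output, which PCA requires.
    ('preprocess', ColumnTransformer(
        sparse_threshold=0,
        transformers=[('preprocess_vote', preprocess_vote, vote_cols)])),
    ('classifier', Pipeline(steps=[('pca', PCA()),
                                   ('knn', KNeighborsClassifier())])),
])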
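The parameter names in the best values above use scikit-learn's double-underscore convention: classifier__knn__n_neighbors addresses n_neighbors of the knn step inside the classifier step. The following is a rough sketch of how such a search could be set up, not the notebook's actual code. The synthetic data, the party column used as the target, the grid values, and cv=3 are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the scraped data (assumption): 3 parties with
# 20 members each, 12 polls, members follow the party line or are absent.
rng = np.random.default_rng(0)
polls = [f"19-1-{i}" for i in range(1, 13)]
rows = []
for party in ["A", "B", "C"]:
    party_line = rng.choice(["Ja", "Nein"], size=len(polls))
    for _ in range(20):
        votes = party_line.astype(object)
        votes[rng.random(len(polls)) < 0.1] = np.nan  # absent from this poll
        rows.append([party, *votes])
df = pd.DataFrame(rows, columns=["party", *polls], dtype=object)

vote_cols = [c for c in df.columns if "-" in c]

# Same structure as the pipeline shown above.
pipeline = Pipeline(steps=[
    ("preprocess", ColumnTransformer(
        sparse_threshold=0,
        transformers=[("preprocess_vote", Pipeline(steps=[
            ("imputer", SimpleImputer(fill_value="Abwesend",
                                      strategy="constant")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), vote_cols)])),
    ("classifier", Pipeline(steps=[("pca", PCA()),
                                   ("knn", KNeighborsClassifier())])),
])

# Hypothetical grid; the values searched in the notebook are not stated.
param_grid = {
    "classifier__knn__n_neighbors": [3, 5],
    "classifier__pca__n_components": [2, 4],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(df[vote_cols], df["party"])

print(search.best_params_)
print(search.best_score_)
```

Because the whole pipeline is passed to GridSearchCV, imputation, encoding, and PCA are refit on each training fold, so no information from the held-out fold leaks into preprocessing.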

Result

My major takeaway from this project is that the visualizations clearly show which parties formed the governing coalition in each period, as well as the effect of the obligation to vote in line with party policy. The classifier itself is not very useful in practice: to classify yourself, you would first have to answer a large number of polls, although that would be a nice idea for further development.

Visualization is done using t-SNE.
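A minimal sketch of such an embedding with scikit-learn's TSNE is shown below. It assumes that the one-hot encoded votes are used as input, which the README does not state; the synthetic data, the perplexity value, and the random seed are likewise assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the scraped data (assumption): 3 parties with
# 20 members each, 12 polls, members follow the party line or are absent.
rng = np.random.default_rng(0)
polls = [f"19-1-{i}" for i in range(1, 13)]
rows = []
for party in ["A", "B", "C"]:
    party_line = rng.choice(["Ja", "Nein"], size=len(polls))
    for _ in range(20):
        votes = party_line.astype(object)
        votes[rng.random(len(polls)) < 0.1] = np.nan  # absent from this poll
        rows.append([party, *votes])
df = pd.DataFrame(rows, columns=["party", *polls], dtype=object)

# Treat missing votes as their own category, then one-hot encode.
votes = df[polls].fillna("Abwesend")
features = OneHotEncoder().fit_transform(votes).toarray()

# Project every member onto two dimensions; perplexity must be smaller
# than the number of samples (60 here).
embedding = TSNE(n_components=2, perplexity=15,
                 random_state=0).fit_transform(features)

print(embedding.shape)
```

Plotting the two embedding columns as a scatter plot colored by df["party"] would give a picture of the kind described here, with members who vote alike ending up close together.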

19th Bundestag:

18th Bundestag:

17th Bundestag:

20th Bundestag (current):

combined Bundestags (does that even make sense? 😉)


Languages

Python 55.5%, Jupyter Notebook 44.5%