MahShaaban / river_bod

Analysis of public data of river organic pollution in South Korea


river_bod


Overview

This is a quick Exploratory Data Analysis (EDA) of a public dataset of Biochemical Oxygen Demand (BOD) measurements from 7 sites somewhere in South Korea over the period from 1992 to 2016 :) The analysis aims to find the missing values (to assess the reliability of the measurements), describe the distribution of the BOD values at the different sites, and finally identify the major trends over time.

Datasets

The following datasets are used in the analysis:

  1. river_metadata.csv Metadata about the measurement sites. This dataset consists of 4 columns:

    • river_id which is obviously the river ID
    • river_name this one isn't obvious at all and wouldn't even display on my computer :(
    • north the 'N' coordinate of the site in the format (degrees.minutes.seconds)
    • east the 'E' coordinate of the site in the format (degrees.minutes.seconds)
  2. bod.csv The BOD measurements for the period from 1992 to 2016. This dataset consists of 7 columns and 300 rows. Each column holds the monthly BOD measurements at one site over 25 years (25 years × 12 months = 300 rows).

  3. score.csv Some score and a category for each river, which I don't understand :P

    • river_id the same river IDs mentioned above
    • score some number!
    • category a category derived from the score, much like elementary school grades (excellent, good, fair)
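Two details of these files need handling before any plotting. The north/east values are in degrees.minutes.seconds and have to be converted to decimal degrees for a map, and if the 7 columns of bod.csv are the 7 sites, there is no date column, so each row's month has to be inferred from its position. The Python sketch below is only an illustration, not the author's R code: the helper names `dms_to_decimal` and `row_to_year_month` are hypothetical, the coordinates are assumed to be exactly three dot-separated fields such as `37.30.00` (a made-up value), and the 300 rows are assumed to run chronologically from January 1992, which the description implies but does not state.

```python
def dms_to_decimal(dms: str) -> float:
    """Convert a 'degrees.minutes.seconds' string (e.g. '37.30.00') to decimal degrees.

    Assumes exactly three dot-separated fields; the example value is hypothetical.
    """
    degrees, minutes, seconds = (float(part) for part in dms.strip().split("."))
    return degrees + minutes / 60 + seconds / 3600


def row_to_year_month(row_index: int, start_year: int = 1992) -> tuple[int, int]:
    """Map a 0-based row index of bod.csv to (year, month).

    Assumes one row per month in chronological order, starting January start_year.
    """
    years_elapsed, month_offset = divmod(row_index, 12)
    return start_year + years_elapsed, month_offset + 1


# Hypothetical usage
print(dms_to_decimal("37.30.00"))  # 37.5
print(row_to_year_month(299))      # (2016, 12)
```

Under these assumptions the last row (index 299) maps to December 2016, which is consistent with 300 monthly rows covering 1992 to 2016.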

EDA

Missing values

  1. Proportion of missing data (Figure 1)
  2. Total percent of missing data (Figure 2)
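Both missing-value figures boil down to simple counts. A minimal sketch, assuming bod.csv has already been loaded into a dict mapping each river column to its monthly values, with `None` marking a missing measurement (how missing cells are actually encoded in the CSV is not stated here); `missing_summary` is a hypothetical helper and the toy values are made up:

```python
from typing import Optional


def missing_summary(table: dict[str, list[Optional[float]]]) -> tuple[dict[str, float], float]:
    """Return (proportion missing per river, overall percent missing).

    `table` maps each river column of bod.csv to its monthly values,
    with None marking a missing measurement (hypothetical layout).
    """
    missing_counts = {river: sum(v is None for v in values) for river, values in table.items()}
    per_river = {river: missing_counts[river] / len(table[river]) for river in table}
    total_cells = sum(len(values) for values in table.values())
    total_percent = 100 * sum(missing_counts.values()) / total_cells
    return per_river, total_percent


# Toy example (made-up values, not from bod.csv)
toy = {"river_a": [1.2, None, 2.5, None], "river_b": [None, 3.1, 4.0, 2.2]}
per_river, total = missing_summary(toy)
print(per_river)  # {'river_a': 0.5, 'river_b': 0.25}
print(total)      # 37.5
```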

Distributions

  1. Distribution of BOD measurements (Figure 3)
  2. Distribution of log BOD measurements (Figure 4)
  3. Distribution of BOD per river (Figure 5)
  4. Distribution of log BOD per river (Figure 6)
  5. Distribution of BOD per river over time (Figure 7)
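Showing log BOD alongside raw BOD is a common choice for strictly positive, right-skewed concentration data. The sketch below summarises each river's distribution on the natural-log scale, assuming the data are held in a dict mapping each river to its monthly values with `None` for missing ones. It drops non-positive values because the logarithm is undefined there, which is an assumption about how such values would be treated; `log_summary` is a hypothetical helper.

```python
import math
import statistics
from typing import Optional


def log_summary(table: dict[str, list[Optional[float]]]) -> dict[str, dict[str, float]]:
    """Summarise each river's BOD distribution on the natural-log scale.

    Missing (None) and non-positive values are dropped because log is undefined for them.
    """
    summary = {}
    for river, values in table.items():
        logs = [math.log(v) for v in values if v is not None and v > 0]
        if not logs:
            continue  # nothing to summarise for this river
        mean_log = statistics.fmean(logs)
        summary[river] = {
            "n": len(logs),
            "median_log": statistics.median(logs),
            "mean_log": mean_log,
            # back-transformed mean of logs = geometric mean on the original scale
            "geometric_mean": math.exp(mean_log),
        }
    return summary


# Toy example (made-up values, not from bod.csv)
toy = {"river_a": [1.0, 10.0, 100.0, None], "river_b": [None, None]}
print(log_summary(toy)["river_a"]["geometric_mean"])  # approximately 10.0
```

Rivers with no usable values (like `river_b` in the toy example) are left out of the summary rather than reported with empty statistics.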

Trends

  1. Average BOD over time (Figure 8)
  2. Average BOD over time, LOESS smoothed (Figure 9)
  3. Contribution of rivers to the total BOD (Figure 10)
  4. Average BOD per month (Figure 11)
  5. Average BOD per month, LOESS smoothed (Figure 12)
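The smoothed trend figures rely on LOESS (locally weighted regression). The sketch below assumes "average BOD over time" means the mean of all non-missing measurements across rivers within each year, and that each river's values are monthly from January 1992; both are assumptions, and `yearly_means` and `loess` are hypothetical helpers, not the code behind the figures. `loess` is a plain degree-1 version with tricube weights and no robustness iterations, so it will not match R's `loess()` exactly (R defaults to `span = 0.75` and `degree = 2`).

```python
import math
import statistics
from collections import defaultdict
from typing import Optional


def yearly_means(table: dict[str, list[Optional[float]]], start_year: int = 1992) -> tuple[list[int], list[float]]:
    """Mean of all non-missing BOD values across rivers, per year.

    Assumes each list is monthly and chronological, starting January start_year.
    """
    by_year: dict[int, list[float]] = defaultdict(list)
    for values in table.values():
        for i, v in enumerate(values):
            if v is not None:
                by_year[start_year + i // 12].append(v)
    years = sorted(by_year)
    return years, [statistics.fmean(by_year[y]) for y in years]


def loess(x: list[float], y: list[float], frac: float = 0.5) -> list[float]:
    """Degree-1 LOESS: a weighted linear fit around each x0 using tricube weights.

    Each local fit uses the ceil(frac * n) nearest points; no robustness iterations.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

Calling `loess(years, means, frac=0.75)` on the output of `yearly_means` gives a smoothed yearly curve. Because each local fit is linear, perfectly linear input comes back unchanged, which makes a handy sanity check.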

Maps

  1. Scores and categories map (Figure 13)

Conclusions

It's too late/early to draw any conclusions!
