We consider a few hypotheses based on observations of the data and test them using visualizations.
Data source: UCI Machine Learning Repository. Link to download the data file: https://archive.ics.uci.edu/ml/machine-learning-databases/00501/ This link leads to the parent directory, which contains the link to download a zip file holding the data as multiple CSV files.
-This data set includes hourly air pollutant data from 12 nationally controlled air-quality monitoring sites. The air-quality data are from the Beijing Municipal Environmental Monitoring Center. The meteorological data for each air-quality site are matched with the nearest weather station of the China Meteorological Administration. The time period is from March 1st, 2013 to February 28th, 2017. Missing data are denoted as NA. The attributes are categorized into three types, which are indicated by different symbols in the proposal.
-The zip file contains the data collected from the 12 stations as 12 separate CSV files.
-Each file has 18 columns, about 35,000 rows, and about 2.7 MB of data. Because the columns have different characteristics and contain missing values, there is good scope for visualization and data cleaning.
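As a starting point for data cleaning, the 12 station files can be read into one collection and the NA markers counted per column. The following is a minimal sketch using only the Python standard library. The function names and the `data_dir` argument are illustrative assumptions, and it assumes the zip file has already been extracted into a single directory.

```python
import csv
import glob
import os
from collections import Counter


def load_station_files(data_dir):
    """Read every CSV file in data_dir into one list of row dictionaries."""
    rows = []
    # data_dir is a hypothetical folder holding the extracted station files.
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        with open(path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def count_missing(rows, missing_marker="NA"):
    """Count, per column, how many cells hold the missing-value marker."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value == missing_marker:
                missing[column] += 1
    return dict(missing)
```

Calling `count_missing(load_station_files("some_folder"))` would then give a quick view of which columns need the most cleaning before any plot is drawn.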
Hypothesis-1: HNull: An increase in the concentration of O3 reduces the concentrations of CO, NO2, and SO2. HAlt: An increase in the concentration of O3 does not reduce the concentrations of CO, NO2, and SO2.
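One way to check Hypothesis-1 numerically alongside the plots is the Pearson correlation between O3 and each of the other gases: a clearly negative coefficient would be consistent with HNull, though it would not by itself show that O3 causes the decrease. The sketch below assumes each row is a dictionary keyed by column name (as `csv.DictReader` yields) with missing values stored as the string NA. The function names are illustrative.

```python
import math


def pearson(rows, x_col, y_col, missing_marker="NA"):
    """Pearson correlation of two columns, skipping rows with a missing value."""
    pairs = [
        (float(row[x_col]), float(row[y_col]))
        for row in rows
        if row[x_col] != missing_marker and row[y_col] != missing_marker
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def o3_correlations(rows, others=("CO", "NO2", "SO2")):
    """Correlation of O3 with each of the other gases named in Hypothesis-1."""
    return {gas: pearson(rows, "O3", gas) for gas in others}
```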
Hypothesis-2: HNull: An increase in TEMP (temperature) increases DEWP (dew point). HAlt: An increase in TEMP does not increase DEWP.
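For Hypothesis-2, a scatter plot of DEWP against TEMP can be summarized by a fitted straight line: a positive slope would be consistent with HNull. The following is a minimal least-squares sketch, assuming rows are dictionaries keyed by column name with NA marking missing values. The function name `fit_line` is illustrative.

```python
def fit_line(rows, x_col="TEMP", y_col="DEWP", missing_marker="NA"):
    """Ordinary least-squares fit y = slope * x + intercept, skipping missing rows."""
    pairs = [
        (float(row[x_col]), float(row[y_col]))
        for row in rows
        if row[x_col] != missing_marker and row[y_col] != missing_marker
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("x column is constant, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The slope reads as the average change in DEWP per one-unit change in TEMP, which is easier to state in a caption than a raw scatter plot.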
Hypothesis-3: HNull: Summers have lower amounts of toxic gases in the atmosphere than other seasons. HAlt: Summers do not have lower amounts of toxic gases in the atmosphere than other seasons.
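Hypothesis-3 needs a season for every record. The sketch below groups one gas column by season and averages it, which gives the numbers a bar chart per season would show. It rests on two assumptions that are not stated above: that the files carry a numeric `month` column, and that seasons follow the meteorological convention (March to May is spring, June to August is summer, and so on). The names `SEASON_BY_MONTH` and `seasonal_means` are illustrative.

```python
from collections import defaultdict

# Assumed month-to-season mapping (meteorological seasons).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(rows, gas, missing_marker="NA"):
    """Mean concentration of one gas per season, skipping missing values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row[gas] == missing_marker:
            continue
        season = SEASON_BY_MONTH[int(row["month"])]  # assumes a 'month' column
        totals[season] += float(row[gas])
        counts[season] += 1
    return {season: totals[season] / counts[season] for season in totals}
```

Running it once per gas (for example CO, NO2, SO2, and O3) makes it possible to compare the summer mean against the other seasons for each gas separately.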
Hypothesis-4: HNull: Toxic gas concentrations increase over the years. HAlt: Toxic gas concentrations do not increase over the years.
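For Hypothesis-4, yearly means and their year-over-year differences give the trend that a line chart would display. Because the data run from March 2013 to February 2017, the years 2013 and 2017 are only partly covered, so their means mix in seasonal effects and the comparison is cleanest for 2014 to 2016. The sketch below assumes a numeric `year` column and NA as the missing-value marker. The function names are illustrative.

```python
from collections import defaultdict


def yearly_means(rows, gas, missing_marker="NA"):
    """Mean concentration of one gas per calendar year, skipping missing values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row[gas] == missing_marker:
            continue
        year = int(row["year"])  # assumes a 'year' column
        totals[year] += float(row[gas])
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


def year_over_year_change(means):
    """Difference between each year's mean and the previous year's mean."""
    years = sorted(means)
    return {
        later: means[later] - means[earlier]
        for earlier, later in zip(years, years[1:])
    }
```

Positive differences in most years would be consistent with HNull, while negative or mixed differences would point toward HAlt.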