im-dpaul / EDA-Sugarcane-Production

Analyze sugarcane production dataset through Exploratory Data Analysis (EDA). Uncover patterns, trends, and relationships globally and across continents to gain valuable insights into sugarcane production dynamics.

Exploratory Data Analysis (EDA): Sugarcane Production Dataset

This project performs Exploratory Data Analysis (EDA) on a sugarcane production dataset. For each country, the dataset records production volume, acreage, yield, and production per person, and each country is assigned to a continent.

Introduction

Exploratory Data Analysis is a crucial step in understanding the characteristics and insights hidden within the data. In this project, we analyze the sugarcane production dataset to uncover patterns, trends, and relationships among various features. The analysis aims to provide valuable insights into sugarcane production globally and within different continents.

Dataset

The dataset contains information about sugarcane production across different countries, including metrics such as production volume, acreage, yield, and production per person. It also categorizes the data by continent, allowing for continent-wise analysis.

Libraries Used

  • pandas (data manipulation and analysis)
  • seaborn (data visualization)
  • matplotlib.pyplot (plotting)

Data Cleaning

Before analysis, the dataset was cleaned: missing values were handled, columns were converted to appropriate data types, and unnecessary characters were removed from certain columns so that values are consistent and can be analyzed accurately.
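The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names, the comma thousands separator, and the sample rows are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical raw rows; names and values are invented for illustration only.
raw = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["X", "Y", None, "X"],
    "Production Tons": ["1,000,000", "250,500", "80,000", "n/a"],
    "Acreage Hectare": ["20,000", "5,000", "2,000", None],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize column names, strip separators, convert types, drop incomplete rows."""
    out = df.copy()
    # Consistent snake_case column names.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    numeric_cols = ["production_tons", "acreage_hectare"]
    for col in numeric_cols:
        # Remove thousands separators, then convert; unparseable values become NaN.
        stripped = out[col].str.replace(",", "", regex=False)
        out[col] = pd.to_numeric(stripped, errors="coerce")
    # Drop rows missing a continent or any numeric metric.
    return out.dropna(subset=["continent"] + numeric_cols).reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Dropping incomplete rows is only one way to handle missing values; imputing them may be more appropriate depending on how much data is missing.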

Univariate Analysis

Univariate analysis examines each variable on its own: its distribution, outliers, and summary statistics. Histograms, box plots, and distribution plots are used to understand the characteristics of each feature.
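As a rough sketch of what a univariate pass over one column might look like, the example below computes summary statistics, flags outliers with the common 1.5 × IQR rule, and draws a histogram and a box plot. The series name and values are made up, and the IQR rule is an assumption rather than the notebook's confirmed method. seaborn's `histplot` and `boxplot` could be used in place of the matplotlib calls.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical production values; invented for illustration only.
production = pd.Series([10, 12, 11, 13, 12, 95], name="production_tons")

# Summary statistics: count, mean, std, min, quartiles, max.
summary = production.describe()

# Flag outliers with the 1.5 * IQR rule.
q1 = production.quantile(0.25)
q3 = production.quantile(0.75)
iqr = q3 - q1
is_outlier = (production < q1 - 1.5 * iqr) | (production > q3 + 1.5 * iqr)
outliers = production[is_outlier]

# Histogram and box plot side by side.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(production.to_numpy(), bins=10)
ax_hist.set_title("Distribution")
ax_box.boxplot(production.to_numpy())
ax_box.set_title("Box plot")
plt.close(fig)

print(summary)
print(outliers)
```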

Bivariate Analysis

Bivariate analysis examines relationships between pairs of variables, investigating correlations and dependencies. Scatter plots, bar plots, and line plots are employed to analyze the interactions between different features and identify any underlying patterns or trends.
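A minimal sketch of one such pairwise check, assuming hypothetical `acreage_hectare` and `production_tons` columns with made-up values: it computes the Pearson correlation between the two and draws a scatter plot of the pair.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values; invented for illustration only.
df = pd.DataFrame({
    "acreage_hectare": [100, 200, 300, 400],
    "production_tons": [5000, 11000, 14000, 21000],
})

# Pearson correlation between one pair of columns.
pearson_r = df["acreage_hectare"].corr(df["production_tons"])

# Full pairwise correlation matrix for all numeric columns.
corr_matrix = df.corr()

# Scatter plot of the same pair.
fig, ax = plt.subplots()
ax.scatter(df["acreage_hectare"], df["production_tons"])
ax.set_xlabel("Acreage (hectares)")
ax.set_ylabel("Production (tons)")
plt.close(fig)

print(round(pearson_r, 3))
```

A high correlation here only says the two columns move together in the sample; it does not by itself establish that more acreage causes more production.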

Analysis for Continents

This section focuses on analyzing sugarcane production within different continents. It explores continent-wise production, the impact of the number of countries on production, land distribution, and production distribution by continent. Correlation analysis is also conducted to understand relationships between various metrics within each continent.
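The continent-wise aggregation described above might look something like the following sketch. The column names and all values are hypothetical; the point is only to show a `groupby` that yields total production, country count, and land totals per continent.

```python
import pandas as pd

# Hypothetical country-level rows; invented for illustration only.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "continent": ["X", "X", "Y", "Y", "Y"],
    "production_tons": [700, 100, 50, 100, 50],
    "acreage_hectare": [70, 20, 10, 25, 15],
})

# One row per continent: total production, number of countries, total land.
by_continent = (
    df.groupby("continent")
    .agg(
        total_production=("production_tons", "sum"),
        countries=("country", "nunique"),
        total_acreage=("acreage_hectare", "sum"),
    )
    .sort_values("total_production", ascending=False)
)

# Each continent's share of overall production, in percent.
by_continent["share_pct"] = (
    100 * by_continent["total_production"] / by_continent["total_production"].sum()
)

print(by_continent)
```

In this made-up sample, continent X has fewer countries than Y but a larger total, which is the kind of pattern a country-count comparison is meant to surface.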

Key Findings

  • Brazil, India, and China together account for around 65% of global sugarcane production.
  • South America leads in overall sugarcane production, followed by Asia and North America.
  • The number of sugarcane-producing countries in a continent does not directly determine its total production.
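A figure such as the combined share of the top producers can be derived along these lines. This is a sketch under assumptions: the country labels and production values are made up and do not reproduce the numbers reported above.

```python
import pandas as pd

# Hypothetical production per country; invented for illustration only.
production = pd.Series(
    [500, 300, 100, 60, 40],
    index=["A", "B", "C", "D", "E"],
    name="production_tons",
)

# The three largest producers and their combined share of the total, in percent.
top3 = production.nlargest(3)
top3_share_pct = 100 * top3.sum() / production.sum()

print(top3.index.tolist(), top3_share_pct)
```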

Languages

Language:Jupyter Notebook 100.0%