This project performs Exploratory Data Analysis (EDA) on a sugarcane production dataset. For each country, the dataset records production volume, acreage, yield, and production per person, and it groups countries by continent.
- Introduction
- Dataset
- Libraries Used
- Data Cleaning
- Univariate Analysis
- Bivariate Analysis
- Analysis for Continents
- Key Findings
Exploratory Data Analysis is the step in which we learn what a dataset looks like before drawing conclusions from it. In this project, we analyze the sugarcane production dataset to uncover patterns, trends, and relationships among its features, both globally and within individual continents.
The dataset describes sugarcane production by country, with metrics for production volume, acreage, yield, and production per person. Each country is also assigned to a continent, which makes continent-wise analysis possible.
- pandas (data manipulation and analysis)
- seaborn (data visualization)
- matplotlib.pyplot (plotting)
Before analysis, the dataset was cleaned: missing values were handled, columns were converted to appropriate data types, and unnecessary characters were removed from certain columns so that their values are consistent and can be treated as numbers.
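The cleaning steps described above might look roughly like the following sketch. The column names, the sample rows, and the number formatting (dots as thousands separators, a comma as the decimal mark) are assumptions made for illustration and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical raw rows: names, values, and number formatting are made up.
raw = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["Asia", "Asia", "Africa", None],
    "Production (Tons)": ["1.200.000", "350.000", None, "80.000"],
    "Yield (Kg / Hectare)": ["75.167,5", "70.393,5", "60.000", "55.250,1"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: short column names, numeric dtypes, no missing rows."""
    out = df.rename(columns={
        "Production (Tons)": "Production",
        "Yield (Kg / Hectare)": "Yield",
    })
    # Remove the thousands separators, then convert the text to numbers.
    out["Production"] = pd.to_numeric(
        out["Production"].str.replace(".", "", regex=False), errors="coerce"
    )
    # Same for yield, additionally turning the decimal comma into a dot.
    out["Yield"] = pd.to_numeric(
        out["Yield"]
        .str.replace(".", "", regex=False)
        .str.replace(",", ".", regex=False),
        errors="coerce",
    )
    # Drop rows that still have a missing value in any column.
    return out.dropna().reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
print(cleaned.dtypes)
```

Dropping incomplete rows is only one way to handle missing values; imputing them may be preferable when few rows are available.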
Univariate analysis looks at one variable at a time: its distribution, its outliers, and its summary statistics. Histograms, box plots, and distribution plots are used to understand the characteristics of each feature.
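As a minimal sketch of the numeric side of this step, the snippet below computes summary statistics for a single column and flags outliers with the common 1.5 × IQR rule. The data and the `Production` column name are hypothetical, and the IQR rule is one conventional choice rather than necessarily the one used in this project.

```python
import pandas as pd

# Made-up values: five similar producers and one very large one.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Production": [10, 12, 11, 13, 12, 200],
})

# Count, mean, standard deviation, min, quartiles, and max in one call.
summary = df["Production"].describe()

# Flag values lying more than 1.5 * IQR outside the quartiles.
q1, q3 = df["Production"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["Production"] < lower) | (df["Production"] > upper)]

print(summary)
print(outliers)
```

The same column can then be passed to `seaborn.histplot` or `seaborn.boxplot` to see the distribution and the flagged points visually.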
Bivariate analysis examines relationships between pairs of variables, looking for correlations and dependencies. Scatter plots, bar plots, and line plots are used to see how features interact and to identify underlying patterns or trends.
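One simple way to quantify such pairwise relationships is a Pearson correlation matrix. The sketch below uses made-up numbers and assumed column names (`Acreage`, `Production`, `Yield`) purely to show the mechanics.

```python
import pandas as pd

# Hypothetical values chosen so the correlations are easy to check by hand.
df = pd.DataFrame({
    "Acreage": [1, 2, 3, 4, 5],
    "Production": [2, 4, 6, 8, 10],
    "Yield": [5, 3, 4, 1, 2],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))
```

In this toy data, production rises in lockstep with acreage while yield tends to fall as acreage grows. A matrix like `corr` can be passed to `seaborn.heatmap`, and a single pair of columns can be inspected with `seaborn.scatterplot`.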
This section analyzes sugarcane production by continent. It covers total production per continent, whether the number of producing countries affects a continent's production, and how land and production are distributed across continents. Correlation analysis is also used to understand how the metrics relate to one another within each continent.
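A continent-wise summary of this kind can be sketched with a `groupby` aggregation. The continent labels, column names, and figures below are hypothetical; they are chosen only to show a continent with fewer producing countries out-producing one with more.

```python
import pandas as pd

# Made-up data: "X" has one large producer, "Y" has three small ones.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["X", "Y", "Y", "Y"],
    "Production": [100, 10, 20, 30],
    "Acreage": [40, 5, 10, 15],
})

# Per continent: number of countries, total production, total acreage.
by_continent = (
    df.groupby("Continent")
    .agg(
        countries=("Country", "count"),
        production=("Production", "sum"),
        acreage=("Acreage", "sum"),
    )
    .sort_values("production", ascending=False)
)

print(by_continent)
```

Calling `by_continent.corr()` on a table like this gives a rough view of how country count, production, and acreage move together across continents.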
- Brazil, India, and China together account for around 65% of global sugarcane production.
- South America leads in overall sugarcane production, followed by Asia and North America.
- The number of sugarcane-producing countries in a continent does not directly determine its total production.
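A share such as the one reported for the top three producers can be computed as shown in this sketch. The figures are invented so the arithmetic is easy to follow, and they do not reproduce the 65% result.

```python
import pandas as pd

# Hypothetical production figures for five countries (total = 100).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Production": [50, 30, 10, 6, 4],
})

# Combined share of the three largest producers, as a percentage.
top3 = df.nlargest(3, "Production")
top3_share = top3["Production"].sum() / df["Production"].sum() * 100

print(f"Top 3 share: {top3_share:.1f}%")
```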