here are some common data cleaning tasks that you can perform using Pandas, along with some code samples:
Removing duplicates:
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('data.csv')
# Remove duplicate rows
df.drop_duplicates(inplace=True)
Handling missing values:
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('data.csv')
# Replace missing values with a specific value (e.g. 0)
df.fillna(0, inplace=True)
# Drop rows with missing values
df.dropna(inplace=True)
Renaming columns:
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('data.csv')
# Rename columns
df.rename(columns={'old_column_name': 'new_column_name'}, inplace=True)
Filtering data based on conditions:
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('data.csv')
# Filter rows based on a condition
filtered_df = df[df['column_name'] > 10]
Splitting and combining columns:
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('data.csv')
# Split a column into two columns based on a delimiter
df[['first_name', 'last_name']] = df['full_name'].str.split(' ', expand=True)
# Combine two columns into one column
df['full_name'] = df['first_name'] + ' ' + df['last_name']
These are just a few examples of the many data cleaning tasks that you can perform using Pandas. By using these tools, you can efficiently clean and transform your data for analysis and visualization.