Swap-Nova / Pandas_Data-Analysis

Getting started with Pandas and understanding how to build Series and DataFrames, then importing a dataset and using pandas to view the data and manipulate it to suit the needs of an algorithm.


Getting Started with Pandas:

  • Pandas is used to explore, analyze, and manipulate data in the ML field. Raw data often has to be cleaned and reshaped with pandas before ML algorithms can work with it.
  • Pandas has two main data structures:
    1. Series: a one-dimensional labeled array that holds a sequence of values.
import pandas as pd

series = pd.Series(["BMW", "Honda", "Audi"])
# To print the output
series
    2. DataFrame: a two-dimensional table of rows and columns. It can be created from a Python dictionary, and the dictionary values can themselves be Series.
  • To create a DataFrame from two Series:
colors = pd.Series(["Red", "Blue", "White"])  # example values
car_data = pd.DataFrame({"Car Make": series, "Color": colors})
# To print the output
car_data
[Screenshot: car_data output]
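To see how a DataFrame combines Series, here is a minimal sketch. The color values are made-up examples, and `short_colors` and `aligned` are hypothetical names used only for illustration. It shows that pandas lines up Series by their index labels rather than only by order.

```python
import pandas as pd

series = pd.Series(["BMW", "Honda", "Audi"])
colors = pd.Series(["Red", "Blue", "White"])  # example values
car_data = pd.DataFrame({"Car Make": series, "Color": colors})

# Attributes describing the DataFrame's layout.
print(car_data.shape)             # (3, 2): 3 rows, 2 columns
print(car_data.columns.tolist())  # ['Car Make', 'Color']

# Series passed in a dict are aligned on their index labels (0, 1, 2 here).
# A shorter Series leaves NaN in the rows it does not cover.
short_colors = pd.Series(["Red", "Blue"])  # only labels 0 and 1
aligned = pd.DataFrame({"Car Make": series, "Color": short_colors})
print(aligned["Color"].isna().tolist())  # [False, False, True]
```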

Importing Data through URLs:

  • Make sure the URL points to the dataset's "raw" format (on GitHub, click the "Raw" button). The regular GitHub page URL returns an HTML page rather than the CSV file itself.
heart_disease = pd.read_csv("https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/data/heart-disease.csv")
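Because `read_csv` accepts any file-like object as well as a path or URL, the loading step can be tried without a network connection. This minimal sketch feeds it a small in-memory CSV through `io.StringIO`; the rows are invented for illustration and are not the contents of heart-disease.csv.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a downloaded file.
csv_text = """age,sex,target
63,1,1
37,1,1
41,0,0
"""

# read_csv parses the header row into column names.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3)
print(df.head(2))        # first two rows
print(df["age"].mean())  # 47.0
```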

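Describing Data with Pandas:

Attributes vs. functions:
Type       Example             Description
Attribute  car_sales.dtypes    Meta information stored in the car_sales DataFrame (here, the data type of each column); accessed without parentheses
Function   car_sales.to_csv()  A series of steps performed when it is called (here, exporting the DataFrame to CSV); called with parentheses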
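The difference between attributes and functions can be seen on a small DataFrame. This sketch uses a hypothetical stand-in for `car_sales` with made-up rows, since the real file is not loaded here.

```python
import pandas as pd

# Hypothetical stand-in for car_sales (made-up values).
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Odometer (KM)": [150043, 87899, 11179],
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# Attributes: stored metadata, accessed without parentheses.
print(car_sales.dtypes)   # Odometer (KM) is int64; the text columns are not numeric
print(car_sales.columns)  # the column labels
print(car_sales.shape)    # (3, 3)

# Functions (methods): called with parentheses to perform work.
print(car_sales.describe())                 # summary statistics for numeric columns
csv_string = car_sales.to_csv(index=False)  # with no path given, returns the CSV as a string
print(csv_string.splitlines()[0])           # Make,Odometer (KM),Price
```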

Difference between .loc and .iloc:

  • .loc (location): selects by index label. We can assign our own labels with the index parameter and then retrieve objects by those labels. Labels do not have to be unique, so animals.loc[3] returns both objects labeled 3.
# .loc : Location
animals = pd.Series(["cat", "dog", "panda", "owl"], index=[0, 3, 9, 3])
animals.loc[3]

# OUTPUT:
3    dog
3    owl
dtype: object
  • .iloc (integer location): selects by integer position and ignores the labels. For the same animals Series, .iloc[3] returns the object at position 3 (the fourth element, counting from 0), regardless of its label.
# .iloc refers to position 
animals.iloc[3]

# OUTPUT:
'owl'
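The label/position distinction matters most when integer labels do not match positions, and when slicing. A minimal sketch with hypothetical Series `s` and `prices`:

```python
import pandas as pd

# Integer labels in reverse order, so labels and positions disagree.
s = pd.Series(["a", "b", "c"], index=[2, 1, 0])
print(s.loc[0])   # 'c' -> the element whose label is 0
print(s.iloc[0])  # 'a' -> the element at position 0

prices = pd.Series([10, 20, 30, 40], index=["w", "x", "y", "z"])
# .loc slices by label and includes the end label.
print(prices.loc["w":"y"].tolist())  # [10, 20, 30]
# .iloc slices by position and, like Python lists, excludes the end position.
print(prices.iloc[0:2].tolist())     # [10, 20]
```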

Converting Strings to Int:

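# Assuming prices like "$4,000.00": strip "$" and "," (raw string avoids escape warnings), then parse via float so the cents are not merged into the number
price_plot = car_sales["Price"].replace(r'[\$,]', '', regex=True).astype(float).astype(int)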
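  • Regex (regular expression) is a sequence of characters that defines a search pattern. In Python, regex support is provided by the re module. Regex patterns can be used to match, search, replace, or extract specific text from a string. In a pattern, square brackets define a character class that matches any one of the characters listed inside them.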
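The cleaning step can be checked on a few sample values. This sketch assumes prices formatted like "$4,000.00" (hypothetical data) and shows why the decimal point should not be stripped before converting: removing "." merges the cents into the number.

```python
import re
import pandas as pd

# Hypothetical prices in the "$4,000.00" style.
prices = pd.Series(["$4,000.00", "$5,000.00", "$22,000.00"])

# [\$,] is a character class matching a single "$" or ",".
cleaned = prices.replace(r"[\$,]", "", regex=True).astype(float).astype(int)
print(cleaned.tolist())  # [4000, 5000, 22000]

# A pattern that also strips "." deletes the decimal point, so the cents become digits.
naive = prices.replace(r"[\$\,\.]", "", regex=True).astype(int)
print(naive.tolist())    # [400000, 500000, 2200000]

# The same pattern with Python's built-in re module on a single string.
print(re.sub(r"[\$,]", "", "$4,000.00"))  # 4000.00
```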

Using Matplotlib:

[Screenshot: plot made with Matplotlib]
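Once the prices are integers they can be plotted. A minimal sketch, assuming pandas' built-in `Series.plot` (which draws with Matplotlib) and hypothetical sample prices; it plots the running total of sales as a line chart. The `Agg` backend is chosen so the code also runs without a display; in Jupyter the chart appears inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned prices, already converted to int.
price_plot = pd.Series([4000, 5000, 7000])

# cumsum gives a running total: 4000, 9000, 16000.
total_sales = price_plot.cumsum()

# Series.plot draws a line chart on a Matplotlib Axes and returns that Axes.
ax = total_sales.plot(title="Total sales")
ax.set_xlabel("Sale number")
ax.set_ylabel("Total price ($)")
```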



Languages: Jupyter Notebook 100.0%