shahriar-rahman / Unsupervised-Learning-Principal-Component-Analysis

PCA for Unsupervised Learning

Unsupervised Learning Principal Component Analysis

Principal component analysis, or PCA, is a statistical procedure that summarizes the information content of large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed. It is a widely covered machine learning method and can be broken down into five steps, sketched in code after the list below.

  1. Standardize the range of continuous initial variables
  2. Compute the covariance matrix to identify correlations
  3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
  4. Create a feature vector to decide which principal components to keep
  5. Recast the data along the axes of the principal components
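
The following is a minimal NumPy sketch of the five steps; the random example data, the variable names, and the choice of keeping 2 components are illustrative assumptions, not part of this repository's notebook.

```python
import numpy as np

X = np.random.rand(100, 5)          # example data: 100 samples, 5 variables

# 1. Standardize each variable to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix of the standardized variables
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decompose the covariance matrix (symmetric, so eigh is appropriate)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort components by explained variance and build a feature vector
#    from the top k eigenvectors
order = np.argsort(eigenvalues)[::-1]
k = 2
feature_vector = eigenvectors[:, order[:k]]

# 5. Recast (project) the data onto the principal-component axes
X_pca = X_std @ feature_vector
print(X_pca.shape)                  # (100, 2)
```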



Introduction

Principal component analysis, or PCA, is a dimensionality reduction method often applied to large data sets: it transforms a large set of variables into a smaller one that still contains most of the information in the original set.

Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze the data points much faster without extraneous variables to process.

So, to sum up, the idea of PCA is simple — reduce the number of variables of a data set, while preserving as much information as possible.
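
As a short sketch of that trade-off in practice, the snippet below reduces a data set from 4 variables to 2 while reporting how much variance is preserved; the use of scikit-learn and the Iris data set here is an assumption for illustration, not the workflow of this repository's notebook.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # 150 samples, 4 variables
X_std = StandardScaler().fit_transform(X)    # standardize before PCA

pca = PCA(n_components=2)                    # keep only 2 principal components
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                       # (150, 2)
print(pca.explained_variance_ratio_.sum())   # fraction of variance preserved
```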


