ChienKangLu / t-Distributed-Stochastic-Neighbor-Embedding

Dimensional reduction

t-Distributed-Stochastic-Neighbor-Embedding

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction algorithm derived from Stochastic Neighbor Embedding (SNE). It maps high-dimensional data to a low-dimensional space while preserving local structure and revealing some global structure, such as clusters.

Development tools

  • Python
  • PyCharm

SNE

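  1. Convert the pairwise distances of the high-dimensional data into conditional probabilities (similarities), assuming each datapoint $x_i$ picks $x_j$ as its neighbor in proportion to a Gaussian density centered at $x_i$:

     $$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$

  2. Each high-dimensional datapoint has its own variance $\sigma_i^2$, which reflects how dense or sparse its region is. Each $\sigma_i$ induces a probability distribution $P_i$ over the other datapoints. To select $\sigma_i$ for each $i$, the user fixes a perplexity, and a binary search finds the $\sigma_i$ for which $P_i$ has that perplexity:

     $$Perp(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$$

  3. Convert the low-dimensional data $y_i$ into conditional probabilities (similarities) in the same way, but with the Gaussian width fixed to $\sigma = 1/\sqrt{2}$, so that $2\sigma^2 = 1$:

     $$q_{j|i} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\left(-\|y_i - y_k\|^2\right)}$$

  4. Use gradient descent to minimize the sum of Kullback-Leibler divergences (KL divergences) between the two sets of distributions:

     $$C = \sum_i KL(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$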
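The perplexity search in the steps above can be sketched for a single datapoint as follows. This is a minimal illustration, not the repository's code: the function names, the toy distances, and the tolerance are assumptions, and it searches over the precision `beta = 1 / (2 * sigma**2)` rather than over the variance directly, which is a common convenience. It uses only the standard library; a practical implementation would likely vectorize this with NumPy.

```python
import math

def conditional_probs(sq_dists, i, beta):
    """Return p_{j|i} for every j, with beta = 1 / (2 * sigma_i**2).

    sq_dists[j] is the squared distance from point i to point j.
    Shifting by the smallest distance leaves the result unchanged
    but avoids dividing by a sum that underflowed to zero.
    """
    d_min = min(d for j, d in enumerate(sq_dists) if j != i)
    weights = [0.0 if j == i else math.exp(-beta * (d - d_min))
               for j, d in enumerate(sq_dists)]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perp(P_i) = 2 ** H(P_i), with the entropy H measured in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def find_beta(sq_dists, i, target_perplexity, tol=1e-5, max_steps=100):
    """Binary search for the beta whose P_i has the target perplexity."""
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_steps):
        perp = perplexity(conditional_probs(sq_dists, i, beta))
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            # Distribution is too flat: raise beta (narrower Gaussian).
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (lo + hi) / 2.0
        else:
            # Distribution is too peaked: lower beta (wider Gaussian).
            hi = beta
            beta = (lo + hi) / 2.0
    return beta

# Hypothetical toy input: squared distances from point 0 to five points.
sq_dists = [0.0, 1.0, 1.0, 50.0, 61.0]
beta = find_beta(sq_dists, i=0, target_perplexity=3.0)
probs = conditional_probs(sq_dists, 0, beta)
sigma = math.sqrt(1.0 / (2.0 * beta))
```

The search relies on perplexity falling monotonically as `beta` grows, so a datapoint in a sparse region ends up with a small `beta` (large `sigma`) and one in a dense region with a large `beta`.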

t-SNE

t-SNE uses a symmetrized version of the SNE cost function and a Student-t distribution to compute the similarities between low-dimensional points.

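  1. Symmetrized cost function: the conditional probabilities are combined into joint probabilities over the $n$ datapoints

     $$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$

  2. Student-t distribution (one degree of freedom) for the low-dimensional similarities

     $$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}$$

  3. KL-divergence

     $$C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

  4. Gradient

     $$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1}$$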
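The four t-SNE quantities listed above (symmetrized joint probabilities, Student-t similarities, KL-divergence, and its gradient) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the repository's implementation: all function names and the small input matrices are hypothetical, and the loops are written for readability rather than speed.

```python
import math

def symmetrize(cond_p):
    """Joint p_ij = (p_{j|i} + p_{i|j}) / (2n), where cond_p[i][j] = p_{j|i}."""
    n = len(cond_p)
    return [[(cond_p[i][j] + cond_p[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def student_t_q(Y):
    """Return (Q, W) with W[i][j] = 1 / (1 + ||y_i - y_j||^2) and Q = W / sum(W).

    The diagonal of both matrices is zero.
    """
    n = len(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, W))
    Q = [[w / total for w in row] for row in W]
    return Q, W

def kl_divergence(P, Q):
    """C = sum_ij p_ij * log(p_ij / q_ij), skipping entries with p_ij = 0."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0.0)

def gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    Q, W = student_t_q(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            coeff = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

# Hypothetical inputs: conditional probabilities for 3 points and a 2-D embedding.
cond_p = [[0.0, 0.7, 0.3],
          [0.6, 0.0, 0.4],
          [0.2, 0.8, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0]]

P = symmetrize(cond_p)
Q, W = student_t_q(Y)
cost = kl_divergence(P, Q)
grad = gradient(P, Y)
```

A quick way to gain confidence in a gradient like this is to compare one component against a central finite difference of `kl_divergence`.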

Practice

  1. Momentum: 0.9
  2. Learning rate: 15
  3. Iterations: 500
  4. Data: MNIST
  5. Result: embedding snapshots at iterations 0, 100, 200, 300, 400, and 499 (figures not reproduced here)
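A training loop with these settings might look like the sketch below. It is a hedged illustration, not the code used for the MNIST run: the function names, the eight-point toy dataset, the fixed Gaussian width (no perplexity search), and the random initialization scale are all assumptions. The defaults mirror the values listed above (momentum 0.9, learning rate 15, 500 iterations), but the toy call passes a smaller learning rate, since with only a handful of points each $p_{ij}$ is far larger than on MNIST and the same step size is likely to overshoot.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_p(X, sigma=1.0):
    """Symmetrized p_ij from fixed-width Gaussians (assumption: no perplexity search)."""
    n = len(X)
    cond = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([v / total for v in w])
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]

def cost_and_grad(P, Y):
    """KL-divergence and its gradient under Student-t similarities."""
    n, dim = len(Y), len(Y[0])
    W = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j]))
          for j in range(n)] for i in range(n)]
    Z = sum(map(sum, W))
    cost = 0.0
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            q = W[i][j] / Z
            if P[i][j] > 0.0:
                cost += P[i][j] * math.log(P[i][j] / q)
            coeff = 4.0 * (P[i][j] - q) * W[i][j]
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return cost, grad

def run_tsne(P, n_iter=500, learning_rate=15.0, momentum=0.9, dim=2, seed=0):
    """Gradient descent with momentum; returns the embedding and the cost history."""
    rng = random.Random(seed)
    n = len(P)
    # Start from a tiny random cloud around the origin.
    Y = [[rng.gauss(0.0, 1e-4) for _ in range(dim)] for _ in range(n)]
    velocity = [[0.0] * dim for _ in range(n)]
    costs = []
    for _ in range(n_iter):
        cost, grad = cost_and_grad(P, Y)
        costs.append(cost)  # cost measured before this iteration's update
        for i in range(n):
            for d in range(dim):
                velocity[i][d] = momentum * velocity[i][d] - learning_rate * grad[i][d]
                Y[i][d] += velocity[i][d]
    return Y, costs

# Hypothetical toy data: two well-separated groups of four 3-D points.
X = [[0.0, 0.0, 0.0], [0.3, 0.1, 0.0], [0.0, 0.4, 0.2], [0.2, 0.2, 0.3],
     [5.0, 5.0, 5.0], [5.2, 5.1, 4.9], [4.9, 5.3, 5.1], [5.1, 4.8, 5.2]]
P = joint_p(X)
Y, costs = run_tsne(P, learning_rate=1.0)
```

Plotting `Y` at selected iterations, as in the snapshots listed above, is a simple way to watch the clusters separate; `costs` gives the matching KL-divergence curve.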

Reference

L. van der Maaten and G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research 9:2579-2605, 2008.
