t-Distributed-Stochastic-Neighbor-Embedding
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction algorithm that builds on Stochastic Neighbor Embedding (SNE). It can capture both the local and the global structure of high-dimensional data in a low-dimensional embedding.
Development tools and techniques
- Python
- Pycharm
SNE
- Convert pairwise distances of the high-dimensional data into conditional probabilities (similarities), assuming each datapoint picks a neighbor according to a Gaussian distribution,
- Each datapoint of the high-dimensional data has its own variance σ_i, which reflects how dense or sparse its region is; each σ_i induces a probability distribution P_i. To select a proper σ_i for each point i, the user sets a fixed perplexity and a binary search finds the σ_i that makes P_i a distribution with that perplexity,
- Convert the low-dimensional data into conditional probabilities (similarities) in the same way, but with the variance fixed to 1/√2,
- Use gradient descent to minimize the Kullback-Leibler divergence (KL divergence) between these two distributions,
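The high-dimensional side of the steps above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the search bounds, tolerance, and iteration cap are assumptions.

```python
import numpy as np

def conditional_probs(dists_i, sigma):
    """Conditional probabilities p_{j|i} for one point, given its variance sigma.

    dists_i: squared distances from point i to all other points (self excluded).
    """
    p = np.exp(-dists_i / (2.0 * sigma ** 2))
    return p / p.sum()

def perplexity(p):
    """Perplexity = 2^H(P_i), where H is the Shannon entropy in bits."""
    h = -np.sum(p * np.log2(p + 1e-12))
    return 2.0 ** h

def find_sigma(dists_i, target_perplexity, tol=1e-5, max_iter=50):
    """Binary-search the sigma_i whose induced P_i matches the fixed perplexity."""
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        p = conditional_probs(dists_i, sigma)
        diff = perplexity(p) - target_perplexity
        if abs(diff) < tol:
            break
        if diff > 0:
            # Distribution too flat: shrink sigma
            hi = sigma
            sigma = (lo + sigma) / 2.0
        else:
            # Distribution too peaked: grow sigma
            lo = sigma
            sigma = (sigma + hi) / 2.0 if hi < 1e10 else sigma * 2.0
    return sigma
```

Larger σ_i flattens P_i and raises its perplexity, which is what makes the bisection above well-defined.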
t-SNE
t-SNE uses a symmetrized version of the SNE cost function and a Student-t distribution to compute the similarities between low-dimensional points.
- Symmetrized cost function
- Student-t distribution
- KL-divergence
- Gradient
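A minimal sketch of the pieces listed above, assuming the symmetrized joint distribution P has already been computed: the low-dimensional similarities Q use a Student-t kernel with one degree of freedom, and the gradient of the KL divergence takes the form 4·Σ_j (p_ij − q_ij)(y_i − y_j)(1 + ||y_i − y_j||²)⁻¹.

```python
import numpy as np

def tsne_grad(P, Y):
    """Student-t low-dim similarities Q and the gradient of KL(P || Q).

    P: (n, n) symmetrized joint probabilities of the high-dim data (zero diagonal).
    Y: (n, d) current low-dimensional embedding.
    """
    # Pairwise squared distances in the embedding
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    # Student-t kernel with one degree of freedom: (1 + ||y_i - y_j||^2)^-1
    num = 1.0 / (1.0 + sq)
    np.fill_diagonal(num, 0.0)
    Q = num / num.sum()
    # dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1,
    # written in matrix form below
    PQ = (P - Q) * num
    grad = 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Y
    return grad, Q
```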
Practice
- Train with momentum: 0.9
- Learning rate: 15
- Iterations: 500
- Data: MNIST
- Result: embedding snapshots at iterations 0, 100, 200, 300, 400, and 499
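The momentum update used in training can be sketched as below. The quadratic objective is a hypothetical stand-in (in the real run the gradient is the t-SNE KL gradient), while the momentum, learning rate, and iteration count match the settings above.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(Y):
    # Hypothetical stand-in objective: gradient of 0.005 * ||Y||^2.
    # In practice this would be the t-SNE KL-divergence gradient.
    return 0.01 * Y

Y = rng.normal(scale=1e-2, size=(100, 2))   # random initial embedding
velocity = np.zeros_like(Y)
momentum, lr, iters = 0.9, 15.0, 500        # settings from the Practice section

for _ in range(iters):
    # Classical momentum: accumulate velocity, then step
    velocity = momentum * velocity - lr * grad(Y)
    Y = Y + velocity
```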