balansky / Tsne

T-Distributed Stochastic Neighbor Embedding Library

Fast TSNE

This is a fork of the Multicore t-SNE Cython wrapper. It runs at a speed similar to Multicore t-SNE and adds a partial-fitting function for continuously adding points to an existing t-SNE map.

How to use

Cython wrappers are available.

Python

Pre-Requirements

  • Python3
  • Numpy (>=1.18.0)
  • Cython (>=0.28.2)
  • cysignals
  • cmake
  • OpenMP 2 (optional; runs slowly if not installed)
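
Before building, it can help to confirm that the prerequisites above are visible from the Python environment you plan to build against. The following is a minimal sketch using only the standard library; the helper name `check_prerequisites` is hypothetical and not part of this repository. It only checks presence, not the minimum versions listed above, and it does not detect OpenMP.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True (found) or False."""
    status = {"python3": sys.version_info.major == 3}
    # Python packages are looked up by their import names.
    for module in ("numpy", "Cython", "cysignals"):
        status[module] = importlib.util.find_spec(module) is not None
    # cmake is an external executable, so search for it on PATH.
    status["cmake"] = shutil.which("cmake") is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as missing can be installed with pip (for the Python packages) or your system package manager (for cmake) before running the install steps.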

Install

pip install numpy cython cysignals
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE -DPYTHON_EXECUTABLE=$(which python) ..
make
make install

Uninstall

pip uninstall PyFastTsne

Tested with Python 3.6 (conda) on Ubuntu 16.04.

Run

You can use it as a near drop-in replacement for sklearn.manifold.TSNE.

from PyFastTsne import PyTsne

x_dim = 728  # dimensionality of the input points in X
y_dim = 2    # dimensionality of the embedding
tsne = PyTsne(x_dim, y_dim)
Y = tsne.fit_transform(X)

# Continuously add extra points to the existing map
tsne = tsne.partial_fit(extra_X, ret_Y, n_iter=300)

Please refer to the sklearn TSNE manual for an explanation of the parameters.

This implementation supports only n_components=2, which is the most common case (use Barnes-Hut t-SNE or sklearn otherwise). Also note that some parameters exist only for compatibility with sklearn and are otherwise ignored. See the MulticoreTSNE class docstring for more info.
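
The future-work list mentions allowing types other than double, which suggests that inputs currently need to be double precision. Below is a minimal sketch of preparing an array before passing it to `fit_transform`. The helper name `as_tsne_input` is hypothetical, and the C-contiguous layout is an assumption about what the wrapper expects, not something this README states.

```python
import numpy as np


def as_tsne_input(X, x_dim):
    """Return X as a C-contiguous float64 array of shape (n_samples, x_dim)."""
    # Assumption: the wrapper expects double-precision, row-major data.
    X = np.ascontiguousarray(X, dtype=np.float64)
    if X.ndim != 2 or X.shape[1] != x_dim:
        raise ValueError(f"expected shape (n_samples, {x_dim}), got {X.shape}")
    return X


# Synthetic float32 data stands in for real features here.
rng = np.random.default_rng(0)
raw = rng.random((100, 16), dtype=np.float32)
X = as_tsne_input(raw, x_dim=16)
```

The converted `X` could then be passed to `PyTsne(16, 2).fit_transform(X)` as in the example above.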

Test

You can test it on the MNIST dataset with the following commands:

cd build/PyFastTsne
python test.py

License

Inherited from the original repo's license.

Future work

  • Allow other types than double
  • Improve step 2 performance (possible)

Citation
