oleg-kachan / smote_parameters

PARAMETER SEARCH
----------------
The script works as follows:
- for each k, the data are split 5 times with an 80/20 ratio
- the point clouds in the 20% part of each split are oversampled to match the number of point clouds in the 80% part (i.e. x4)
- the Wasserstein distance is computed over the splits for each k, with k ranging over np.linspace(k_min, k_max, k_num)
- the resulting matrix of shape (5, k_num) is saved to the ./data folder on the host machine
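The steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the repository's code: the 5 splits are taken to be the folds of a shuffled 5-fold partition, k values are rounded to integers, and `oversample` and `distance` are hypothetical placeholders for the repository's oversampler and Wasserstein distance. The toy functions at the end only exercise the loop; they are neither SMOTE nor a Wasserstein distance.

```python
import numpy as np


def k_values(k_min, k_max, k_num):
    # np.linspace includes k_max by default; rounding to int is an
    # assumption, since k is a neighbor count.
    return np.rint(np.linspace(k_min, k_max, k_num)).astype(int)


def five_splits(n_clouds, random_state=0):
    # Assumed 5-fold scheme: each fold is the "20%" part once and the
    # other four folds together form the "80%" part.
    rng = np.random.default_rng(random_state)
    folds = np.array_split(rng.permutation(n_clouds), 5)
    for i in range(5):
        small = folds[i]
        large = np.concatenate([folds[j] for j in range(5) if j != i])
        yield large, small


def parameter_search(clouds, k_min, k_max, k_num, oversample, distance,
                     random_state=0):
    # `oversample` and `distance` are hypothetical callables standing in
    # for the repository's oversampler and Wasserstein distance.
    ks = k_values(k_min, k_max, k_num)
    result = np.empty((5, len(ks)))
    for col, k in enumerate(ks):
        splits = five_splits(len(clouds), random_state)
        for row, (large, small) in enumerate(splits):
            # Grow the 20% part to the size of the 80% part (about x4).
            synthetic = oversample(clouds[small], len(large), int(k))
            result[row, col] = distance(clouds[large], synthetic)
    return result  # shape (5, k_num)


def toy_oversample(clouds, n_target, k):
    # Plain resampling with replacement, NOT SMOTE; exercises the loop only.
    rng = np.random.default_rng(k)
    return clouds[rng.integers(0, len(clouds), size=n_target)]


def toy_distance(a, b):
    # Stand-in for a Wasserstein distance: L1 gap between mean clouds.
    return float(np.abs(a.mean(axis=0) - b.mean(axis=0)).sum())


clouds = np.random.default_rng(0).normal(size=(200, 68, 2))
print(parameter_search(clouds, 3, 12, 4, toy_oversample, toy_distance).shape)  # (5, 4)
```

With 200 point clouds each split has 160 clouds in the 80% part and 40 in the 20% part, so the 20% part is oversampled by a factor of 4.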

BUILD IMAGE
-----------
To build the images, execute in the directory containing the Dockerfiles:

$ docker build -t smote_parameters .
$ docker build -t smote_parameters2 -f Dockerfile2 .

RUN CONTAINER
-------------
- place the landmarks as NumPy .npy files of shape (n_landmarks, n_points, n_dims) in the ./data folder
- the results will also be saved in the ./data folder as a .npy file whose name contains
  the date, the time and the parameters used to run the script, for example:
  '15-05-2021_14-25-08_simplicial_maximal_cknn_kmin_3_kmax_12_knum_4_delta_1.0_r_0.npy'
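A sketch of preparing an input file and recovering the run parameters from a result name. The array sizes and the file name toy_clouds.npy are hypothetical, and the filename pattern is inferred from the single example above, so the real script may name its files differently.

```python
import re
from pathlib import Path

import numpy as np

# Hypothetical toy input: 300 point clouds of 68 two-dimensional points each.
data_dir = Path("./data")
data_dir.mkdir(exist_ok=True)
clouds = np.random.default_rng(0).normal(size=(300, 68, 2))
np.save(data_dir / "toy_clouds.npy", clouds)

# Pattern inferred from the example result name in this README (assumption).
RESULT_NAME = re.compile(
    r"(?P<date>\d{2}-\d{2}-\d{4})_(?P<time>\d{2}-\d{2}-\d{2})_"
    r"(?P<method>simplicial(?:_maximal)?)_(?P<graph>c?knn)_"
    r"kmin_(?P<kmin>\d+)_kmax_(?P<kmax>\d+)_knum_(?P<knum>\d+)_"
    r"delta_(?P<delta>[\d.]+)_r_(?P<r>\d+)\.npy"
)


def parse_result_name(name):
    # Returns the parameters as strings, or None if the name does not match.
    m = RESULT_NAME.fullmatch(name)
    return m.groupdict() if m else None


info = parse_result_name(
    "15-05-2021_14-25-08_simplicial_maximal_cknn_kmin_3_kmax_12_knum_4_delta_1.0_r_0.npy"
)
print(info["method"], info["graph"], info["kmax"])  # simplicial_maximal cknn 12
```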

To run a container, execute:

$ docker run -d --rm -v "$(pwd)"/data:/exp/data --name okachan_smote_parameters_detached smote_parameters -i ./data/w300.npy -k_min 3 -k_max 12 -k_num 4 -j 4 -n 200

$ docker run -d --rm -v "$(pwd)"/data:/exp/data --name okachan_smote_parameters2_simplicial_knn smote_parameters2 -i ./data/w300.npy -m simplicial -g knn -k_min 3 -k_max 45 -k_num 15 -j 15

-i        name of the input file
-m        method, either 'simplicial' or 'simplicial_maximal'; 'simplicial_maximal' if omitted
-g        neighborhood graph, 'knn' or 'cknn'; 'cknn' if omitted
-n        number of point clouds to use (the first n); if omitted, all point clouds are used
-k_min    lower bound for k, the number of nearest neighbors
-k_max    upper bound for k (inclusive)
-k_num    number of k values, as in np.linspace(k_min, k_max, k_num)
-d        delta parameter of the ckNN graph, scaling the radius of the neighborhood balls; only used with -g cknn
-r        random state, 0 if omitted
-j        number of jobs, 4 if omitted; set it to the number of available processors, upper bounded by k_num
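The options above could be declared with argparse roughly as follows. This is a hypothetical parser for illustration, not the one in parameter_search.py: which options are required and the default of -d (1.0, inferred from the example output filename) are assumptions, and capping -j at k_num is one possible reading of "upper bounded by k_num".

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the option list above.
    p = argparse.ArgumentParser(description="SMOTE parameter search (sketch)")
    p.add_argument("-i", required=True, help="input .npy file")
    p.add_argument("-m", default="simplicial_maximal",
                   choices=["simplicial", "simplicial_maximal"])
    p.add_argument("-g", default="cknn", choices=["knn", "cknn"])
    p.add_argument("-n", type=int, default=None,
                   help="use only the first n point clouds")
    # Single-dash multi-letter options get dest "k_min", "k_max", "k_num".
    p.add_argument("-k_min", type=int, required=True)
    p.add_argument("-k_max", type=int, required=True)
    p.add_argument("-k_num", type=int, required=True)
    # Default inferred from the example output filename (assumption).
    p.add_argument("-d", type=float, default=1.0)
    p.add_argument("-r", type=int, default=0)
    p.add_argument("-j", type=int, default=4)
    return p


def parse(argv=None):
    args = build_parser().parse_args(argv)
    # One job per k value at most: extra workers would have nothing to do.
    args.j = min(args.j, args.k_num)
    return args


args = parse(["-i", "./data/w300.npy", "-k_min", "3", "-k_max", "12",
              "-k_num", "4", "-j", "8"])
print(args.m, args.g, args.j)  # simplicial_maximal cknn 4
```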

The container can also be run interactively, which gives access to a command prompt inside the container:

$ docker run -it -v "$(pwd)"/data:/exp/data smote_parameters2

PARAMETERS TO PASS
------------------
Example 1

Input    ./data/w300.npy
Method   simplicial_maximal
Graph    knn
k        np.linspace(3, 30, 28), i.e. every integer from 3 to 30
j        28, upper bounded by k_num

$ python3.9 parameter_search.py -i ./data/w300.npy -m simplicial_maximal -g knn -k_min 3 -k_max 30 -k_num 28 -j 28
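np.linspace returns floats, so a (k_min, k_max, k_num) triple yields only whole-number k values when k_max - k_min is divisible by k_num - 1, as with 3, 30, 28 above (step 1). The helpers below, with hypothetical names, check this and compute k_num for a given step; how the script itself treats fractional k values is not documented here.

```python
import numpy as np


def integer_grid(k_min, k_max, k_num):
    """Return the k values as ints if np.linspace(k_min, k_max, k_num)
    lands on integers only; otherwise raise ValueError."""
    ks = np.linspace(k_min, k_max, k_num)
    if not np.allclose(ks, np.rint(ks)):
        raise ValueError(f"non-integer k values: {ks}")
    return np.rint(ks).astype(int)


def k_num_for_step(k_min, k_max, step=1):
    # Number of grid points so that consecutive k values differ by `step`.
    if (k_max - k_min) % step:
        raise ValueError("k_max - k_min must be a multiple of step")
    return (k_max - k_min) // step + 1


print(k_num_for_step(3, 30))        # 28, as in Example 1
print(integer_grid(3, 45, 15))      # [ 3  6  9 ... 45], step 3
```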

Example 2

Input    ./data/w300.npy
Method   simplicial
Graph    cknn
k        np.linspace(3, 30, 28), i.e. every integer from 3 to 30
d        1.05; it is usually meaningful to check a small vicinity around 1.0, e.g. 0.9 to 1.2
j        28, upper bounded by k_num

$ python3.9 parameter_search.py -i ./data/w300.npy -m simplicial -g cknn -k_min 3 -k_max 30 -k_num 28 -d 1.05 -j 28
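To get a feel for what -g and -d change, the sketch below builds both graphs on a single point cloud. It assumes the standard definitions: a kNN graph symmetrized by union, and the continuous kNN (ckNN) graph of Berry and Sauer, where points x and y are joined if d(x, y) < delta * sqrt(d_k(x) * d_k(y)), with d_k the distance to the k-th nearest neighbor. Whether the repository builds its graphs exactly this way is an assumption. Under this definition a larger delta can only add edges, which is why scanning a small range around 1.0 is informative.

```python
import numpy as np


def pairwise_distances(x):
    # Euclidean distance matrix of an (n_points, n_dims) array.
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_graph(x, k):
    # i ~ j if j is among the k nearest neighbors of i, or vice versa
    # (the union rule is an assumption).
    d = pairwise_distances(x)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    a = np.zeros(d.shape, dtype=bool)
    a[np.repeat(np.arange(len(x)), k), nbrs.ravel()] = True
    return a | a.T


def cknn_graph(x, k, delta=1.0):
    # Continuous kNN: i ~ j if d(i, j) < delta * sqrt(d_k(i) * d_k(j)).
    d = pairwise_distances(x)
    dk = np.sort(d, axis=1)[:, k]  # distance to the k-th nearest neighbor
    a = d < delta * np.sqrt(np.outer(dk, dk))
    np.fill_diagonal(a, False)  # no self-loops
    return a


x = np.random.default_rng(0).normal(size=(100, 2))
print("knn edges:", knn_graph(x, k=5).sum() // 2)
for delta in (0.9, 1.0, 1.05, 1.2):
    print("cknn delta", delta, "edges:", cknn_graph(x, k=5, delta=delta).sum() // 2)
```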

The script can be run on the first n point clouds only by passing the -n key, which may be useful for performance tests:

$ python3.9 parameter_search2.py -i ./data/w300.npy -n 100 -k_min 3 -k_max 12 -k_num 4 -j 4

PARAMETERS HELP
---------------
$ python3.9 parameter_search.py --help
