HelikarLab / candis

:ribbon: A data mining suite for gene expression data.

Home Page: http://candis.readthedocs.io

Add a feature to split data into training and testing datasets.

rupav opened this issue

Expected Behaviour

With a percentage-split feature, the workflow would be:

1. The user enters the split percentage.
2. Candis first randomizes the provided data.
3. Candis then splits the data into training and testing datasets using the given percentage/ratio.
4. Both datasets are saved in the user data directory.
5. The training dataset is then used for training, the same way training is currently done in Candis.
6. The user then tests against the testing dataset using the yet-to-be-implemented Predict option in Candis.
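
The randomize-then-split steps above could look roughly like the sketch below. It is only an illustration, not Candis code: the function name `split_dataset`, the `train_percent` parameter (assuming the entered percentage is the training share), and the optional `seed` are all hypothetical, and saving the two datasets to the user data directory is left out.

```python
import random


def split_dataset(rows, train_percent, seed=None):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper: `train_percent` is assumed to be the share of
    rows that goes to the training dataset.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")

    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # randomize before splitting

    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


# Example: a 70/30 split of ten samples.
train, test = split_dataset(range(10), train_percent=70, seed=0)
```

Passing a fixed `seed` makes the split reproducible, which would help when the saved training and testing datasets need to be regenerated later.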

Actual Behaviour

Currently, the user provides data, Candis splits the data itself using the K-Folds technique, runs the pipeline, and reports the model accuracy against the data/CEL files provided by the user.
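
For contrast with a single percentage split, here is a generic sketch of how K-Folds partitions a dataset. It is not how Candis implements it internally; `k_fold_indices` is a hypothetical helper written only to show the idea that every sample is used for testing exactly once across the folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]

    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# Example: ten samples split into three folds.
folds = list(k_fold_indices(10, 3))
```

With K-Folds the reported accuracy is averaged over folds drawn from the same data, whereas the requested feature keeps one fixed held-out testing dataset that the user can predict against later.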