ChenglongChen / kaggle-HomeDepot

3rd Place Solution for HomeDepot Product Search Results Relevance Competition on Kaggle.

Home Page:https://www.kaggle.com/c/home-depot-product-search-relevance

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

split_param in class HomedepotSplitter

great-thoughts opened this issue · comments

The class HomedepotSplitter has a split_param=[0.5, 0.25, 0.5]. What do these number represents?
Is it:
split_param[0]: split unique search terms appear in dTrain into 50%train (df0) and 50%validation (df1)
split_param[1]: split the terms appearing in df0 into 25% in train and 75% as common between train and validation
split_param[2]: 50% of the 75% will be appended to the validation dataset?

How do those values relate to the ven diagram?