jtanderson / BIOME-z-Project

Splitting data needs to be done more efficiently, can loop forever if unlucky or too many labels

jtanderson opened this issue

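The splitter method relies on getting lucky with a random data split, so it can loop forever when the draw is unlucky or there are too many labels. It can be done much more efficiently and deterministically:

  1. Create separate lists of documents for each label
  2. Split each of these lists evenly between test/train, or in whatever proportion is needed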
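
The two steps above could look roughly like the sketch below. This is not the project's actual splitter. The function name `stratified_split`, its parameters, and the assumption that the data arrives as parallel lists of documents and labels are all hypothetical and only illustrate the idea. Because each label is split on its own, the function finishes in a single pass and never has to retry.

```python
import random
from collections import defaultdict


def stratified_split(documents, labels, train_fraction=0.8, seed=None):
    """Split (document, label) pairs so each label appears in both sets.

    Hypothetical helper: names and signature are assumptions, not the
    project's API.
    """
    rng = random.Random(seed)

    # Step 1: one list of documents per label.
    by_label = defaultdict(list)
    for doc, label in zip(documents, labels):
        by_label[label].append(doc)

    # Step 2: split each per-label list in the requested proportion.
    train, test = [], []
    for label, docs in by_label.items():
        rng.shuffle(docs)
        cut = int(round(len(docs) * train_fraction))
        if len(docs) > 1:
            # Keep at least one document of this label on each side.
            cut = min(max(cut, 1), len(docs) - 1)
        train.extend((doc, label) for doc in docs[:cut])
        test.extend((doc, label) for doc in docs[cut:])
    return train, test
```

A label with only one document cannot appear on both sides. In this sketch, with the default fraction, it ends up in the training set. How to handle that case is a design choice left open here.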