Splitting data needs to be done more efficiently, can loop forever if unlucky or too many labels
jtanderson opened this issue · comments
Joseph Anderson commented
The splitter method relies on getting lucky in the data split, but the split can be done much more efficiently:
- Create separate lists of documents for each label
- Split each of these lists evenly between test/train, or in whatever proportion is needed
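
The two steps above could be sketched roughly as follows. This is only an illustration, not the project's actual splitter: the function name `stratified_split` and the parameters `documents`, `labels`, `train_fraction`, and `seed` are hypothetical, and the real code may represent documents and labels differently. The point is that grouping by label first makes the split a single deterministic pass, so it always terminates regardless of how many labels there are.

```python
import random
from collections import defaultdict


def stratified_split(documents, labels, train_fraction=0.5, seed=None):
    """Split (document, label) pairs per label instead of retrying random splits.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    rng = random.Random(seed)

    # Step 1: create a separate list of documents for each label.
    by_label = defaultdict(list)
    for doc, label in zip(documents, labels):
        by_label[label].append(doc)

    # Step 2: shuffle each list and cut it at the requested proportion.
    train, test = [], []
    for label, docs in by_label.items():
        rng.shuffle(docs)
        cut = int(len(docs) * train_fraction)
        train.extend((doc, label) for doc in docs[:cut])
        test.extend((doc, label) for doc in docs[cut:])

    return train, test
```

With `train_fraction=0.5`, a label that has four documents contributes two to each side. A label with a single document ends up entirely in the test set under this rounding, so very rare labels may still need special handling.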