Why do we limit X to be a list of csr_matrix for training?
yupbank opened this issue · comments
Great question. Each stage of the tree re-splits the dataset, so if you give it a single sparse matrix, Python has to keep recreating each of the CSR rows individually. This is incredibly slow and uses several times more memory.
I require the data to be a list of sparse matrices so that we don't have to do a full memory copy to convert it from a single csr_matrix into a list of csr_matrix rows.
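To make the trade-off concrete, here is a rough sketch of what that looks like on the caller's side, assuming SciPy's `csr_matrix`. The helper names `to_row_list` and `split_rows` are hypothetical and not part of this library; they only illustrate that once the rows live in a list, each split just moves references around instead of rebuilding sparse rows.

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_row_list(X):
    """Hypothetical helper: split a 2-D matrix into a list of 1 x n_features csr_matrix rows.

    This is the one-time copy, paid once by the caller before training.
    """
    X = csr_matrix(X)
    return [X[i] for i in range(X.shape[0])]


def split_rows(rows, feature, threshold):
    """Hypothetical helper: partition rows on one feature.

    Only references to the existing row objects are appended,
    so no CSR row is recreated during the split.
    """
    left, right = [], []
    for row in rows:
        (left if row[0, feature] <= threshold else right).append(row)
    return left, right


dense = np.array([[0.0, 2.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 0.0, 0.0]])

rows = to_row_list(dense)
left, right = split_rows(rows, feature=0, threshold=0.5)
```

Slicing a single `csr_matrix` at every node (for example `X[mask]`) would instead allocate new index and data arrays on each split, which is the cost described above.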