Refefer / fastxml

FastXML / PFastXML / PFastreXML - Implementation of Extreme Multi-label Classification


Why do we limit X to be a list of csr_matrix for training?

yupbank opened this issue · comments

Great question. At each stage of the tree we end up re-splitting the dataset, so if you give it a single sparse matrix, Python has to keep recreating each of the CSR rows individually. This is incredibly slow and uses several times more memory.

I require the data to be a list of sparse matrices so we don't have to do a full memory copy to convert it from a single csr_matrix into a list of per-row csr_matrix objects.
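If your data starts out as a single csr_matrix, a rough sketch of the one-time conversion could look like the following. The helper names `to_row_list` and `split_rows` are hypothetical and are not part of fastxml; they only illustrate why a list of rows is cheap to re-split, since each split just picks existing row objects instead of slicing the matrix again.

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_row_list(X):
    """Split a 2-D sparse matrix into a list of 1 x n_features csr_matrix rows.

    Hypothetical helper, not part of fastxml. This is the one-time cost
    paid up front, before training.
    """
    X = csr_matrix(X)  # no-op conversion if X is already CSR
    return [X[i] for i in range(X.shape[0])]


def split_rows(rows, go_left):
    """Partition a list of rows by a boolean mask (hypothetical helper).

    The row objects are reused as-is; only the Python lists are new.
    """
    left = [r for r, flag in zip(rows, go_left) if flag]
    right = [r for r, flag in zip(rows, go_left) if not flag]
    return left, right


# Small made-up dataset: 4 samples, 3 features.
X = csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
    [4.0, 5.0, 0.0],
    [0.0, 6.0, 0.0],
]))

rows = to_row_list(X)
left, right = split_rows(rows, [True, False, True, False])
```

After the initial conversion, every further split at a tree node only shuffles references to the same row objects, so no CSR rows are rebuilt.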