dmlc / rabit

Reliable Allreduce and Broadcast Interface for distributed machine learning

Eliminate extra dataset copy in Python.

trivialfis opened this issue

During construction of a DMatrix, the Python wrapper may duplicate the dataset because the input is not contiguous or is not of the right data type. We can handle both situations inside the C++ code instead, which avoids constructing an extra copy of the dataset.

My goal is not to share the underlying buffer with the Python data structure, but to eliminate the extra copies constructed during conversion inside the Python wrapper.
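To make the two copy triggers concrete, here is a minimal sketch using NumPy. It assumes the wrapper normalizes input to a C-contiguous `float32` array, which the issue does not state explicitly; `wrapper_would_copy` is a hypothetical helper for illustration, not part of any library.

```python
import numpy as np


def wrapper_would_copy(data, dtype=np.float32):
    """Hypothetical check: would normalizing `data` allocate a new buffer?

    Assumes the target layout is C-contiguous with element type `dtype`.
    """
    return data.dtype != np.dtype(dtype) or not data.flags["C_CONTIGUOUS"]


base = np.arange(12, dtype=np.float32).reshape(3, 4)
cases = {
    "contiguous float32": base,                # already in the target layout
    "float64": base.astype(np.float64),        # wrong data type
    "strided column slice": base[:, ::2],      # right type, not contiguous
}

results = {}
for name, arr in cases.items():
    # np.ascontiguousarray returns the input unchanged when no conversion
    # is needed, and a freshly allocated array otherwise.
    normalized = np.ascontiguousarray(arr, dtype=np.float32)
    copied = not np.shares_memory(normalized, arr)
    # The flag-based prediction should agree with what NumPy actually did.
    assert copied == wrapper_would_copy(arr)
    results[name] = copied

for name, copied in results.items():
    print(f"{name}: {'copy' if copied else 'no copy'}")
```

In the last two cases the normalized array is a second full-size buffer. Those are the copies the proposal would avoid by passing the original pointer, strides, and element type to the native side.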

Wrong list, sorry~~