linkedin / photon-ml

A scalable machine learning library on Apache Spark

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The validateDataFrame function pulls entire dataset into Driver

joshvfleming opened this issue · comments

The following code attempts to pull the entire dataset into the Driver before running validations:

We should instead map over the DataFrame rows so that the validations run in a distributed fasion, and then collect the results.

Resolve by #307