acroz / pylivy

A Python client for Apache Livy, enabling use of remote Apache Spark clusters.

No clean way to return empty dataframe

ishmandoo opened this issue

Right now the only way to return an empty dataset or null result is to construct an empty Spark DataFrame by hand, which is kind of clunky.

Might it make sense to change session.read so that a variable set to None is interpreted as an empty dataframe?

Hi Ben, thanks for the suggestion!

Could you help me understand your use case a little better? It's not clear to me in what situation you'd have a variable in a (presumably PySpark) session set to None and want it to be interpreted as an empty DataFrame when you attempt to download it. With this suggestion, the distinction between None (no dataframe at all) and an empty result would be lost, and that distinction seems valuable to keep.

My understanding is that trying to read a variable whose value is None will result in an error. For my application, I sometimes want to return a null result that will be interpreted as an empty dataframe. Right now I'm building an empty Spark dataframe to return, like spark.createDataFrame([], T.StructType([])), and I was hoping to avoid having to do that.
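
For anyone landing here with the same problem, the workaround can be wrapped up on the client side without changing session.read. The sketch below is only an illustration: empty_if_none_snippet is a hypothetical helper, not part of pylivy, and it assumes the remote session is a PySpark session where spark is already defined. It generates the code that replaces a None variable with the empty DataFrame from the comment above, so None and "empty" stay distinct unless you explicitly opt in.

```python
def empty_if_none_snippet(var_name: str) -> str:
    """Build PySpark code that swaps a None variable for an empty DataFrame.

    Hypothetical helper, not part of pylivy. The returned string is meant to
    be executed in the remote session, where `spark` is assumed to exist.
    """
    # Refuse anything that is not a plain variable name, since the name is
    # pasted directly into the generated code.
    if not var_name.isidentifier():
        raise ValueError(f"not a valid Python identifier: {var_name!r}")
    lines = [
        "from pyspark.sql import types as T",
        f"if {var_name} is None:",
        f"    {var_name} = spark.createDataFrame([], T.StructType([]))",
    ]
    return "\n".join(lines) + "\n"


# Usage sketch (needs a running Livy server, so it is left commented out;
# the URL and the variable name `result` are assumptions):
#
# from livy import LivySession
# with LivySession.create("http://localhost:8998") as session:
#     session.run("result = None")
#     session.run(empty_if_none_snippet("result"))
#     df = session.read("result")
```

The exact way to open a session may differ between pylivy versions, so treat the commented-out usage as a starting point and not as the documented API.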