datamindedbe / conveyor-templates

Cookiecutter templates used by Conveyor.

Home Page: https://conveyordata.com

[pyspark] assert_frame_equal_with_sort() broken for complex types

vlieven opened this issue

Describe the bug
The function assert_frame_equal_with_sort(), provided in /tests/common/spark.py in the PySpark template, cannot handle complex types and throws an error related to the PySpark-to-pandas conversion. I noticed this while writing a test for a DataFrame containing an ArrayType column, but I suspect MapType and StructType columns will trigger the same kind of error.

To Reproduce
Steps to reproduce the behavior:

  1. Call assert_frame_equal_with_sort() on DataFrames containing an ArrayType column.

Expected behavior
The assertion properly tests the equality of the two DataFrames instead of throwing an error.

Desktop

  • OS: macOS Big Sur
  • Datafy Version: 0.37

Additional context
I worked around the issue by replacing the call to assert_frame_equal_with_sort() with a call to a function from the chispa library. This seems to work, but I don't know whether the templates should be opinionated about which flavour of Spark test helpers to use.

I believe we provide this more as an example, but it might be worthwhile to help users find a test setup that keeps working once they do more serious work :)
So feel free to add it.