Infinite loop for Impala-generated file
spaztic1215 opened this issue
Hi there,
Was wondering what condition would cause an infinite loop in this while-loop block: https://github.com/jcrobak/parquet-python/blob/master/parquet/__init__.py#L354-L360
We're using the following file, which we generated from Impala: https://www.dropbox.com/s/kah986gqjt7mrnr/movies.0.parquet. At some point, where it reads bytes 65278 -> 112466, it gets stuck in an endless loop because the values stop updating. However, we've been able to read smaller Impala-generated files, so we're not sure whether this is a limitation with file size (the file is 100MB+, but there are only 5 columns of data).
Any insight would be hugely appreciated, thanks!
Jenny
Hi Jenny, thanks for the report. The problem seems to be that I haven't implemented support for null values (via definition_levels) for the encoding used by the rating column on that file.
I should have a fix shortly—I'd like to add some regression tests to ensure this bug doesn't pop up again.
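For anyone following along, here is a rough sketch of why missing null support can turn into a hang. In Parquet, a data page's value count includes nulls, but nulls are not physically stored; the definition levels say which slots actually hold a value. A loop that keeps reading until it has collected the full count can therefore run out of stored values and never reach its exit condition. The function names below (`assemble_column`, `read_exactly`) are made up for illustration and are not parquet-python's API, and the example assumes a flat optional column whose maximum definition level is 1.

```python
def assemble_column(stored_values, definition_levels, max_definition_level=1):
    """Rebuild a flat optional column, placing None wherever the level marks a null."""
    stored = iter(stored_values)
    column = []
    for level in definition_levels:
        if level == max_definition_level:
            column.append(next(stored))  # a value was physically written for this slot
        else:
            column.append(None)          # null: nothing was written for this slot
    return column


def read_exactly(read_batch, expected_count):
    """Collect values from read_batch() and fail loudly once it stops making progress."""
    values = []
    while len(values) < expected_count:
        batch = read_batch()
        if not batch:
            # Without this check the loop would spin forever on an exhausted source.
            raise ValueError(
                "expected %d values but only %d are available"
                % (expected_count, len(values))
            )
        values.extend(batch)
    return values[:expected_count]


# Five slots in the page, but only three values were stored (two slots are null).
levels = [1, 0, 1, 1, 0]
stored = [4.5, 3.0, 5.0]
print(assemble_column(stored, levels))  # [4.5, None, 3.0, 5.0, None]
```

The guard in `read_exactly` is only a defensive measure so that a mismatch surfaces as an error instead of a hang; the real fix is to decode the definition levels and read only as many stored values as they indicate.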