Example Code
pjakobsen opened this issue
It would be great to have some example code to show how to use this interesting library. Trying to tease it out from the test cases has proven to be unsuccessful so far.
Hello, @pjakobsen! I'm having the same problem here: I'm implementing the Parquet rows plugin and had to read parquet-python's source code to learn how to use it, which is a difficult, non-Pythonic way to go about it. So I've created a little helper function that may help you too:
from collections import namedtuple
import parquet
OPTIONS = namedtuple('Options', ['col', 'format'])(col=None, format='custom')
def import_data(filename):
    data, field_names = parquet.dump(filename, OPTIONS, lambda *args: args)
    length = len(data[field_names[0]])
    return [{field_name: data[field_name][index] for field_name in field_names}
            for index in range(length)]
The function is pretty straightforward to use. For example, this code:
parquet_rows = import_data('test-data/nation.dict.parquet')
for row in parquet_rows:
    print(row)
will generate the following output (each row is a Python dict):
{'region_key': 0, 'nation_key': 0, 'name': 'ALGERIA', 'comment_col': ' haggle. carefully final deposits detect slyly agai'}
{'region_key': 1, 'nation_key': 1, 'name': 'ARGENTINA', 'comment_col': 'al foxes promise slyly according to the regular accounts. bold requests alon'}
(... 24 more rows ...)
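For anyone who wants to try the column-to-row step without a Parquet file at hand, here is a minimal sketch. It assumes, as the helper above does, that the data comes back as a dict mapping each field name to a list of column values. The name columns_to_rows is hypothetical (not part of parquet-python), and the sample values are copied from the first two rows of the output above:

```python
def columns_to_rows(data, field_names):
    """Pivot a dict of equal-length column lists into a list of row dicts."""
    # Pull the columns out in the order given by field_names.
    columns = [data[name] for name in field_names]
    # zip(*columns) yields one tuple per row: the i-th value of every column.
    return [dict(zip(field_names, values)) for values in zip(*columns)]


# Hypothetical in-memory stand-in for the columnar data read from a file.
sample = {
    'nation_key': [0, 1],
    'name': ['ALGERIA', 'ARGENTINA'],
    'region_key': [0, 1],
}
rows = columns_to_rows(sample, ['nation_key', 'name', 'region_key'])
for row in rows:
    print(row)
```

Using zip avoids indexing into every column by hand, and it returns an empty list instead of raising when there are no fields.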
@pjakobsen, you can now use my library rows to read and convert Parquet files! :) More information is in this blog post.
I took a stab at a more Pythonic API in #11. PTAL!