Example Code

Question


pjakobsen opened this issue 9 years ago.

It would be great to have some example code showing how to use this interesting library. Trying to tease it out of the test cases has been unsuccessful so far.

Álvaro Justen · Answer 1 · Mon Mar 14 2016 11:28:34 GMT+0800 (China Standard Time)

Hello, @pjakobsen! I'm having the same problem: I'm implementing the Parquet plugin for rows and had to read parquet-python's source code to figure out how to use it, since the interface is difficult and not very Pythonic. So I created a little helper function that may help you too:

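```python
from collections import namedtuple
import parquet

# parquet.dump() expects an options object with `col` and `format` attributes:
# col=None applies no column filter, and format='custom' makes dump() pass the
# decoded data to the callable given as its third argument
OPTIONS = namedtuple('Options', ['col', 'format'])(col=None, format='custom')

def import_data(filename):
    # The lambda returns its arguments unchanged, so dump() gives back the
    # column-oriented data ({field_name: [values]}) and the list of field names
    data, field_names = parquet.dump(filename, OPTIONS, lambda *args: args)
    length = len(data[field_names[0]])
    # Transpose the columns into a list of row dicts
    return [{field_name: data[field_name][index] for field_name in field_names}
            for index in range(length)]
```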
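The core of the helper is a transpose: with `format='custom'`, the data comes back column by column (a dict mapping each field name to a list of values), and the comprehension turns it into one dict per row. Here is a minimal, library-independent sketch of just that step, assuming the same column-oriented shape. `columns_to_rows` is a hypothetical name, and it uses `zip` instead of indexing, so columns of unequal length would be silently truncated to the shortest one.

```python
def columns_to_rows(data, field_names):
    """Turn column-oriented data into a list of row dicts.

    `data` maps each field name to a list of values (one per row), the same
    shape the helper above unpacks; `field_names` picks the columns and order.
    """
    columns = [data[name] for name in field_names]
    # zip(*columns) yields one tuple per row: the i-th value of every column
    return [dict(zip(field_names, values)) for values in zip(*columns)]


# Hypothetical sample shaped like the first two nation rows
sample = {
    'nation_key': [0, 1],
    'name': ['ALGERIA', 'ARGENTINA'],
    'region_key': [0, 1],
}
rows = columns_to_rows(sample, ['nation_key', 'name', 'region_key'])
print(rows[1])  # {'nation_key': 1, 'name': 'ARGENTINA', 'region_key': 1}
```

Unlike the original helper, this version also returns an empty list instead of raising `IndexError` when `field_names` is empty.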

The function is pretty straightforward to use. For example, this code:

```python
parquet_rows = import_data('test-data/nation.dict.parquet')
for row in parquet_rows:
    print(row)
```

produces the following output (each row is a Python dict):

```
{'region_key': 0, 'nation_key': 0, 'name': 'ALGERIA', 'comment_col': ' haggle. carefully final deposits detect slyly agai'}
{'region_key': 1, 'nation_key': 1, 'name': 'ARGENTINA', 'comment_col': 'al foxes promise slyly according to the regular accounts. bold requests alon'}
(... 24 more rows ...)
```
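Because each row comes back as a plain dict, the list plugs straight into anything that consumes dicts. As one illustration (not part of parquet-python), here is a sketch that writes such rows to CSV with the standard library's `csv.DictWriter`. `rows_to_csv` is a hypothetical helper, and the two sample rows are copied from the output above. `DictWriter` raises `ValueError` if a row contains a key missing from `fieldnames`, which surfaces column mismatches early.

```python
import csv
import io


def rows_to_csv(rows, field_names):
    """Serialize a list of row dicts to CSV text using the stdlib csv module."""
    buffer = io.StringIO()
    # DictWriter looks up each field name in every row dict; lineterminator
    # is set to '\n' so the result is easy to inspect
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The first two rows from the output above
parquet_rows = [
    {'region_key': 0, 'nation_key': 0, 'name': 'ALGERIA',
     'comment_col': ' haggle. carefully final deposits detect slyly agai'},
    {'region_key': 1, 'nation_key': 1, 'name': 'ARGENTINA',
     'comment_col': 'al foxes promise slyly according to the regular accounts. bold requests alon'},
]
print(rows_to_csv(parquet_rows, ['nation_key', 'name', 'region_key', 'comment_col']))
```

Reading the result back with `csv.DictReader` returns every value as a string, so numeric columns such as `nation_key` would need converting again.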

Álvaro Justen · Answer 2 · Mon Mar 14 2016 13:39:19 GMT+0800 (China Standard Time)

@pjakobsen, you can now use my library, rows, to read and convert Parquet files! :) There is more information in this blog post.

Joe Crobak · Answer 3 · Tue Mar 22 2016 08:40:00 GMT+0800 (China Standard Time)

I took a stab at a more Pythonic API in #11. PTAL (please take a look)!