Structures cause error
SergeNov opened this issue · comments
Hi Joe and others
I am trying to use your module to read a parquet file, and i ran into a problem here:
schema.py, line 21:
assert len(self.schema_elements) == len(self.schema_elements_by_name)
Apparently the init method assumes that my structure has multiple fields with the same name. Module works correctly if you comment out this line though
Originally these files were used by Hive, and here is the list of fields in the table:
fileid bigint,
version bigint,
ip_geocode structcountrycode:string,regionname:string,city:string,postalcode:string,metrocode:string,dmacode:string,
timestamp bigint,
region bigint,
pixel bigint,
uuid bigint,
uuid_exists boolean,
referingurl string,
useragent string,
ip string,
querystring string,
campaignsinfo array<struct<campaign_id:bigint,media_types:array,advertiser_id:bigint,funnel_step_id:bigint,funnel_step_value:bigint,track_conversion:boolean>>,
opted_out boolean,
event_id string
Here is how the list of fields that the module sees:
name=u'hive_schema', field_id=None, repetition_type=None, type_length=None, precision=None, num_children=17, converted_type=None, type=None
name=u'fileid', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'version', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'ip_geocode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=6, converted_type=None, type=None
name=u'countrycode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'regionname', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'city', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'postalcode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'metrocode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'dmacode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'timestamp', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'region', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'pixel', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'uuid', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'uuid_exists', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=0
name=u'referingurl', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'useragent', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'ip', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'querystring', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'campaignsinfo', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=1, converted_type=3, type=None
name=u'bag', field_id=None, repetition_type=2, type_length=None, precision=None, num_children=1, converted_type=None, type=None
name=u'array_element', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=6, converted_type=None, type=None
name=u'campaign_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'media_types', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=1, converted_type=3, type=None
name=u'bag', field_id=None, repetition_type=2, type_length=None, precision=None, num_children=1, converted_type=None, type=None
name=u'array_element', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'advertiser_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'funnel_step_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'funnel_step_value', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'track_conversion', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=0
name=u'opted_out', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=0
name=u'event_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'dt', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=1
name=u'hr', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=1
Apparently there are 2 elements named 'array_element' and 'bag' - i assume these fields just come with structures
@SergeNov thanks for the report. I'll attempt to reproduce and fix the issue.
Still experiencing this issue in version 1.3.1.