grzegorzewald / SnowplowRecovery

A repository with Snowplow recovery mechanism for Realtime data pipeline S3 backup files

Snowplow Real-time data pipeline S3 Recovery/data fix

A repository with a Snowplow recovery mechanism for Real-time data pipeline S3 backup files.

How to use

The recovery mechanism can either emit base64-encoded records to stdout or write them directly to a Kinesis stream.
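The two output modes could look roughly like the sketch below. This is an illustration only, not the repository's actual code: the function names `emit_to_stdout` and `emit_to_kinesis` are hypothetical, and the random partition key is an assumption. The Kinesis path uses the boto3 `put_record` call, which takes the raw bytes and handles the wire encoding itself.

```python
import base64
import sys
import uuid
from typing import Iterable, Optional, TextIO


def emit_to_stdout(records: Iterable[bytes], out: Optional[TextIO] = None) -> int:
    """Write each raw record as one base64-encoded line; return the record count."""
    out = out if out is not None else sys.stdout
    count = 0
    for raw in records:
        out.write(base64.b64encode(raw).decode("ascii") + "\n")
        count += 1
    return count


def emit_to_kinesis(records: Iterable[bytes], stream_name: str, client=None) -> int:
    """Send each raw record to a Kinesis stream; return the record count."""
    if client is None:
        # Imported lazily so the stdout mode does not require the AWS SDK.
        import boto3
        client = boto3.client("kinesis")
    count = 0
    for raw in records:
        client.put_record(
            StreamName=stream_name,
            Data=raw,
            # Assumption: a random key spreads records evenly across shards.
            PartitionKey=str(uuid.uuid4()),
        )
        count += 1
    return count
```

Passing a `client` explicitly makes it possible to try the Kinesis path against a stub before pointing it at a real stream.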

Raw backup files can be read either from an S3 bucket (optionally filtered by a file name prefix) or from a local file (currently only a single local file is supported).
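As a rough sketch of the two input sources, the helpers below yield the raw bytes of each backup file. The names `read_local_file` and `iter_s3_files` are hypothetical and not taken from the repository. Decompressing and decoding the Thrift records inside each file is deliberately left out, since that depends on how the backup was written. The S3 path relies on the boto3 `list_objects_v2` paginator and `get_object`.

```python
from typing import Iterator, Tuple


def read_local_file(path: str) -> bytes:
    """Return the raw bytes of a single local backup file."""
    with open(path, "rb") as handle:
        return handle.read()


def iter_s3_files(bucket: str, prefix: str = "", client=None) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, raw bytes) for every object in the bucket under the prefix.

    An empty prefix matches every object in the bucket.
    """
    if client is None:
        # Imported lazily so the local-file mode does not require the AWS SDK.
        import boto3
        client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from the response when nothing matches.
        for entry in page.get("Contents", []):
            key = entry["Key"]
            body = client.get_object(Bucket=bucket, Key=key)["Body"]
            yield key, body.read()
```

Using the paginator rather than a single listing call matters for large backups, because one listing response returns at most 1000 keys.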

How to fix bad records?

Line 48 is the answer.

Notes

The Thrift schema is taken from the official Snowplow repo: collector-payload.thrift

About

License: Apache License 2.0


Languages

Python 69.5%, Thrift 30.5%