hortonworks / hive-json

A rough prototype of a tool for discovering Apache Hive schemas from JSON documents.

Hive JSON Schema Finder

This project is a rough prototype that I wrote to analyze large collections of JSON documents and discover their Apache Hive schema. I've used it to analyze githubarchive.org's log data.

To build the project, use Maven (3.0.x) from http://maven.apache.org/.

Building the jar:

% mvn package

Run the program:

% bin/find-json-schema *.json.gz

I've uploaded the discovered schema for githubarchive.org to https://gist.github.com/omalley/5125691.
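
To illustrate the general idea of schema discovery, here is a minimal Python sketch. It is not this project's implementation. The function names (`infer`, `merge`, `to_hive`, `discover`) are hypothetical, and the merge rules are assumptions made for the example: integers widen to double, conflicting types fall back to string, and fields that are only ever null are reported as string. It expects one JSON document per line.

```python
import json


def infer(value):
    """Map one parsed JSON value to a type description.

    Primitives are Hive type names, structs are dicts of field -> type,
    and arrays are ("array", element_type) tuples.
    """
    if value is None:
        return "void"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elem = "void"
        for item in value:
            elem = merge(elem, infer(item))
        return ("array", elem)
    if isinstance(value, dict):
        return {key: infer(val) for key, val in value.items()}
    raise TypeError(f"unsupported JSON value: {value!r}")


def merge(a, b):
    """Combine two type descriptions into one that covers both."""
    if a == b:
        return a
    if a == "void":
        return b
    if b == "void":
        return a
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"bigint", "double"}:
        return "double"  # assumption: widen integers to double
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, field_type in b.items():
            merged[key] = merge(merged[key], field_type) if key in merged else field_type
        return merged
    if isinstance(a, tuple) and isinstance(b, tuple):
        return ("array", merge(a[1], b[1]))
    return "string"  # assumption: conflicting types fall back to string


def to_hive(t):
    """Render a type description as a Hive type string."""
    if isinstance(t, dict):
        fields = ",".join(f"{name}:{to_hive(ft)}" for name, ft in sorted(t.items()))
        return f"struct<{fields}>"
    if isinstance(t, tuple):
        return f"array<{to_hive(t[1])}>"
    return "string" if t == "void" else t  # assumption: never-seen types become string


def discover(lines):
    """Fold every non-blank line (one JSON document each) into a single schema."""
    schema = "void"
    for line in lines:
        if line.strip():
            schema = merge(schema, infer(json.loads(line)))
    return to_hive(schema)


if __name__ == "__main__":
    sample = [
        '{"id": 1, "actor": {"login": "a"}, "tags": ["x"]}',
        '{"id": 2.5, "actor": {"login": "b", "admin": true}, "tags": []}',
    ]
    print(discover(sample))
```

For gzipped input like the files passed to bin/find-json-schema, the lines could be read with Python's `gzip.open(path, "rt")` and handed to `discover`.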
