dvryaboy's repositories

pig

Mirror of Apache Pig

Language: Java | License: Apache-2.0 | Stargazers: 18 | Issues: 8 | Issues: 0

idl_storage_guidelines

This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.

License: Apache-2.0 | Stargazers: 13 | Issues: 0 | Issues: 1

elephant-bird

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.

Language: Java | License: Apache-2.0 | Stargazers: 5 | Issues: 1 | Issues: 0

piglatin-mode

PigLatin mode for Emacs.

Language: Emacs Lisp | Stargazers: 5 | Issues: 0 | Issues: 0

elephant-twin

Elephant Twin is a framework for creating indexes in Hadoop

Language: Java | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

elephant-twin-lzo

Elephant Twin LZO uses Elephant Twin to create LZO block indexes

Language: Java | License: GPL-3.0 | Stargazers: 2 | Issues: 1 | Issues: 0

Vertica-Hadoop-Connector

Vertica Hadoop Connector

Language: Java | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0

awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

bud

Prototype Bud runtime (Bloom Under Development)

Language: Ruby | License: NOASSERTION | Stargazers: 1 | Issues: 1 | Issues: 0

flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.

Language: Java | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

giraph

Mirror of Apache Giraph

Language: Java | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

hadoop-lzo

Patched, refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20

Language: Shell | License: GPL-3.0 | Stargazers: 1 | Issues: 0 | Issues: 0

PigEditor

Eclipse plugin for Apache Pig

Language: Java | Stargazers: 1 | Issues: 0 | Issues: 0

scribe

Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.

Language: C++ | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

apache-proposal

Apache Incubator Proposal for Parquet Format

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster.

Language: Java | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

gitbook

The GitBook documentation for Aqueduct

Stargazers: 0 | Issues: 0 | Issues: 0

Impatient

Source examples to support the "Cascading for the Impatient" blog post series

Language: Java | Stargazers: 0 | Issues: 0 | Issues: 0

incubator-parquet-format

Mirror of Apache Parquet

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

incubator-parquet-mr

Mirror of Apache Parquet

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

lakeFS

lakeFS - Data version control for your data lake | Git for data

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

MassQueryLanguage

The Mass Spec Query Language (MassQL) is a domain-specific language meant to be a succinct way to express a query in a mass-spectrometry-centric fashion.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

parquet-format-1

As we are moving to Apache, please open your pull requests on: https://github.com/apache/incubator-parquet-format

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

pdi-google-spreadsheet-plugin

Plugin for Pentaho Data Integration allowing reading and writing of Google Spreadsheets

Language: Java | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

redelm

an anagram

Language: Java | Stargazers: 0 | Issues: 0 | Issues: 0

scalding

A Scala API for Cascading

Language: Scala | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

semantic-versioning

Java library relying on semver.org principles to check binary code compatibility

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0