Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.

Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources:

Best of breed performance and scalability.
Support for data stored in HDFS, Apache HBase and Amazon S3.
Wide analytic SQL support, including window functions and subqueries.
On-the-fly code generation using LLVM to generate CPU-efficient code tailored specifically to each individual query.
Support for the most commonly-used Hadoop file formats, including the Apache Parquet (incubating) project.
Apache-licensed, 100% open source.

More about Impala

To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.

Supported Platforms

Impala only supports Linux at the moment.

Build Instructions

./buildall.sh -notests

About

Real-time Query for Hadoop

http://impala.io

Apache License 2.0

Languages

Language:C++ 49.8%Language:Java 23.5%Language:Python 13.4%Language:JavaScript 6.6%Language:C 2.3%Language:Shell 1.3%Language:Thrift 1.3%Language:CSS 0.8%Language:CMake 0.8%Language:Lex 0.1%Language:PLpgSQL 0.0%Language:Groff 0.0%Language:Objective-C 0.0%Language:Protocol Buffer 0.0%Language:HTML 0.0%