nongli / arrow

Apache Arrow is a columnar in-memory analytics layer designed to accelerate big data. It houses a set of canonical in-memory representations of flat and hierarchical data, along with multiple language bindings for manipulating those structures. It also provides IPC and implementations of common algorithms.
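To make "canonical in-memory representation" concrete, here is a minimal pure-Python sketch of how the Arrow columnar format lays out a nullable 32-bit integer column. It has two parts: a validity bitmap with one bit per slot (least-significant bit first, 1 meaning non-null) and a contiguous values buffer. The sketch uses only the standard library and is not the Arrow API. The function names `build_int32_column` and `get_value` are hypothetical. It also skips the buffer alignment and padding that the format recommends.

```python
import struct


def build_int32_column(values):
    """Hypothetical helper: pack optional ints into (validity, data) buffers."""
    n = len(values)
    # One validity bit per slot, least-significant bit first; 1 = non-null.
    validity = bytearray((n + 7) // 8)
    # Null slots still occupy 4 bytes; the format leaves their contents
    # unspecified, and this sketch simply leaves them as zero.
    data = bytearray(4 * n)
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)
            struct.pack_into("<i", data, 4 * i, v)  # little-endian int32
    return bytes(validity), bytes(data)


def get_value(validity, data, i):
    """O(1) random access: test the validity bit, then read a fixed offset."""
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    return struct.unpack_from("<i", data, 4 * i)[0]


validity, data = build_int32_column([1, None, 3, 4])
print(validity.hex(), len(data))                        # 0d 16
print([get_value(validity, data, i) for i in range(4)])  # [1, None, 3, 4]
```

Every value sits at a fixed offset, so reading slot `i` requires no parsing. This is part of what makes a single shared layout practical across language bindings.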

Home Page: https://arrow.apache.org

Apache Arrow

Powering Columnar In-Memory Analytics

Arrow is a set of technologies that enable big-data systems to process and move data fast.

Initial implementations include:

Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.

What's in the Arrow libraries?

The reference Arrow implementations contain a number of distinct software components:

  • Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
  • Fast, language-agnostic metadata messaging layer (using Google's FlatBuffers library)
  • Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
  • Low-overhead IO interfaces to files on disk and HDFS (C++ only)
  • Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
  • Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
  • Conversions to and from other in-memory data structures (e.g. Python's pandas library)
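Several items above come together in how a variable-length string column is stored and sliced: the columnar containers, the reference-counted buffers, and zero-copy sharing. The sketch below is a pure-Python illustration, not the Arrow API. It follows the format's layout of an `int32` offsets buffer with n + 1 entries plus one contiguous UTF-8 data buffer. It models a zero-copy slice as a new column that shares both buffers and changes only a logical offset and length. `Utf8Column` and `from_strings` are hypothetical names, and null handling is omitted for brevity. Python's `memoryview` stands in for Arrow's reference-counted buffers.

```python
import struct


class Utf8Column:
    """Hypothetical Arrow-style string column: int32 offsets + UTF-8 data."""

    def __init__(self, offsets, data, offset=0, length=None):
        self.offsets = offsets  # memoryview over little-endian int32 offsets
        self.data = data        # memoryview over the concatenated UTF-8 bytes
        self.offset = offset    # first logical row within the shared buffers
        # Without an explicit length, cover every row the offsets describe.
        self.length = len(offsets) // 4 - 1 if length is None else length

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Row j's bytes are data[offsets[j]:offsets[j + 1]].
        start, end = struct.unpack_from("<2i", self.offsets, 4 * (self.offset + i))
        return bytes(self.data[start:end]).decode("utf-8")

    def slice(self, start, length):
        # Zero-copy: the new column reuses both buffer objects and only
        # records a different logical offset and length.
        return Utf8Column(self.offsets, self.data, self.offset + start, length)


def from_strings(strings):
    """Hypothetical builder: concatenate UTF-8 bytes and record end offsets."""
    offsets = [0]
    data = bytearray()
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return Utf8Column(memoryview(offsets_buf), memoryview(bytes(data)))


col = from_strings(["arrow", "", "café"])
tail = col.slice(1, 2)
print([tail[i] for i in range(len(tail))])  # ['', 'café']
print(tail.data is col.data)                # True: no string bytes were copied
```

Slicing costs the same no matter how much string data the column holds, because only two integers change.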

Getting involved

Right now the primary audience for Apache Arrow is developers of data systems. Most people will use Apache Arrow indirectly, through systems that use it for internal data handling and for interoperating with other Arrow-enabled systems.

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved:

About

License: Apache License 2.0


Languages

Java 42.7%, C++ 34.1%, Python 14.4%, CMake 6.3%, Shell 1.3%, C 0.5%, Makefile 0.4%, Perl 0.2%, FreeMarker 0.1%