Kensuke-Hinata / polars

Fast multi-threaded DataFrame library in Rust | Python | Node.js

Home Page:https://pola.rs/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Polars

rust docs Build and test PyPI Latest Release NPM Latest Release

Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow

Blazingly fast DataFrames in Rust, Python & Node.js

Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.

  • Lazy | eager execution
  • Multi-threaded
  • SIMD
  • Query optimization
  • Powerful expression API
  • Rust | Python | ...

To learn more, read the User Guide.

>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )

# embarrassingly parallel execution
# very expressive query language
>>> (
...     df
...     .sort("fruits")
...     .select(
...         [
...             "fruits",
...             "cars",
...             pl.lit("fruits").alias("literal_string_fruits"),
...             pl.col("B").filter(pl.col("cars") == "beetle").sum(),
...             pl.col("A").filter(pl.col("B") > 2).sum().over("cars").alias("sum_A_by_cars"),     # groups by "cars"
...             pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),                         # groups by "fruits"
...             pl.col("A").reverse().over("fruits").flatten().alias("rev_A_by_fruits"),           # groups by "fruits
...             pl.col("A").sort_by("B").over("fruits").flatten().alias("sort_A_by_B_by_fruits"),  # groups by "fruits"
...         ]
...     )
... )
shape: (5, 8)
┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ fruitscarsliteral_striBsum_A_by_casum_A_by_frrev_A_by_frsort_A_by_B │
│ ------ng_fruits---rsuitsuits_by_fruits  │
│ strstr---i64------------         │
│          ┆          ┆ str          ┆     ┆ i64i64i64i64         │
╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ "apple""beetle""fruits"114744           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "apple""beetle""fruits"114733           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana""beetle""fruits"114855           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana""audi""fruits"112822           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana""beetle""fruits"114811           │
└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘

Performance 🚀🚀

Polars is very fast, and in fact is one of the best performing solutions available. See the results in h2oai's db-benchmark.

Python setup

Install the latest polars version with:

$ pip3 install polars

Update existing polars installation to the lastest version with:

$ pip3 install -U polars

Releases happen quite often (weekly / every few days) at the moment, so updating polars regularily to get the latest bugfixes / features might not be a bad idea.

Rust setup

You can take latest release from crates.io, or if you want to use the latest features / performance improvements point to the master branch of this repo.

polars = { git = "https://github.com/pola-rs/polars", rev = "<optional git tag>" }

Rust version

Required Rust version >=1.58

Documentation

Want to know about all the features Polars supports? Read the docs!

Python

Rust

Node

Contribution

Want to contribute? Read our contribution guideline.

[Python]: compile polars from source

If you want a bleeding edge release or maximal performance you should compile polars from source.

This can be done by going through the following steps in sequence:

  1. Install the latest Rust compiler
  2. Install maturin: $ pip3 install maturin
  3. Choose any of:
    • Fastest binary, very long compile times:
      $ cd py-polars && maturin develop --rustc-extra-args="-C target-cpu=native" --release
    • Fast binary, Shorter compile times:
      $ cd py-polars && maturin develop --rustc-extra-args="-C codegen-units=16 -C lto=thin -C target-cpu=native" --release

Note that the Rust crate implementing the Python bindings is called py-polars to distinguish from the wrapped Rust crate polars itself. However, both the Python package and the Python module are named polars, so you can pip install polars and import polars.

Arrow2

Polars has transitioned to arrow2. Arrow2 is a faster and safer implementation of the Apache Arrow Columnar Format. Arrow2 also has a more granular code base, helping to reduce the compiler bloat.

Acknowledgements

Development of Polars is proudly powered by

Xomnia

Sponsors

About

Fast multi-threaded DataFrame library in Rust | Python | Node.js

https://pola.rs/

License:MIT License


Languages

Language:Rust 72.0%Language:Python 18.0%Language:TypeScript 9.6%Language:JavaScript 0.2%Language:Makefile 0.1%Language:R 0.0%Language:CSS 0.0%Language:Shell 0.0%