AWS Data Wrangler

Pandas on AWS

NOTE

Version 1.0.0 is a new major release with breaking changes. Please make sure that all your existing projects have their dependencies frozen on the desired version (e.g. `pip install awswrangler==0.3.2`).
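To confirm that an environment really has the frozen version installed, a small check like the one below may help. It is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names `installed_version` and `matches_pin` are hypothetical and not part of Wrangler, and `0.3.2` is just the example pin from the note above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """Return True only if `package` is installed at exactly the `pinned` version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # "0.3.2" is the example pin from the note; substitute your own version.
    found = installed_version("awswrangler")
    if found is None:
        print("awswrangler is not installed")
    elif matches_pin("awswrangler", "0.3.2"):
        print("awswrangler is pinned as expected")
    else:
        print(f"awswrangler {found} is installed, expected 0.3.2")
```

Running this in CI or at application start-up is one way to catch an accidental upgrade to 1.0.0 before the breaking changes surface at runtime.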

Source	Installation Command
PyPI	`pip install awswrangler`
Conda	`conda install -c conda-forge awswrangler`

Quick Start

Install Wrangler with: `pip install awswrangler`

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Getting Redshift connection (SQLAlchemy) from Glue Catalog Connections
engine = wr.catalog.get_engine("my-redshift-connection")

# Retrieving the data from Amazon Redshift Spectrum
df = wr.db.read_sql_query("SELECT * FROM external_schema.my_table", con=engine)

Read The Docs

https://aws-data-wrangler.readthedocs.io

Apache License 2.0
