zachaller / aws-data-wrangler

Pandas on AWS

Home Page: https://aws-data-wrangler.readthedocs.io

AWS Data Wrangler


NOTE

Due to the new major version 1.0.0, which introduces breaking changes, please make sure that all your old projects have their dependencies frozen on the desired version (e.g. pip install awswrangler==0.3.2).
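If you are not sure which side of the 1.0.0 break an environment is on, a minimal stdlib-only sketch like the one below can read the installed awswrangler version with importlib.metadata and flag pre-1.0 releases, which are the ones worth pinning. The helper names (parse_version, predates_1_0, installed_awswrangler_version) are hypothetical, and the parser assumes plain dotted numeric versions such as 0.3.2; pre-release tags are not handled.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain release string such as "0.3.2" into (0, 3, 2).

    Hypothetical helper: assumes dotted numeric parts only ("1.0.0rc1" would fail).
    Missing parts are padded with zeros so "1.0" compares equal to "1.0.0".
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def predates_1_0(text: str) -> bool:
    """True if the version is older than 1.0.0, the release with breaking changes."""
    return parse_version(text) < (1, 0, 0)


def installed_awswrangler_version() -> str | None:
    """Return the installed awswrangler version string, or None if not installed."""
    try:
        return version("awswrangler")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_awswrangler_version()
    if current is None:
        print("awswrangler is not installed")
    elif predates_1_0(current):
        print(f"{current} predates 1.0.0; keep it pinned, e.g. awswrangler=={current}")
    else:
        print(f"{current} is 1.0.0 or later")
```

A project that prints the "predates 1.0.0" message is one whose requirements should carry an exact pin before upgrading anything else.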


Source | Installation Command
PyPI   | pip install awswrangler
Conda  | conda install -c conda-forge awswrangler

Quick Start

Install the Wrangler with: pip install awswrangler

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Getting Redshift connection (SQLAlchemy) from Glue Catalog Connections
engine = wr.catalog.get_engine("my-redshift-connection")

# Retrieving the data from Amazon Redshift Spectrum
df = wr.db.read_sql_query("SELECT * FROM external_schema.my_table", con=engine)
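The Quick Start passes S3 locations as s3://bucket/prefix/ URIs. When you build those paths in your own code, a small stdlib-only sketch like the one below can validate a URI and separate the bucket from the key prefix before it reaches wr.s3.to_parquet or wr.s3.read_parquet. split_s3_path is a hypothetical helper for illustration, not part of the awswrangler API.

```python
from __future__ import annotations

from urllib.parse import urlparse


def split_s3_path(path: str) -> tuple[str, str]:
    """Split an S3 URI such as "s3://bucket/dataset/" into ("bucket", "dataset/").

    Hypothetical helper (not an awswrangler function). Raises ValueError for
    anything that is not an s3:// URI with a bucket name.
    """
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {path!r}")
    # urlparse keeps the leading "/" of the key; strip it to get the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    bucket, prefix = split_s3_path("s3://bucket/dataset/")
    print(bucket, prefix)
```

For the path used in the Quick Start, this yields the bucket "bucket" and the prefix "dataset/".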


License: Apache License 2.0

