JPFrancoia / aws-data-wrangler

Pandas on AWS

Home Page: https://aws-data-wrangler.readthedocs.io


AWS Data Wrangler

Pandas on AWS


NOTE

We just released a new major version, 1.0, with breaking changes. Please make sure that all your old projects have their dependencies frozen on the desired version (e.g. pip install awswrangler==0.3.2).
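As a complement to pinning, a project that depends on the pre-1.0 API could check the installed major version at startup. The following is a minimal sketch using only the standard library; the helper names (major_version, installed_major, assert_major) are hypothetical and not part of awswrangler.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".")[0])


def installed_major(package: str = "awswrangler") -> Optional[int]:
    """Return the installed major version of `package`, or None if it is not installed."""
    try:
        return major_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def assert_major(expected: int, package: str = "awswrangler") -> None:
    """Raise if `package` is installed with a major version other than `expected`."""
    found = installed_major(package)
    if found is not None and found != expected:
        raise RuntimeError(
            f"{package} major version {found} is installed, but this project expects {expected}"
        )
```

Calling assert_major(0) early in an old project would fail fast if the environment was accidentally upgraded to 1.0 or later; it does not replace freezing the dependency in the first place.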



| Source | Installation Command                     |
|--------|------------------------------------------|
| PyPI   | pip install awswrangler                  |
| Conda  | conda install -c conda-forge awswrangler |

Quick Start

Install the Wrangler with: pip install awswrangler

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Getting Redshift connection (SQLAlchemy) from Glue Catalog Connections
engine = wr.catalog.get_engine("my-redshift-connection")

# Retrieving the data from Amazon Redshift Spectrum
df = wr.db.read_sql_query("SELECT * FROM external_schema.my_table", con=engine)

Read The Docs


License: Apache License 2.0


Languages

Python 61.3%, Jupyter Notebook 37.7%, Shell 0.9%, Dockerfile 0.1%