s3du
A du-like utility for Amazon's S3 object storage service
What does it do?
This utility counts disk space in S3 by path (and optionally by storage tier).
Why should I use this s3du instead of all the other ones?
- Uses the low-level client and paging to cut the number of requests to S3, retrieving answers quickly and inexpensively (see the sketch after this list).
- Can analyse usage from S3 inventory reports very quickly instead of using the list API.
- Uses streaming, so the memory footprint stays reasonable even for huge buckets containing billions of small objects.
- Is packaged as a module for easy installation with pip, with a CLI entry point set up.
- Can be imported into other Python projects:

  from s3du import s3_disk_usage
  s3_disk_usage(bucket="my_bucket")
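
The low-request listing approach mentioned in the first bullet can be sketched roughly like this with boto3. This is an illustration rather than the package's actual implementation, and the bucket and prefix names are placeholders: it pages through a single list_objects_v2 listing and aggregates sizes by the first path component under the prefix.

import boto3

def du(bucket, prefix="", delimiter="/"):
    """Total bytes and object counts for each path component directly under prefix."""
    client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")
    totals = {}  # first path component under prefix -> [total bytes, object count]
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            rest = obj["Key"][len(prefix):].lstrip(delimiter)
            group = rest.split(delimiter, 1)[0] or "."
            entry = totals.setdefault(group, [0, 0])
            entry[0] += obj["Size"]
            entry[1] += 1
    return totals

for name, (size, count) in du("my-bucket", "my/file/location/").items():
    print(f"b: {size} N: {count} {name}")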
Installation
pip install git+https://github.com/lukeplausin/s3du.git
Usage
# Print help
s3du --help
# List common prefixes and object size totals in 'my-bucket', to a depth
# of two objects, under the 'my/file/location' prefix
s3du --bucket my-bucket --prefix my/file/location --depth 2
# Print entire output to screen in human readable format, and save data
# into a file called contents.json
s3du --bucket my-bucket --file contents.json --human
# Analyse a bucket using a delimiter other than the default /
s3du --bucket my-bucket --delimiter '\'
# Analyse S3 usage using an inventory report instead of the list API
s3du --prefix my/file/location --depth 2 --inventory-url 's3://my-report-bucket/s3-inventories/my-data-bucket/my-report/2019-10-28T04-00Z/'
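
For reference, here is a rough sketch of what an inventory-based analysis does under the hood, assuming a CSV-format inventory report that includes the optional Size field. The bucket and key below are placeholders matching the example URL above, and s3du's own implementation may differ: read manifest.json, locate the Size column from its fileSchema, and sum that column across the gzipped data files.

import csv
import gzip
import io
import json
import boto3

client = boto3.client("s3")
report_bucket = "my-report-bucket"
# The manifest lives directly under the dated inventory prefix.
manifest_key = "s3-inventories/my-data-bucket/my-report/2019-10-28T04-00Z/manifest.json"

manifest = json.loads(
    client.get_object(Bucket=report_bucket, Key=manifest_key)["Body"].read()
)
schema = [col.strip() for col in manifest["fileSchema"].split(",")]
size_col = schema.index("Size")

total_bytes = 0
total_objects = 0
for data_file in manifest["files"]:
    raw = client.get_object(Bucket=report_bucket, Key=data_file["key"])["Body"].read()
    with gzip.open(io.BytesIO(raw), mode="rt", newline="") as rows:
        for row in csv.reader(rows):
            total_bytes += int(row[size_col])
            total_objects += 1

print(f"b: {total_bytes} N: {total_objects}")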
The output looks like this; the fields are size in bytes (b), number of files (N), file name, oldest file date (O), and newest file date (N):
$ s3du --bucket my-bucket --depth 1
b: 59516837 N: 567 dist/ O: 2019-07-02 N: 2019-07-02
b: 170032 N: 1 bootstrap.min.css O: 2019-07-02 N: 2019-07-02
b: 58072 N: 1 bootstrap.min.js O: 2019-07-02 N: 2019-07-02
b: 1262 N: 1 index.html O: 2019-07-02 N: 2019-07-02
b: 9610 N: 1 list.js O: 2019-07-02 N: 2019-07-02
b: 59755813 N: 571 . O: 2019-07-02 N: 2019-07-02
To customise the format, import the module into your code.
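For example, a minimal sketch (the prefix keyword argument and the shape of the returned data are assumptions here, so inspect the result and adapt the formatting to the fields it actually contains):

import json
from s3du import s3_disk_usage

# Placeholders: substitute your own bucket and prefix.
usage = s3_disk_usage(bucket="my-bucket", prefix="my/file/location")

# Dump the raw structure first, then format whichever fields you need.
print(json.dumps(usage, indent=2, default=str))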
Future improvements
- Use multiple threads to count file sizes, cutting down waiting times when working with huge buckets
- Sort by size, number of objects, projected cost