lukeplausin / s3du

du-like utility for Amazon's S3 object storage service

s3du

What does it do?

This utility reports how much storage is used in S3, broken down by path (key prefix) and optionally by storage tier.

Why should I use this s3du instead of the others?

  • Uses the low-level client and paging to cut the number of requests to S3, retrieving answers in a fast and inexpensive way.

  • Can analyse usage from S3 inventory reports very quickly instead of using the list API.

  • Uses streaming so that the memory footprint stays reasonable, even for huge buckets containing billions of small objects.

  • Is packaged as a module for easy installation with pip, with a CLI entry point set up.

  • Can be imported into other Python projects:

from s3du import s3_disk_usage

s3_disk_usage(bucket="my_bucket")
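
The paging and streaming approach described in the list above can be sketched roughly as follows. This is a minimal illustration and not s3du's actual implementation: `summarise_pages` and its parameters are hypothetical names, and each page is assumed to be a dictionary shaped like a boto3 `list_objects_v2` response (a `Contents` list whose entries carry `Key` and `Size`). Only running totals are held in memory, so the footprint depends on the number of prefixes rather than the number of objects.

```python
from collections import defaultdict


def summarise_pages(pages, depth=1, delimiter="/"):
    """Aggregate object sizes and counts by key prefix, one page at a time.

    Hypothetical helper: `pages` is any iterable of dicts shaped like
    list_objects_v2 responses. Returns {prefix: [total_bytes, object_count]}.
    """
    totals = defaultdict(lambda: [0, 0])
    for page in pages:
        # An empty listing has no "Contents" key at all.
        for obj in page.get("Contents", []):
            parts = obj["Key"].split(delimiter)
            if len(parts) > depth:
                # Deeper than `depth`: roll the object up into its ancestor prefix.
                group = delimiter.join(parts[:depth]) + delimiter
            else:
                # At or above `depth`: report the object under its own key.
                group = obj["Key"]
            totals[group][0] += obj["Size"]
            totals[group][1] += 1
    return dict(totals)
```

With boto3 installed and credentials configured, the pages could come from `boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket="my-bucket")`, which yields one response per request of up to 1000 keys.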

Installation

pip install git+https://github.com/lukeplausin/s3du.git

Usage

# Print help
s3du --help

# List common prefixes and object size totals in 'my-bucket', to a depth
# of two levels, under the 'my/file/location' prefix
s3du --bucket my-bucket --prefix my/file/location --depth 2

# Print entire output to screen in human readable format, and save data
# into a file called contents.json
s3du --bucket my-bucket --file contents.json --human

# Analyse a bucket using a delimiter other than the default /
s3du --bucket my-bucket --delimiter '\'

# Analyse S3 usage using an inventory report instead of the list API
s3du --prefix my/file/location --depth 2 --inventory-url 's3://my-report-bucket/s3-inventories/my-data-bucket/my-report/2019-10-28T04-00Z/'
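
To give a feel for why an inventory report can replace the list API, here is a rough sketch of totalling sizes per storage tier from inventory CSV rows. It is an illustration under stated assumptions, not the code s3du runs: `usage_from_inventory` is a hypothetical name, the column list is assumed to come from the `fileSchema` field of the inventory manifest, and object keys are assumed to be percent-encoded in the CSV.

```python
import csv
from collections import Counter
from urllib.parse import unquote


def usage_from_inventory(rows, file_schema, prefix=""):
    """Sum object sizes per storage class from inventory CSV rows.

    Hypothetical helper: `rows` is an iterable of text lines and
    `file_schema` is the comma-separated column list from the manifest.
    """
    columns = [name.strip() for name in file_schema.split(",")]
    key_col = columns.index("Key")
    size_col = columns.index("Size")
    # StorageClass is an optional inventory field, so it may be absent.
    class_col = columns.index("StorageClass") if "StorageClass" in columns else None

    totals = Counter()
    for record in csv.reader(rows):
        key = unquote(record[key_col])  # assumption: keys are percent-encoded
        if not key.startswith(prefix) or not record[size_col]:
            continue  # outside the prefix, or a row without a size
        tier = record[class_col] if class_col is not None else "UNKNOWN"
        totals[tier] += int(record[size_col])
    return dict(totals)
```

Inventory data files in CSV format are gzip-compressed, so a downloaded file could be passed in as `gzip.open(path, "rt")`, which streams lines without loading the whole report into memory.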

The output looks like this. The fields are size in bytes (b), number of files (the first N), file name, oldest file date (O), and newest file date (the second N):

$ s3du --bucket my-bucket --depth 1
b:         59516837 N:           567                                                        dist/   O: 2019-07-02 N: 2019-07-02
b:           170032 N:             1                                            bootstrap.min.css   O: 2019-07-02 N: 2019-07-02
b:            58072 N:             1                                             bootstrap.min.js   O: 2019-07-02 N: 2019-07-02
b:             1262 N:             1                                                   index.html   O: 2019-07-02 N: 2019-07-02
b:             9610 N:             1                                                      list.js   O: 2019-07-02 N: 2019-07-02
b:         59755813 N:           571                                                            .   O: 2019-07-02 N: 2019-07-02

To customise the format, import the module into your code.
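
If only the printed output is available, the lines shown above can also be parsed back into structured data. The sketch below is based solely on the sample output in this README; `LINE_RE` and `parse_line` are hypothetical names and are not part of s3du.

```python
import re
from datetime import date

# Matches: b: <bytes> N: <count> <name> O: <oldest date> N: <newest date>
LINE_RE = re.compile(
    r"^b:\s+(?P<size>\d+)\s+N:\s+(?P<count>\d+)\s+(?P<name>.+?)"
    r"\s+O:\s+(?P<oldest>\S+)\s+N:\s+(?P<newest>\S+)\s*$"
)


def parse_line(line):
    """Parse one s3du output line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "size": int(m["size"]),
        "count": int(m["count"]),
        "name": m["name"],
        "oldest": date.fromisoformat(m["oldest"]),
        "newest": date.fromisoformat(m["newest"]),
    }
```

Parsed records can then be sorted or re-rendered, for example `sorted(records, key=lambda r: r["size"], reverse=True)` to list the largest prefixes first.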

Future improvements

  • Use multiple threads to count file sizes, cutting down waiting times when working with huge buckets
  • Sort by size, number of objects, or projected cost
