streamtos3
is a utility to stream data (mainly from stdin) into an S3-bucket.
Surprisingly, the official aws-cli does not support piping data into S3-storage.
In order to be able to obey to the Unix philosophy and make use of the pipelining-paradigm, two tools, namely s3-echoer and 2s3 have been built. While the former is written in Go, the latter was written in Python. Unfortunately it is 8 years old, unmaintained and still using Python2, which reached its end of life, and an old boto-version. Therefore, I decided to recreate a similar utility, named streamtos3
using Python3 and the AWS SDK for Python boto3.
You may wonder, why someone would want to use that and consider the great aws-cli insufficient in that regard. This utitility is born from the need to handle situations where you do not want to write to the disk, for example during forensic investigations of a live system (or space reasons on tiny cloud instances as well).
To protect the integrity of the uploaded data, a three-fold approach is implemented:
- An MD5-hash is calculated for each chunk of data this is supplied as ContentMD5 to the
upload_part()
-call, so that the AWS-endpoint checks the integrity of the received part. - The ETag-checksum wich is returned as response to an
upload_part()
-call, is compared to the originally calculated MD5-hash. If the checksums do not match, the part is uploaded again, in totalretry
times in an interval ofsecs
seconds before the upload fails. - After completing the upload, the overall ETag of the object in the S3-bucket is retrieved and compared to the “local” ETag which is constructed by hashing the concatenation the MD5-sums of the individual parts, which is done by the S3-endpoint remotely as well.
By doing so, we can be fairly sure, that we stored the data, which was actually read.
usage: streamtos3.py [-h] -k KEYFILE -b BUCKET -o OBJ [-c CHUNKSIZE] [-s SECS_WAIT] [-r RETRY] [-d] [infile] Store data from stdin into S3 bucket DISCLAIMER: This is an _unvalidated_ proof of concept, use at _your own risk_! positional arguments: infile Filepath or '-' for stdin optional arguments: -h, --help show this help message and exit -k KEYFILE, --keyfile KEYFILE File that contains: <AWS_KEY>:<AWS_SECRET> for S3 access. -b BUCKET, --bucket BUCKET Name of the target bucket. -o OBJ, --obj OBJ Name of the object to write -c CHUNKSIZE, --chunksize CHUNKSIZE Split upload in CHUNK_SIZE bytes. (Default: 8 MiB) -s SECS_WAIT, --secs-wait SECS_WAIT Time in seconds to wait until retry upload a failed part again (Default: 5) -r RETRY, --retry RETRY Retries until giving up (Default: 5) -d, --debug Print debug information
stream-to-s3
depends on the AWS SDK for Python boto3 which itself has the following dependencies.
- botocore
- jmespath
- python-dateutil
- s3transfer
- six
- urllib3
Please refer to the requirements.txt
file for details on the versions.
Currently a setup.py
is not provided, in order to use the tool, setup a venv
and install the requirements:
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
python3 streamtos3.py -h