Python wrapper for csv.reader that can process files in predefined chunks.
This library lets a user partition a file stream into chunks of a predefined size. It was initially motivated by the need to process large CSV files from AWS S3 while keeping application code clean.
The package is available on PyPI:

```shell
python -m pip install chunksv
```
The library can be imported and used as follows:

```python
import chunksv

with open("file.csv", "r") as f:
    rows = chunksv.reader(
        f,
        max_bytes=<size of each partition>,
        header=[<optional columns list>]
    )
```
When the reader object has consumed enough rows to reach the max_bytes limit, it raises StopIteration. To consume more rows from the input stream, call reader.resume():
```python
while not rows.empty:
    current_partition = [r for r in rows]
    # < process partition here >
    rows.resume()
```
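The library's source is not shown here, but the partition-then-resume behaviour described above can be illustrated with a stdlib-only sketch. `ChunkedReader` and all of its internals below are hypothetical names for illustration, not chunksv's actual implementation, and the optional `header` argument is omitted for brevity:

```python
import csv
import io


class ChunkedReader:
    """Hypothetical stand-in for chunksv.reader (illustration only):
    iteration stops once roughly max_bytes of input has been consumed,
    and resume() starts the next partition."""

    def __init__(self, f, max_bytes):
        self._f = f
        self._max_bytes = max_bytes
        self._budget = max_bytes
        self.empty = False  # True once the underlying stream is exhausted

    def __iter__(self):
        return self

    def __next__(self):
        if self._budget <= 0:
            # Partition is full; the caller decides when to resume().
            raise StopIteration
        line = self._f.readline()
        if not line:
            self.empty = True
            raise StopIteration
        self._budget -= len(line.encode("utf-8"))
        # Parse the single line with the stdlib csv machinery.
        return next(csv.reader([line]))

    def resume(self):
        # Reset the byte budget for the next partition.
        self._budget = self._max_bytes


# Example: 8-byte partitions over a small in-memory stream. Each input
# line is 4 bytes, so two rows fit per partition; the final loop pass
# probes an exhausted stream and yields an empty partition.
src = io.StringIO("a,1\nb,2\nc,3\nd,4\n")
rows = ChunkedReader(src, max_bytes=8)
partitions = []
while not rows.empty:
    partitions.append([r for r in rows])
    rows.resume()
```

The sketch tracks a byte budget per partition rather than buffering whole chunks, which is what keeps memory use flat regardless of file size.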