marcroiglama / pandas_engineering

When working with large files in pandas, we can run into memory errors or slow processing: pandas is a very powerful tool, but it consumes a lot of RAM. This repo presents a simple way to reduce the memory footprint of pandas dataframes using pandas dtypes and a few transformations.

pandas_engineering

When working with large files in pandas, we can run into memory errors or slow processing. Pandas is a very powerful tool, but it consumes a lot of RAM if we don't preprocess the original dataframe a bit. This repo presents a simple way to reduce the memory footprint of dataframes using pandas functions and tools.
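As a rough illustration of the kind of preprocessing meant here, the sketch below downcasts integer columns and turns low-cardinality text columns into categoricals. It is not necessarily what the notebook does: the helper name `reduce_memory`, the `max_unique_ratio` threshold, and the toy column names are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def reduce_memory(df, max_unique_ratio=0.5):
    """Return a copy of df with smaller dtypes (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # Pick the smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            # Text columns with few distinct values are much cheaper
            # as categoricals (assumes the values are hashable).
            if s.nunique() / max(len(s), 1) < max_unique_ratio:
                out[col] = s.astype("category")
    return out


# Toy data standing in for a large table (column names are illustrative).
n = 10_000
df = pd.DataFrame({
    "stop_sequence": np.arange(n) % 50,
    "stop_id": [f"stop_{i % 100}" for i in range(n)],
})

before = df.memory_usage(deep=True).sum()
small = reduce_memory(df)
after = small.memory_usage(deep=True).sum()
print(f"{before / 1e6:.2f} MB -> {after / 1e6:.2f} MB")
```

`memory_usage(deep=True)` is used so that the size of the Python string objects is included in the measurement; without `deep=True`, text columns look far smaller than they really are.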

Given a dataframe of 1684.11 MB, the memory footprint is reduced to 128 MB! The stop_times.txt file can be found at: https://transitfeeds.com/p/helsinki-regional-transport/735/20190111
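Part of such a saving can also be obtained at load time, by telling `read_csv` which dtypes to use instead of letting it default to 64-bit numbers and Python strings. The sketch below is an assumption-laden example, not the notebook's code: it uses a three-row in-memory stand-in for stop_times.txt, the column names come from the GTFS specification (the real file may contain more columns), and `to_seconds` is a hypothetical helper.

```python
import io

import pandas as pd

# Tiny stand-in for stop_times.txt; with the real file, pass its path instead.
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t1,05:00:00,05:00:30,s1,1\n"
    "t1,05:04:00,05:04:30,s2,2\n"
    "t2,24:10:30,24:11:00,s1,1\n"
)

# Assumed dtypes: repeated identifiers as categoricals, small counters as int16.
dtypes = {
    "trip_id": "category",
    "stop_id": "category",
    "stop_sequence": "int16",
    "arrival_time": str,
    "departure_time": str,
}

df = pd.read_csv(sample, dtype=dtypes)


def to_seconds(col):
    """Convert 'HH:MM:SS' strings to seconds since midnight as int32."""
    parts = col.str.split(":", expand=True).astype("int32")
    return parts[0] * 3600 + parts[1] * 60 + parts[2]


# GTFS times may exceed 24:00:00, so they are split by hand rather than
# parsed as clock times.
for c in ("arrival_time", "departure_time"):
    df[c] = to_seconds(df[c])

print(df.dtypes)
print(df.memory_usage(deep=True).sum(), "bytes")
```

Storing the times as integer seconds replaces one Python string per cell with a 4-byte value, which is where much of the reduction on time columns would come from.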

License: Apache License 2.0


Languages

Language: Jupyter Notebook 100.0%