brettvanderwerff / auto_sql

auto_sql is a memory-aware CSV-to-SQLite converter

auto_sql

==Work in progress==

auto_sql is very early in development; more features are to come.

Description

auto_sql is a memory-aware CSV-to-SQLite converter that can convert multi-gigabyte tabular files to SQLite databases on low-memory machines. It focuses on speed by using multiprocessing on multi-core machines.
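To make the "memory-aware" idea concrete, here is a minimal single-process sketch that streams a headered file into SQLite a fixed number of rows at a time, using only the standard library. It is not auto_sql's implementation: the function name `csv_to_sqlite` and the parameters `table` and `chunk_rows` are hypothetical, and the sketch stores every value as text rather than inferring column types.

```python
import csv
import sqlite3
from contextlib import closing
from itertools import islice


def csv_to_sqlite(csv_path, db_path, table, sep=",", chunk_rows=50000):
    """Stream a headered CSV into a SQLite table, chunk_rows rows at a time.

    Hypothetical helper for illustration only; returns the number of rows inserted.
    """
    total = 0
    with open(csv_path, newline="") as handle, \
            closing(sqlite3.connect(db_path)) as conn:
        reader = csv.reader(handle, delimiter=sep)
        header = next(reader)  # the first row supplies the column names
        # Quote identifiers so unusual header names do not break the SQL.
        columns = ", ".join(
            '"{}"'.format(name.replace('"', '""')) for name in header)
        placeholders = ", ".join("?" for _ in header)
        quoted_table = '"{}"'.format(table.replace('"', '""'))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS {} ({})".format(quoted_table, columns))
        insert = "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders)
        while True:
            # Only one chunk of rows is held in memory at any time.
            chunk = list(islice(reader, chunk_rows))
            if not chunk:
                break
            conn.executemany(insert, chunk)
            conn.commit()
            total += len(chunk)
    return total
```

Lowering `chunk_rows` caps peak memory at the cost of more round trips to the database, which is the same trade-off a memory-aware converter has to manage.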

Installation

$ pip install auto_sql

Usage

from auto_sql import AutoSql

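# Describe the conversion: input file, database name, separator, output directory
tab_obj = AutoSql(file='file.csv',
                  db_name='database',
                  sep='\t',
                  out_dir=".")

# The guard keeps worker processes from re-running the conversion on import
if __name__ == "__main__":
    tab_obj.run()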

Currently, auto_sql only supports CSV files with a header row.
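After a conversion finishes, it can be useful to confirm what ended up in the database. The sketch below uses only the standard `sqlite3` module to list each table with its row count. The helper name `list_tables` is hypothetical, and the path you pass depends on where auto_sql wrote its output, which this README does not spell out.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file.

    Hypothetical helper for illustration. Note that sqlite3.connect creates
    an empty database if db_path does not exist yet.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"{}"'.format(name.replace('"', '""'))
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM {}".format(quoted)).fetchone()[0]
        return counts
```

Calling `list_tables` on the generated file and comparing the row count with the number of data lines in the source CSV is a quick sanity check.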

Tuning

The buffer parameter can be reduced from its default value of 0.3 to avoid memory errors. Conversely, it can be increased to gain speed, at a higher risk of a memory error.

from auto_sql import AutoSql

tab_obj = AutoSql(file='file.csv',
                  db_name='database',
                  sep='\t',
                  out_dir=".",
                  buffer=0.1)  # below the 0.3 default to reduce the risk of memory errors

if __name__ == "__main__":
    tab_obj.run()
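How auto_sql uses the buffer value internally is not documented here. As a rough mental model only, assume it is the fraction of currently available memory that a single chunk may occupy. Under that assumption, the trade-off described above can be sketched as a small calculation. The function `rows_per_chunk` and its parameters are hypothetical and are not part of auto_sql.

```python
def rows_per_chunk(available_bytes, buffer, bytes_per_row):
    """Estimate how many rows fit in `buffer` * available memory.

    Assumption (not confirmed by the README): buffer is a fraction in (0, 1]
    of the memory that is free when the conversion starts.
    """
    if not 0 < buffer <= 1:
        raise ValueError("buffer must be in the range (0, 1]")
    if bytes_per_row <= 0:
        raise ValueError("bytes_per_row must be positive")
    budget = int(available_bytes * buffer)  # bytes one chunk may occupy
    return max(1, budget // bytes_per_row)  # always process at least one row
```

In practice `available_bytes` could come from `psutil.virtual_memory().available`, and `bytes_per_row` from sampling the first few rows of the file. A smaller buffer gives smaller chunks and more of them, which is slower but less likely to exhaust memory.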

Dependencies

  • Python 3.4, 3.5, or 3.6

  • pandas==0.21.1

  • psutil==5.4.7

About

License: GNU General Public License v3.0


Languages

Language: Python 100.0%