avinassh / fast-sqlite3-inserts

Some bunch of test scripts to generate a SQLite DB with 1B rows in fastest possible way

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Fast SQLite Inserts

To find out the fastest way to create an SQLite DB with one billion random rows.

Read this blog post for the more context - Towards Inserting One Billion Rows in SQLite Under A Minute

Leaderboard

(for 100M insertions)

Variant Time
Rust 23 seconds
PyPy 126 seconds
CPython 210 seconds

Current Benchmark

Python

These are the current fastest CPython and PyPy numbers.

$ ./bench.sh

Sat May  8 19:42:44 IST 2021 [PYTHON] running sqlite3_opt_batched.py (100_000_000) inserts
      517.53 real       508.24 user         7.35 sys

Sat May  8 20:03:04 IST 2021 [PYPY] running sqlite3_opt_batched.py (100_000_000) inserts
      159.70 real       153.46 user         5.81 sys

Rust

These are the current fastest Rust numbers

Mon Nov 22 18:47:26 IST 2021 [RUST] basic_batched.rs (100_000_000) inserts

real   0m23.826s
user   0m21.685s
sys    0m2.057s

Mon Nov 22 18:47:50 IST 2021 [RUST] threaded_batched.rs (100_000_000) inserts

real   0m23.070s
user   0m27.512s
sys    0m2.465s

In Memory

Instead of writing to disk, I used a :memory: DB, these are the numbers

Mon May 10 17:40:39 IST 2021 [RUST] basic_batched.rs (100_000_000) inserts
       31.38 real        30.55 user         0.56 sys

Mon May 10 17:39:39 IST 2021 [RUST] threaded_batched.rs (100_000_000) inserts
       28.94 real        45.02 user         2.03 sys

Busy loop time

The amount of time these scripts were taking in just to run the for loops (and no SQL insertion)

$ ./busy.sh

Sun May  9 13:16:01 IST 2021 [PYTHON] busy_loop.py (100_000_000) iterations
      351.14 real       347.53 user         3.39 sys

Sun May  9 13:21:52 IST 2021 [PYPY] busy_loop.py (100_000_000) iterations
       81.58 real        77.73 user         3.80 sys

Sun May  9 13:23:14 IST 2021 [RUST] busy.rs (100_000_000) iterations
       17.97 real        16.29 user         1.67 sys

Sun May  9 13:23:32 IST 2021 [RUST] threaded_busy.rs (100_000_000) iterations
        7.18 real        42.52 user         7.20 sys

Community Contributions

PR Author Result
#2 captn3m0 Reduced the CPython running time by half (from 7.5 minutes to 3.5 minute)
#12 red15 saved 2s from Rust's running time (bringing it to 30s)
#19 kerollmops saved 5s from Rust's running time (bringing it to 23s)

Contributing

All contributions are welcome. If you have any ideas on increasing the performance, feel free to submit a PR. You may also check the current open issues to work on.

License

Released under MIT License. Check LICENSE file more info.

About

Some bunch of test scripts to generate a SQLite DB with 1B rows in fastest possible way

License:MIT License


Languages

Language:Rust 60.0%Language:Python 25.4%Language:Shell 12.7%Language:Makefile 2.0%