AppThreat / vulnerability-db

Vulnerability database and package search for sources such as Linux, OSV, NVD, GitHub and npm. Powered by sqlite, CVE 5.0, purl, and vers.

[v6] investigate sqlite3 db

prabhu opened this issue

I am currently investigating a sqlite3 database as a replacement for the custom file storage.

https://github.com/AppThreat/vulnerability-db/tree/feature/store5

I have come up with the following design after extensive testing:

  • Two separate sqlite databases: one for index-only search and one with the full CVE information
  • JSON columns to store the dumped CVE 5 pydantic models
  • Purl vers column in index db
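The design above can be sketched roughly as follows. This is only an illustration and not the actual vdb6 schema: the table names (`cve_data`, `cve_index`), the column names, and the helper functions are hypothetical, a plain dict stands in for the dumped CVE 5 pydantic model, and the CVE id and purl are placeholders.

```python
import json
import sqlite3


def create_dbs(data_path=":memory:", index_path=":memory:"):
    """Create the two databases: full CVE records and a slim search index."""
    data_db = sqlite3.connect(data_path)
    index_db = sqlite3.connect(index_path)
    # Full record: the CVE 5 document is stored as JSON text (hypothetical schema).
    data_db.execute(
        "CREATE TABLE IF NOT EXISTS cve_data("
        "cve_id TEXT NOT NULL, purl_prefix TEXT NOT NULL, source_data TEXT NOT NULL)"
    )
    # Index-only record: just enough to match a package and its version range.
    index_db.execute(
        "CREATE TABLE IF NOT EXISTS cve_index("
        "cve_id TEXT NOT NULL, purl_prefix TEXT NOT NULL, vers TEXT NOT NULL)"
    )
    index_db.execute("CREATE INDEX IF NOT EXISTS idx_purl ON cve_index(purl_prefix)")
    return data_db, index_db


def add_record(data_db, index_db, cve_id, purl_prefix, vers, cve_doc):
    """Write the full document to the data db and a short row to the index db."""
    data_db.execute(
        "INSERT INTO cve_data VALUES (?, ?, ?)",
        (cve_id, purl_prefix, json.dumps(cve_doc)),
    )
    index_db.execute(
        "INSERT INTO cve_index VALUES (?, ?, ?)", (cve_id, purl_prefix, vers)
    )
    data_db.commit()
    index_db.commit()


def load_record(data_db, cve_id):
    """Fetch and decode the full JSON document for one CVE id, or None."""
    row = data_db.execute(
        "SELECT source_data FROM cve_data WHERE cve_id = ?", (cve_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

The point of the split is that a search only has to touch the small index file; the large data file is read only when the full record for a matching id is needed.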

[Image: vdb6]

[Image: index-vdb6]

ls -lh
total 5.8G
-rw-r--r-- 1 prabhu prabhu 500M Mar 17 11:07 data.index.vdb6
-rw-r--r-- 1 prabhu prabhu  30M Mar 17 11:07 data.index.vdb6.tar.xz
-rw-r--r-- 1 prabhu prabhu 5.2G Mar 17 10:35 data.vdb6
-rw-r--r-- 1 prabhu prabhu  88M Mar 17 11:05 data.vdb6.tar.xz

Alternatives considered

  • Pickle dump BLOB - This reduced the db size slightly from 5.2 GB to 5.0 GB but added 10 minutes to the db creation time
  • lzma compress - This reduced the db row size tremendously but increased the db creation time to several hours. It is also unnecessary, since the entire file is already much smaller after .xz compression
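For anyone who wants to reproduce the comparison on a single row, here is a minimal sketch of the three encodings discussed above. The sample record and the function names are made up for illustration, and a plain dict is used instead of a pydantic model, so the byte counts will not match the real database.

```python
import json
import lzma
import pickle


def encode_variants(record):
    """Return the three candidate encodings of one record as bytes."""
    as_json = json.dumps(record).encode("utf-8")
    return {
        "json": as_json,
        "pickle": pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL),
        "lzma": lzma.compress(as_json),
    }


def decode_lzma(blob):
    """Reverse the per-row lzma encoding back into a dict."""
    return json.loads(lzma.decompress(blob).decode("utf-8"))


# Placeholder record, not real CVE data.
sample = {"id": "CVE-0000-0000", "notes": ["example text"] * 50}
for name, blob in encode_variants(sample).items():
    print(name, len(blob), "bytes")
```

Per-row lzma pays the compression cost once for every record, which fits the long creation times noted above, whereas compressing the finished file with .xz is a single pass.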

I have implemented two search APIs: search by purl and search by CPE. The searches are quite fast, thanks to multiple indexes.
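A rough sketch of how the two lookups can be backed by indexes is below. The table, column, and index names and the function signatures are assumptions for illustration and are not the actual vdb search API; the row is placeholder data. `EXPLAIN QUERY PLAN` is used to check that sqlite really picks the index.

```python
import sqlite3


def build_index_db():
    """Build a tiny in-memory index db with one index per search key."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cve_index("
        "cve_id TEXT, purl_prefix TEXT, cpe_prefix TEXT, vers TEXT)"
    )
    db.execute("CREATE INDEX idx_purl ON cve_index(purl_prefix)")
    db.execute("CREATE INDEX idx_cpe ON cve_index(cpe_prefix)")
    db.execute(
        "INSERT INTO cve_index VALUES (?, ?, ?, ?)",
        (
            "CVE-0000-0000",  # placeholder id
            "pkg:pypi/example-package",
            "cpe:2.3:a:example:example-package",
            "vers:pypi/>=1.0.0|<1.2.3",
        ),
    )
    db.commit()
    return db


def search_by_purl(db, purl_prefix):
    """Return (cve_id, vers) rows whose purl prefix matches exactly."""
    return db.execute(
        "SELECT cve_id, vers FROM cve_index WHERE purl_prefix = ?", (purl_prefix,)
    ).fetchall()


def search_by_cpe(db, cpe_prefix):
    """Return (cve_id, vers) rows whose CPE prefix matches exactly."""
    return db.execute(
        "SELECT cve_id, vers FROM cve_index WHERE cpe_prefix = ?", (cpe_prefix,)
    ).fetchall()


def uses_index(db, sql, params, index_name):
    """Check whether the query plan for sql mentions the given index."""
    plan = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return any(index_name in row[-1] for row in plan)
```

The caller would still have to compare the installed version against the returned vers range; that step is left out here.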

Since v6 includes a number of breaking changes, depscan v6 requires a certain level of rework.

DB creation time has increased by 30 minutes, from 45 minutes to 1h 15m. There is probably some sqlite PRAGMA configuration that would reduce the time at the risk of some data loss.

https://github.com/AppThreat/vdb/actions/runs/8318736761

With a couple of pragma statements, the DB creation time has gone down to less than an hour. Locally, with a decent amount of RAM, it takes around 15 minutes for me.
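This comment does not name the pragma statements, so the following is only a guess at the kind of settings involved. `synchronous = OFF` and `journal_mode = MEMORY` are standard sqlite pragmas that trade durability for write speed: a crash or power loss during the build can leave the file corrupt, which is acceptable for a database that can be regenerated. The table and function names are hypothetical.

```python
import sqlite3


def bulk_load(path, rows):
    """Load rows with durability relaxed for speed (assumed settings)."""
    db = sqlite3.connect(path)
    # Do not wait for the OS to flush writes to disk.
    db.execute("PRAGMA synchronous = OFF")
    # Keep the rollback journal in memory instead of on disk.
    db.execute("PRAGMA journal_mode = MEMORY")
    db.execute(
        "CREATE TABLE IF NOT EXISTS cve_index("
        "cve_id TEXT, purl_prefix TEXT, vers TEXT)"
    )
    # One transaction for the whole batch avoids a commit per row.
    with db:
        db.executemany("INSERT INTO cve_index VALUES (?, ?, ?)", rows)
    return db
```

Both pragmas apply per connection, so readers that open the finished file later are not affected by them.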

https://github.com/AppThreat/vdb/actions/runs/8325052984

Some recent changes have increased the uncompressed db size from 5.1 GB to 13 GB. I am investigating.

ls -lh /mnt/work/vdb
-rw-r--r-- 1 prabhu prabhu 434M Mar 19 12:32 data.index.vdb6
-rw-r--r-- 1 prabhu prabhu  13G Mar 19 12:32 data.vdb6