SQLite binary database format parser.
Homonym libraries:
- sqlite_parser is a front-end to rusqlite and doesn't actually do parsing.
- sqlite3-parser is parser + lexer for SQLite3-compatible SQL grammar.
In your Cargo.toml:
[dependencies]
sqlite-parser-nom = "1.0.0"
Load and parse file in memory:
use sqlite_parser_nom::Reader;
use sqlite_parser_nom::error;
fn main() -> Result<(), error::SQLiteError> {
let reader = Reader::open_mmap("sample/sakila.db")?;
println!("{}", reader.header.db_size);
Ok(())
}
You can also use parsers directly
use nom::Finish;
use sqlite_parser_nom::parser;
use sqlite_parser_nom::model;
use sqlite_parser_nom::error;
fn do_something_with_page(i: &[u8]) -> Result<model::Page, error::SQLiteError> {
let (_, page) = parser::page(i)
.finish()
// the cast is necessary here, so the error could outlive the input
.map_err(|e| nom::error::Error {
code: e.code,
input: error::OwnedBytes(e.input.to_owned()),
})?;
Ok(page)
}
Check the documentation and parser to chose correct parser for your task.
References:
- Database File Format - official file format guide
- Requirements for the SQLite Database File Format - detailed list of assumptions
- The Definitive Guide to SQLite - Chapter 11, high level overview
- Mobile Forensics – The File Format Handbook - detailed description until the cell contents
+---+-------+-----------+-----------+-----------+
| h | | | | |
| e | | | | |
| a | root | page 2 | ... | page N |
| d | page | | | |
| e | | | | |
| r | | | | |
+---+-------+-----------+-----------+-----------+
^ ^ ^
< page size | page size | page size | page size >
- The SQLite database file is divided into equally-sized pages
- Page size is defined within the header
- Root page includes file header, but together still fits page size
- All pages, including root page, count internal offsets from the beginning of the page itself
- Pages are referenced by the number, therefore their position in the binary file can be computed
+---------------------+
| page header |
+---------------------+
| cell pointer array |
+---------------------+
| |
| unallocated space |
| |
+-------+-------------+
|cell N |free block |
+-------+------+------+
|cell 5 |cell 4|cell 3|
+-------+----+-+------+
| cell 2 | cell 1 |
+------------+--------+
Page types:
- Both Index and Table pages in their Interior and Leaf flavour have the same structure, but differ in the header and
some extra fields
- See models for exact definition and BTree section for logic
- Overflow page just has
0x00
in header and the rest is payload - Locking page is empty page in databases > 1gb at 1,073,741,824 offset
- Pointer page exists in autovacuumed DBs and contains pointers to reorganized pages
- Free blocks are stored in free-list and are not nulled, they might contain data, which was supposed to be removed
Page structure:
- Cell pointer array grows from the top of the page to the bottom
- Pointers are byte offsets within the page
- Cells grow from the bottom of the page to the top
+-----------+--------+--------+--------+-----+--------+-----------+-----+-----------+
|Payload | | Header | Serial | | Serial | Data Cell | | Data Cell |
|(Cell Size)| ... | Size | Type 1 | ... | Type N | Column 1 | ... | Column N |
+-----------+--------+--------+--------+-----+--------+-----------+-----+-----------+
| |
< cell header ^ record header ^ table row data >
< cell size >
- This structure with some amendments applies to Table Leaf, Index Leaf and Interior pages
- Table Interior page contains pointers to other pages and corresponding rowid
- Header and auxillary values within the cell are mostly as varint
- Serial types correspond to the data payloads and contain size information within them