ms705 / nom-sql

Rust SQL parser written using nom

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Queries that contain arbitrary bytes cannot be parsed

lovasoa opened this issue · comments

SQL files can contain any byte sequence. However, this library only exposes a way to parse a rust string (that is, a sequence of unicode codepoints). This makes it impossible to parse some SQL files (such as the wikipedia dumps I am currently working with), as thay contain byte sequences that are not valid utf-8.

The api should expose a function that takes an &[u8] instead of an &str.

For handling byte sequences that are not valid utf8 in literal strings, I see two possibilities:

  • Using the already existing Blob(Vec<u8>) (the information that the literal was a string and not a blob would be lost)
  • Using a fault-tolerant utf8 decoder like rust-encoding (invalid characters would be lost).

Fixed in #34, I believe?