Queries that contain arbitrary bytes cannot be parsed
lovasoa opened this issue
Ophir LOJKINE commented
SQL files can contain any byte sequence. However, this library only exposes a way to parse a Rust string (that is, a sequence of Unicode code points encoded as valid UTF-8). This makes it impossible to parse some SQL files (such as the Wikipedia dumps I am currently working with), as they contain byte sequences that are not valid UTF-8.
The API should expose a function that takes an &[u8] instead of an &str.
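To illustrate the problem, here is a minimal sketch using only the Rust standard library. The helper name `first_invalid_utf8_offset` and the sample statement are made up for this example and are not part of this library. It shows that such input cannot even be turned into an &str before parsing starts:

```rust
use std::str;

/// Hypothetical helper (not part of this library): returns the byte offset
/// of the first invalid UTF-8 sequence, or None if the input is valid UTF-8.
fn first_invalid_utf8_offset(sql: &[u8]) -> Option<usize> {
    match str::from_utf8(sql) {
        Ok(_) => None,
        Err(e) => Some(e.valid_up_to()),
    }
}

fn main() {
    // Made-up statement: 0xE9 is "é" in Latin-1, but here it is not a
    // valid UTF-8 sequence, so the conversion to &str fails.
    let sql: &[u8] = b"INSERT INTO t VALUES ('caf\xe9');";
    match first_invalid_utf8_offset(sql) {
        None => println!("valid UTF-8"),
        Some(offset) => println!("invalid UTF-8 at byte offset {offset}"),
    }
}
```

A parser entry point that accepts &[u8] would avoid this conversion step entirely.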
For handling byte sequences that are not valid UTF-8 in string literals, I see two possibilities:
- Using the already existing Blob(Vec<u8>) (the information that the literal was a string and not a blob would be lost)
- Using a fault-tolerant UTF-8 decoder like rust-encoding (invalid characters would be lost).
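The two possibilities could look roughly like the sketch below. It rests on assumptions: the `Literal` enum and its `Text` variant are hypothetical stand-ins for the library's own types (only Blob(Vec<u8>) is named above), and the second function uses the standard library's `String::from_utf8_lossy` rather than rust-encoding, purely to keep the example dependency-free.

```rust
/// Hypothetical literal type for illustration only. `Blob(Vec<u8>)` mirrors
/// the variant mentioned above; `Text` is an assumed name.
#[derive(Debug, PartialEq)]
enum Literal {
    Text(String),
    Blob(Vec<u8>),
}

/// Possibility 1: keep every byte, but fall back to a blob when the literal
/// is not valid UTF-8 (the fact that it was a string literal is lost).
fn literal_or_blob(bytes: &[u8]) -> Literal {
    match std::str::from_utf8(bytes) {
        Ok(s) => Literal::Text(s.to_owned()),
        Err(_) => Literal::Blob(bytes.to_vec()),
    }
}

/// Possibility 2: always produce a string, replacing each invalid sequence
/// with U+FFFD (the original bytes are lost).
fn literal_lossy(bytes: &[u8]) -> Literal {
    Literal::Text(String::from_utf8_lossy(bytes).into_owned())
}

fn main() {
    let raw: &[u8] = b"caf\xe9"; // not valid UTF-8
    println!("{:?}", literal_or_blob(raw));
    println!("{:?}", literal_lossy(raw));
}
```

The first approach preserves the data exactly at the cost of the literal's type; the second preserves the type at the cost of the data.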
Malte Schwarzkopf commented
Fixed in #34, I believe?