support MySQL special character escape sequences
lovasoa opened this issue · comments
Hi!
Thank you for this great library.
I am trying to use it to parse Wikipedia dumps in my project wikipedia-externallinks-fast-extraction.
Unfortunately, the dumps contain MySQL escape sequences that this library does not currently support.
Unsupported characters
The unsupported escape sequences are:
\0
\'
\"
\b
\n
\r
\t
\Z
\\
\%
\_
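For reference, here is a minimal sketch of how these sequences could be decoded, following the escape table in the MySQL manual. The function names `mysql_escape` and `unescape_mysql` are made up for illustration and are not part of this library. Keeping a trailing lone backslash as-is is my own assumption.

```rust
/// Map the character that follows a backslash to the text it stands for,
/// following the MySQL string-literal escape table.
fn mysql_escape(c: char) -> String {
    match c {
        '0' => "\0".to_string(),     // ASCII NUL
        'b' => "\u{8}".to_string(),  // backspace
        'n' => "\n".to_string(),     // newline
        'r' => "\r".to_string(),     // carriage return
        't' => "\t".to_string(),     // tab
        'Z' => "\u{1a}".to_string(), // ASCII 26 (Control+Z)
        // Outside of pattern-matching contexts, \% and \_ keep their backslash.
        '%' => "\\%".to_string(),
        '_' => "\\_".to_string(),
        // \' \" \\ and any unrecognized escape: the backslash is dropped.
        other => other.to_string(),
    }
}

/// Unescape the body of a MySQL string literal (surrounding quotes already removed).
fn unescape_mysql(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some(e) => out.push_str(&mysql_escape(e)),
                // Assumption: a trailing lone backslash is kept unchanged.
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

With this sketch, `unescape_mysql(r"apliut.htm\'")` yields `apliut.htm'`, which is the value the example row below is meant to carry.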
Example
INSERT INTO externallinks VALUES (23481,120102,'http://home.arcor.de/jean-polmartin/aufsaetze/apliut.htm\'','http://de.arcor.home./jean-polmartin/aufsaetze/apliut.htm\'','http://de.arcor.home./jean-polmartin/aufsaetze/apliut.htm\'');
SQLite escape sequences don't seem to be supported either. According to the README:
We try to support both the SQLite and MySQL syntax; where they disagree, we choose MySQL. (It would be nice to support both via feature flags in the future.)
So I think:
''''  should parse as  '  (sqlite-only)
'\'  should parse as  \  (sqlite-only)
'\\'  should parse as  \\  (it is a valid string in both SQLite and MySQL, but with a different meaning)
'\''  should parse as  '  (mysql-only)
It should not be difficult to implement using nom::escaped_transform.
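To make the cases above concrete, here is a rough hand-rolled sketch that does not use nom. The `Dialect` enum and the `parse_quoted` function are hypothetical names, not this library's API. It applies the doubled-quote rule in both modes, because MySQL also accepts '' inside a single-quoted string, and treats the backslash as an escape character only in MySQL mode.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Dialect {
    Sqlite,
    MySql,
}

/// Parse one single-quoted literal at the start of `input`.
/// Returns the decoded contents and the unparsed remainder, or `None`
/// if the input does not start with a quote or the literal is unterminated.
fn parse_quoted(input: &str, dialect: Dialect) -> Option<(String, &str)> {
    let mut chars = input.char_indices().peekable();
    match chars.next() {
        Some((_, '\'')) => {}
        _ => return None,
    }
    let mut out = String::new();
    while let Some((i, c)) = chars.next() {
        match c {
            '\'' => {
                if chars.peek().map(|&(_, next)| next) == Some('\'') {
                    // Doubled quote inside the literal: emit a single quote.
                    chars.next();
                    out.push('\'');
                } else {
                    // Closing quote: the rest of the input is left unparsed.
                    return Some((out, &input[i + 1..]));
                }
            }
            // The backslash is only special in MySQL mode.
            '\\' if dialect == Dialect::MySql => {
                let (_, e) = chars.next()?;
                match e {
                    '0' => out.push('\0'),
                    'b' => out.push('\u{8}'),
                    'n' => out.push('\n'),
                    'r' => out.push('\r'),
                    't' => out.push('\t'),
                    'Z' => out.push('\u{1a}'),
                    // \% and \_ keep their backslash outside LIKE patterns.
                    '%' | '_' => {
                        out.push('\\');
                        out.push(e);
                    }
                    // \' \" \\ and unknown escapes: drop the backslash.
                    other => out.push(other),
                }
            }
            other => out.push(other),
        }
    }
    None // ran out of input before the closing quote
}
```

In SQLite mode `'\\'` decodes to two backslashes, while in MySQL mode it decodes to one. `'\'` is a complete literal in SQLite mode but unterminated in MySQL mode, so a real implementation would probably select the behaviour through a feature flag or a parser option, as the README suggests.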
Good catch -- I actually independently ran into this issue last week (also parsing MySQL dumps) and made a mental note to fix it!
I originally looked at nom::escaped_transform for this, but didn't yet figure out exactly how to use it for this purpose. Looks like you ended up hand-rolling the parse rule instead, probably for good reasons.
I'll check out the PR 👍
I also tried to use nom::escaped_transform, but with no success. I think this is because of this part of the documentation:
WARNING: if you do not use the verbose-errors feature, this combinator will currently fail to build because of a type inference error
I think we have all of them supported now, thanks to @lovasoa's work 👍