Conversion failure due to collation and unique indexes
andrewguertin opened this issue
Describe the bug
If an SQLite file using the default BINARY collation has a column with a unique index and two values that differ only in case, the conversion fails. sqlite3mysql creates databases and tables with the case-insensitive utf8mb4_general_ci collation, and under that collation the two values are no longer unique.
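The SQLite half of this can be reproduced with Python's standard `sqlite3` module. The sketch below is illustrative only: the table and index names are borrowed from the error message, the helper name `insert_case_variants` is made up for this example, and SQLite's built-in NOCASE collation stands in for a case-insensitive MySQL collation (it is not identical to utf8mb4_general_ci).

```python
import sqlite3


def insert_case_variants(collation: str) -> bool:
    """Return True if 'abc' and 'ABC' can both be stored under a unique
    index on a column declared with the given SQLite collation."""
    conn = sqlite3.connect(":memory:")
    try:
        # The unique index inherits the collation declared on the column.
        conn.execute(f"CREATE TABLE mytable (name TEXT COLLATE {collation})")
        conn.execute("CREATE UNIQUE INDEX mykey ON mytable (name)")
        conn.execute("INSERT INTO mytable (name) VALUES ('abc')")
        try:
            conn.execute("INSERT INTO mytable (name) VALUES ('ABC')")
        except sqlite3.IntegrityError:
            # The second value collides with the first under this collation.
            return False
        return True
    finally:
        conn.close()


# BINARY (SQLite's default) compares bytes, so both rows are accepted.
print("BINARY accepts both:", insert_case_variants("BINARY"))
# NOCASE folds ASCII case, so the second row violates the unique index.
print("NOCASE accepts both:", insert_case_variants("NOCASE"))
```

The same data that is valid under BINARY becomes a duplicate as soon as the index is case-insensitive, which is what happens when the rows are loaded into a utf8mb4_general_ci table.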
Expected behaviour
The conversion should succeed.
Actual result
MySQL failed adding indices to table mytable: 1062 (23000): Duplicate entry '' for key 'mykey'
System Information
$ sqlite3mysql --version
| software | version |
|------------------------|--------------------------------------------------------------------------------|
| sqlite3-to-mysql | 1.4.5 |
| | |
| Operating System | Linux 5.14.2 |
| Python | CPython 3.9.6 |
| MySQL | mysql Ver 15.1 Distrib 10.5.10-MariaDB, for Linux (x86_64) using readline 8.1 |
| SQLite | 3.35.5 |
| | |
| click | 8.0.1 |
| mysql-connector-python | 8.0.26 |
| pytimeparse | 1.1.8 |
| simplejson | 3.17.3 |
| six | 1.16.0 |
| tabulate | 0.8.9 |
| tqdm | 4.62.0 |
Additional context
Modifying sqlite3mysql to use utf8mb4_bin collation worked fine.
Documentation links:
https://www.sqlite.org/datatype3.html#collation
https://mariadb.com/kb/en/supported-character-sets-and-collations/
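To check whether a given SQLite file is likely to hit this problem before converting it, one option is to look for values that collide once case is folded. This is a rough sketch, not part of sqlite3mysql: the function name `find_case_collisions` is hypothetical, and SQLite's `lower()` only folds ASCII letters, so it approximates utf8mb4_general_ci rather than matching it exactly (that collation also treats some accented characters as equal).

```python
import sqlite3


def find_case_collisions(conn, table, column):
    """Return (folded_value, count) pairs for values in `column` that
    differ only in ASCII case. NULLs are skipped because unique indexes
    allow multiple NULLs."""
    # Identifiers cannot be bound as parameters, so they are quoted here;
    # only pass trusted table and column names.
    query = (
        f'SELECT lower("{column}"), COUNT(*) '
        f'FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL '
        f'GROUP BY lower("{column}") '
        f'HAVING COUNT(*) > 1'
    )
    return conn.execute(query).fetchall()


# Small in-memory demonstration with made-up data.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE mytable (name TEXT)")
demo.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("abc",), ("ABC",), ("xyz",), (None,), (None,)],
)
collisions = find_case_collisions(demo, "mytable", "name")
print(collisions)
```

Any row returned points at a group of values that a case-insensitive unique index would reject.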
That's somewhat of an edge case, but MySQL is case-insensitive by default, so I'm not sure how wise it would be, in terms of compatibility, to make the database case-sensitive. It would certainly open up a can of worms 😄
I could add an option to select your MySQL collation, I guess. 🤷
What's your suggestion?
I added these two CLI options for specifying a custom charset and collation:
  --mysql-charset TEXT      MySQL database and table character set [default: utf8mb4]
  --mysql-collation TEXT    MySQL database and table collation
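As a sketch of how these options might be used for the case in this issue, the snippet below builds a command line that requests the utf8mb4_bin collation. Only `--mysql-charset` and `--mysql-collation` come from this thread; the other flags (`--sqlite-file`, `--mysql-database`, `--mysql-user`), the file name, the database name, and the user are assumptions for illustration and should be checked against `sqlite3mysql --help`. The helper name `build_sqlite3mysql_command` is hypothetical.

```python
import shlex


def build_sqlite3mysql_command(sqlite_file, database, user,
                               charset="utf8mb4", collation="utf8mb4_bin"):
    """Build an argument list for sqlite3mysql with an explicit charset
    and collation. Flags other than --mysql-charset and --mysql-collation
    are assumed, not taken from this thread."""
    return [
        "sqlite3mysql",
        "--sqlite-file", sqlite_file,      # assumed flag
        "--mysql-database", database,      # assumed flag
        "--mysql-user", user,              # assumed flag
        "--mysql-charset", charset,
        "--mysql-collation", collation,
    ]


# Placeholder values; nothing is executed here.
command = build_sqlite3mysql_command("mydata.sqlite3", "mydb", "myuser")
print(shlex.join(command))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` once the flags have been verified.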