seancorfield / next-jdbc

A modern low-level Clojure wrapper for JDBC-based access to databases.

Home Page: https://cljdoc.org/d/com.github.seancorfield/next.jdbc/

Strings are corrupted when sending non-ASCII characters via babashka's MySQL pod using next.jdbc

orolle opened this issue

Describe the bug
Strings are corrupted when non-ASCII characters are sent from babashka through the MySQL pod (next.jdbc with MySQL Connector/J 8.0.25) to a MySQL 8.0.32 database. The database supports UTF-8 and stores strings correctly when another DB client is used. The corruption only happens when sending strings to the database; the other direction works: when querying non-ASCII data already in the DB, the result set is correct and the strings are intact.

I'm not sure whether this is an issue in next.jdbc, babashka, GraalVM, or some combination. What's your opinion?

To Reproduce
Using babashka with the MySQL pod v0.1.2:

```clojure
(require '[babashka.pods :as pods])
(pods/load-pod 'org.babashka/mysql "0.1.2")
(require '[pod.babashka.mysql :as mysql])

(def db {:dbtype "mysql"
         :host "HOST"
         :port 3306
         :dbname "SCHEMA"
         :user "USER"
         :password "PASSWORD"
         :zeroDateTimeBehavior "convertToNull"
         :characterEncoding "utf8"
         :charSet "utf8mb4"
         :useUnicode "true"})

;; Reading non-ASCII text works:
(mysql/execute! db ["SELECT 'ä'"])
;; Result:   [{:ä "ä"}]
;; Expected: [{:ä "ä"}]

;; Writing non-ASCII text does not:
(mysql/execute! db ["INSERT INTO t (A,B) VALUES (?,?)" "ä" "ö"])
;; Result: the strings stored in the DB are corrupted
```
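One way to narrow down where the bytes get mangled is to run `SELECT HEX(A) FROM t` after the insert and compare the result with the bytes the client meant to send. The sketch below (the helper names `bytes->hex` and `double-encoded` are hypothetical, not part of next.jdbc or the pod) computes both the expected hex for a correctly stored utf8mb4 value and the hex you would see if the UTF-8 bytes were misread as latin1 somewhere on the way in. Double encoding is only one common failure mode, not a confirmed diagnosis for this issue. The `\u00e4` escape is used instead of a literal `ä` so the example does not depend on the source file's encoding.

```clojure
(defn bytes->hex
  "Hypothetical helper: render the bytes of s, encoded with charset-name,
  as uppercase hex (the same form MySQL's HEX() function returns)."
  [^String s ^String charset-name]
  (apply str (map #(format "%02X" (bit-and % 0xFF))
                  (.getBytes s charset-name))))

(defn double-encoded
  "Hypothetical helper: simulate mojibake by encoding s as UTF-8 and then
  misreading those bytes as ISO-8859-1 (latin1)."
  [^String s]
  (String. (.getBytes s "UTF-8") "ISO-8859-1"))

;; What a correctly stored "ä" should look like in SELECT HEX(A) FROM t:
(bytes->hex "\u00e4" "UTF-8")                  ;; => "C3A4"

;; What it looks like if the UTF-8 bytes were misread as latin1 on the way in:
(bytes->hex (double-encoded "\u00e4") "UTF-8") ;; => "C383C2A4"
```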

Environment:

  • OS: Windows
  • Runtime: babashka

I suspect this is because the default charset for the JVM on Windows is not UTF-8. See, for example: https://stackoverflow.com/questions/1006276/what-is-the-default-encoding-of-the-jvm

That doesn't seem to be it. On my machine everything is UTF-8:

```clojure
(java.nio.charset.Charset/defaultCharset)
;; => #object[sun.nio.cs.UTF_8 0xc5a1a3a "UTF-8"]

(System/getProperty "file.encoding")
;; => "UTF-8"

(.getEncoding (java.io.InputStreamReader. System/in))
;; => "UTF8"
```

A workaround is to encode each string's UTF-8 bytes as a hex string in the SQL text and convert it back inside MySQL:

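```clojure
(require '[clojure.string :as str])

;; Encode a string's UTF-8 bytes as lowercase hex, exactly two digits per byte
;; (Integer/toHexString would drop the leading zero for bytes below 0x10).
(defn unicode-to-hex [^String unicode]
  (->> (.getBytes unicode "UTF-8")
       (map #(format "%02x" (bit-and % 0xFF)))
       (str/join "")))

;; Wrap the hex in a MySQL expression that decodes it back to utf8mb4 text.
(defn sql-hex-to-unicode [hex]
  (str "CONVERT(x'" hex "' USING utf8mb4)"))

;; Render a Clojure value as an SQL literal for string-built statements.
(defn sql-value [value]
  (cond
    (nil? value)    "NULL"
    (= "" value)    "''"
    (string? value) (sql-hex-to-unicode (unicode-to-hex value))
    :else           value))
```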
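An encoder built on `Integer/toHexString` drops the leading zero for bytes below `0x10` (a newline, `0x0A`, becomes `"a"`), which breaks the two-digits-per-byte layout that MySQL's `x'...'` hex literals expect. The sketch below uses hypothetical helpers (`hex-unpadded`, `hex-padded`, `hex->string`) to compare the two encodings and to round-trip the padded one on the client side. `hex->string` only approximates what `CONVERT(x'...' USING utf8mb4)` does in MySQL; it is not MySQL's implementation.

```clojure
(require '[clojure.string :as str])

(defn hex-unpadded
  "Hypothetical helper: hex-encode UTF-8 bytes with Integer/toHexString
  (no zero padding, so bytes below 0x10 produce a single digit)."
  [^String s]
  (->> (.getBytes s "UTF-8")
       (map #(Integer/toHexString (bit-and % 0xFF)))
       (str/join "")))

(defn hex-padded
  "Hypothetical helper: hex-encode UTF-8 bytes with exactly two digits per byte."
  [^String s]
  (->> (.getBytes s "UTF-8")
       (map #(format "%02x" (bit-and % 0xFF)))
       (str/join "")))

(defn hex->string
  "Hypothetical helper: decode a two-digits-per-byte hex string back into a
  UTF-8 string, roughly what CONVERT(x'...' USING utf8mb4) does server-side."
  [^String hex]
  (let [bs (->> (partition 2 hex)
                (map #(unchecked-byte (Integer/parseInt (apply str %) 16)))
                byte-array)]
    (String. ^bytes bs "UTF-8")))

(hex-unpadded "a\n\u00e4")             ;; => "61ac3a4" (7 digits, misaligned)
(hex-padded "a\n\u00e4")               ;; => "610ac3a4"
(hex->string (hex-padded "a\n\u00e4")) ;; => "a\n\u00e4"
```

With padding, every byte maps to exactly two hex digits, so the literal always has an even length and decodes back to the original text.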

Well, next.jdbc does zero transformation on data: it treats everything as objects and passes them through the JDBC driver in both directions. At work we use MySQL/Percona 5.7 with the 8.0.22 driver, and emojis work fine in utf8mb4 columns, so I think this is something environmental on your side.

I will note that we have :characterEncoding "UTF-8" (not utf8) and we do not specify :charSet or :useUnicode so maybe that is something to investigate for you?
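Following that suggestion, a minimal sketch of adjusting the reproduction's db spec: set `:characterEncoding` to `"UTF-8"` and drop `:charSet` and `:useUnicode`. `db-original` mirrors the map from the report, and `maintainer-style-spec` is a hypothetical helper; whether this change alone fixes the pod case is untested here. next.jdbc passes extra keys in the db-spec map to the JDBC driver as connection properties, so the change only affects what MySQL Connector/J receives.

```clojure
;; The db spec from the original report (placeholders kept as-is).
(def db-original
  {:dbtype "mysql"
   :host "HOST"
   :port 3306
   :dbname "SCHEMA"
   :user "USER"
   :password "PASSWORD"
   :zeroDateTimeBehavior "convertToNull"
   :characterEncoding "utf8"
   :charSet "utf8mb4"
   :useUnicode "true"})

(defn maintainer-style-spec
  "Hypothetical helper: return db with :characterEncoding set to \"UTF-8\"
  and the :charSet and :useUnicode keys removed."
  [db]
  (-> db
      (assoc :characterEncoding "UTF-8")
      (dissoc :charSet :useUnicode)))

;; The adjusted spec would then be passed to mysql/execute! in place of db.
(maintainer-style-spec db-original)
```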

@orolle Did you get any further with this? I feel inclined to close this as "out of scope for next.jdbc" at this point...

I could not get it to work with special characters in strings. I am using MySQL's CONVERT() for now to get things done.

OK, you have a solution, and this issue is "out of scope" so I'm closing it.