storesafe / cordova-sqlite-storage

A Cordova/PhoneGap plugin to open and use sqlite databases on Android, iOS and Windows with HTML5/Web SQL API

Non-standard encoding of Emojis and other 4-byte UTF-8 characters on Android pre-6.0 (default NDK implementation)

brodybits opened this issue · comments

It is possible to store and retrieve an emoji character value such as U+1F603 [SMILING FACE WITH OPEN MOUTH] with the default Android implementation and on Windows. However, SELECT HEX(?) with an emoji text value returns a different result there than on (WebKit) Web SQL, on iOS, and on an Android database opened with the androidDatabaseImplementation: 2 setting. The same difference shows up when querying the HEX value of a stored column with an emoji text value. Note that the iOS version and an Android database opened with the androidDatabaseImplementation: 2 setting are consistent with the sqlite3 CLI tool, at least on my Mac OS system.
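The query described above can be sketched roughly as follows. This is an illustration only: selectEmojiHex and the test.db file name are hypothetical, while executeSql and rows.item(...) are used as in the plugin's documented API.

```javascript
// Sketch only: selectEmojiHex is a hypothetical helper name.
// `db` is expected to be a database object such as the one returned by
//   window.sqlitePlugin.openDatabase({ name: 'test.db', location: 'default' })
// (hypothetical file name), optionally opened with the
// androidDatabaseImplementation: 2 setting for comparison.
function selectEmojiHex(db, onHex, onError) {
  // '\uD83D\uDE03' is the UTF-16 surrogate pair for U+1F603.
  db.executeSql('SELECT HEX(?) AS hexvalue', ['\uD83D\uDE03'], function (rs) {
    // Pass the HEX string reported by this platform to the caller.
    onHex(rs.rows.item(0).hexvalue);
  }, onError);
}
```

Running the same helper against each platform and implementation should make the differing HEX values easy to compare side by side.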

I suspect this indicates that the default Android implementation (using Android-sqlite-connector) and Windows versions store emoji characters differently. This would impact cases where the same sqlite(3) database is created on another system or shared between multiple platforms. In addition I wonder if there may be issues with other 4-octet UTF-8 characters?

This does not seem to be a real issue on Windows. As discussed in #652, the Windows version uses UTF-16le database encoding by default. On closer examination, the HEX value result does look correct for the UTF-16le encoding. If I first run PRAGMA encoding="UTF-8", then SELECT HEX returns the same result as on iOS, Android/iOS WebKit Web SQL, and an Android database opened with the androidDatabaseImplementation: 2 setting.

Now closing as a duplicate in favor of #739. (Update: this does not look like a duplicate.)

I think #739 is different, reopening.

Crash in certain cases is also possible on certain Android versions (title updated again)

Encoding issue is only reproduced on Android pre-6.0.

Crash on Android is only possible in case of emoji BLOB values, for example:

SELECT LOWER(X'41F09F9883') AS lowertext

Updated title yet again

Here is an example to explain the non-standard encoding of emojis:

SELECT HEX('\uD83D\uDE03') (same as SELECT HEX('😃')) results in the following:

  • EDA0BDEDB883 (non-standard encoding, with each UTF-16 surrogate encoded as its own 3-byte sequence) on Android pre-6.0 (default NDK implementation)
  • F09F9883 (standard UTF-8 encoding) on Android 6.0(+), iOS, macOS, and (WebKit) Web SQL on Android/iOS/browser
  • 3DD803DE (UTF-16 encoding) on Windows (not considered here)
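
To make the three byte sequences above easier to follow, here is a minimal sketch that computes each of them in plain JavaScript. All function names are hypothetical and none of this is plugin code. The surrogate-wise variant is only my assumption about how the pre-6.0 result arises: it encodes each UTF-16 code unit separately, which reproduces the observed bytes.

```javascript
// Sketch only: all helper names here are hypothetical, not plugin API.

// Format byte values as an uppercase hex string, similar to SQLite's HEX().
function toHex(bytes) {
  return bytes.map(function (b) {
    return (b < 16 ? '0' : '') + b.toString(16);
  }).join('').toUpperCase();
}

// Encode one value below 0x10000 as a 1- to 3-byte UTF-8-style sequence.
function encodeUpTo3Bytes(u) {
  if (u < 0x80) return [u];
  if (u < 0x800) return [0xC0 | (u >> 6), 0x80 | (u & 0x3F)];
  return [0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)];
}

// Standard UTF-8: a code point above U+FFFF becomes one 4-byte sequence.
// (Lone surrogates are not rejected here; this is not a validating encoder.)
function standardUtf8Hex(str) {
  var bytes = [];
  Array.from(str).forEach(function (ch) {
    var cp = ch.codePointAt(0);
    if (cp < 0x10000) {
      bytes = bytes.concat(encodeUpTo3Bytes(cp));
    } else {
      bytes.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  });
  return toHex(bytes);
}

// Non-standard variant: every UTF-16 code unit is encoded on its own,
// so a surrogate pair becomes two 3-byte sequences.
function surrogateWiseHex(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    bytes = bytes.concat(encodeUpTo3Bytes(str.charCodeAt(i)));
  }
  return toHex(bytes);
}

// UTF-16LE: each code unit is written low byte first.
function utf16leHex(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var u = str.charCodeAt(i);
    bytes.push(u & 0xFF, u >> 8);
  }
  return toHex(bytes);
}

var emoji = '\uD83D\uDE03';
console.log(standardUtf8Hex(emoji));   // F09F9883
console.log(surrogateWiseHex(emoji));  // EDA0BDEDB883
console.log(utf16leHex(emoji));        // 3DD803DE
```

For characters in the Basic Multilingual Plane the first two functions produce identical bytes, which fits the observation that only 4-byte UTF-8 characters are affected.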

From tests added in #829 (with minor changes for the sake of clarity):

SELECT X'F09F9883' (from standard UTF-8 encoding) results in the following:

  • 😃 (\uD83D\uDE03) on Android 6.0(+), iOS, macOS, and (WebKit) Web SQL on Android/iOS/browser
  • crashes on Android pre-6.0

SELECT X'EDA0BDEDB883' (from non-standard encoding on Android pre-6.0) results in the following:

  • �� (\uFFFD\uFFFD) on (WebKit) Web SQL on Android/iOS/browser
  • 😃 (\uD83D\uDE03) on Android with default NDK implementation and Android with androidDatabaseProvider: 'system' setting on Android 4.x
  • result value is missing with the plugin on iOS/macOS
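
As a rough illustration of why X'EDA0BDEDB883' can come back as the emoji on some platforms, the sketch below decodes the bytes leniently: each 3-byte sequence is turned into one UTF-16 code unit, and the two resulting surrogates then form the pair for U+1F603. The function name is hypothetical, and this is an assumption about the behavior, not the code any of these platforms actually runs. A strict UTF-8 decoder rejects these sequences, which is consistent with the replacement characters seen on (WebKit) Web SQL.

```javascript
// Sketch only: decodeSurrogateWiseHex is a hypothetical helper, not plugin API.
// Decodes 1- to 3-byte UTF-8-style sequences into UTF-16 code units without
// rejecting surrogate values. Continuation bytes are not validated.
function decodeSurrogateWiseHex(hex) {
  var bytes = [];
  for (var i = 0; i < hex.length; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16));
  }
  var units = [];
  var pos = 0;
  while (pos < bytes.length) {
    var b0 = bytes[pos];
    if (b0 < 0x80) {
      // Single-byte (ASCII) sequence.
      units.push(b0);
      pos += 1;
    } else if ((b0 & 0xE0) === 0xC0) {
      // Two-byte sequence.
      units.push(((b0 & 0x1F) << 6) | (bytes[pos + 1] & 0x3F));
      pos += 2;
    } else if ((b0 & 0xF0) === 0xE0) {
      // Three-byte sequence, which may hold a surrogate value.
      units.push(((b0 & 0x0F) << 12) |
                 ((bytes[pos + 1] & 0x3F) << 6) |
                 (bytes[pos + 2] & 0x3F));
      pos += 3;
    } else {
      // Four-byte lead bytes and stray continuation bytes are not handled.
      throw new Error('unsupported lead byte at offset ' + pos);
    }
  }
  return String.fromCharCode.apply(null, units);
}

console.log(decodeSurrogateWiseHex('EDA0BDEDB883') === '\uD83D\uDE03'); // true
```

Note that this lenient decoder throws on the standard 4-byte form F09F9883, since it does not handle 4-byte lead bytes.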

In general the plugin on Android with androidDatabaseProvider: 'system' returns the same result as (WebKit) Web SQL on Android/iOS/browser.

Similar results were observed in similar tests with the 4-byte Gothic letter Bairkan (U+10331) character.

P.S. The same non-standard encoding on Android pre-6.0 is also observed on the evcore plugin version.