unee-t / frontend

Meteor front end

Home Page: https://case.dev.unee-t.com/

﷐[U+1F64C]﷑ unicode mangling

kaihendry opened this issue

Unicode set via the Frontend seems to get mangled like so �[U+1F64C]�

How I reproduced: https://s.natalian.org/2019-07-09/ios.mp4 -> https://s.natalian.org/2019-07-09/emoji-test.mp4
Postman
Adminer

I'm not quite sure if this is a Bugzilla or Frontend issue. I need better visibility on the POST request that sets the comment.

For example, I don't know where api_key comes from

[2019-07-09T10:45:23+08:00] (ecs/meteor/c7022182-7882-4807-b669-e1ca1659dff2) { "file": "bugzilla-api.js", "line": "42", "message": "{ method: 'post',\n endpoint: '/rest/bug/74447/comment',\n statusCode: 201,\n duration: 348 }", "method": "callAPI", "timestamp": "2019-07-09T02:45:23+0000", "title": "request" }

So I am at a loss as to how to reproduce the POST.

I don't know where api_key comes from

This is the BZ API key for the user who is posting the comment

After chatting with dylan it would appear this is a Bugzilla issue after all.

11:58 <dylanwh> hendry: That would be the encoding that was added by 5.0, it'll need to be handled by updating the comments
11:59 <dylanwh> I'll fix it in the branch probably tomorrow evening.
11:59 <hendry> what encoding is that out of interest?
12:00 <dylanwh> it's in Bugzilla/Comment.pm. The previous bugzilla devs tried working around the utf8mb4 problem by encoding things outside the basic multilingual plane using PUA (private use area) unicode characters.
12:01 <dylanwh> https://github.com/bugzilla/bugzilla/blob/synthesis/Bugzilla/Comment.pm#L443
12:01 <dylanwh> ^ I have to:
12:01 <dylanwh> 1) remove that code
12:01 <dylanwh> 2) make checksetup fix all existing comments.
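
As a rough illustration of the workaround described in the chat above (this is not the actual Comment.pm code), the Python sketch below wraps every character above U+FFFF in a [U+XXXX] label and then reverses it. It assumes the delimiters are U+FDD0 and U+FDD1, as they appear in this issue's title; the function names escape_non_bmp and unescape_non_bmp are hypothetical. The reverse direction is the kind of step a fix-up of existing comments would need.

```python
import re

# Assumption: the delimiters are U+FDD0 and U+FDD1, as seen in the mangled
# text in this issue. These names are illustrative, not Bugzilla's own.
OPEN, CLOSE = "\uFDD0", "\uFDD1"

# Any character outside the Basic Multilingual Plane (above U+FFFF).
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")
# A delimited label such as <U+FDD0>[U+1F64C]<U+FDD1>.
_ESCAPED = re.compile(OPEN + r"\[U\+([0-9A-F]{4,6})\]" + CLOSE)


def escape_non_bmp(text: str) -> str:
    """Replace each character above U+FFFF with a delimited [U+XXXX] label."""
    return _NON_BMP.sub(
        lambda m: f"{OPEN}[U+{ord(m.group()):04X}]{CLOSE}", text
    )


def _decode(match: "re.Match[str]") -> str:
    code_point = int(match.group(1), 16)
    # Leave the label untouched if it does not name a valid code point.
    return chr(code_point) if code_point <= 0x10FFFF else match.group(0)


def unescape_non_bmp(text: str) -> str:
    """Turn delimited [U+XXXX] labels back into the characters they name."""
    return _ESCAPED.sub(_decode, text)


original = "thanks \U0001F64C"  # U+1F64C is the emoji from the issue title
mangled = escape_non_bmp(original)
print(ascii(mangled))                         # shows the delimiters and label
print(unescape_non_bmp(mangled) == original)  # True: the round trip is lossless
```

Text made up only of BMP characters passes through both functions unchanged, which matches the thread: only emoji and other characters outside the BMP come back mangled.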

Any update on that, @kaihendry?

@dylanwh didn't push any new updates last weekend; bugzilla/bugzilla#79

The synthesis branch is deployed to the dev and demo environments, though there are issues in demo that I don't understand: https://media.dev.unee-t.com/2019-07-17/visibility.txt

Dev appears fine for me. Once the work is done, the Unicode test needs to cover new entries btw, not old ones, since re-encoding old comments might be too tricky. IIUC certain ranges are re-encoded by the legacy code, which needs to be removed.

There is a UIlicious test for this, under case->bugs->emoji. It's currently failing and I need to debug it. Tbh I think it's a problem with the test. https://media.dev.unee-t.com/2019-07-22/uilicious-2227858584033864591.mp4

Here's a simple test case to illustrate the ongoing issue we have:

[hendry@t480s unee-t]$ ./debug.sh
�[U+1F923]�
[hendry@t480s unee-t]$ cat debug.sh
#!/bin/bash
APIKEY=gNI0iPumxCrkCJ64
HOST=https://dashboard.dev.unee-t.com
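# Post a comment containing a non-BMP emoji and capture the new comment's id
id=$(curl -s -X POST -H 'Content-type: application/json' -d "{ \"api_key\": \"${APIKEY}\", \"comment\" : \"🤣\" }" "$HOST/rest/bug/74630/comment" | jq -r .id)
# Fetch the comment back and print its stored text
curl -s "$HOST/rest/bug/comment/$id?api_key=$APIKEY" | jq -r ".comments[\"$id\"].text"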

ok, so this needs the script to fix brace-encoded bits. I'll do that in the synthesis branch in the next 20 hours or so

Dev is running bugzilla/bugzilla@e88ec7f. Note how the version in the top right of https://dashboard.dev.unee-t.com/ corresponds to the Bugzilla synthesis branch version!

[hendry@t480s unee-t]$ bash debug.sh
🤣

Looking good after all the conversion!

Converting components to row format Compressed.
WARNING: We are about to convert your table storage format to UTF-8. This
         allows Bugzilla to correctly store and sort international characters.
         However, if you have any non-UTF-8 data in your database,
         it ***WILL BE DELETED*** by this process. So, before
         you continue with checksetup.pl, if you have any non-UTF-8
         data (or even if you're not sure) you should press Ctrl-C now
         to interrupt checksetup.pl, and run contrib/recode.pl to make all
         the data in your database into UTF-8. You should also back up your
         database before continuing. This will affect every single table
         in the database, even non-Bugzilla tables.
         If you ever used a version of Bugzilla before 2.22, we STRONGLY
         recommend that you stop checksetup.pl NOW and run contrib/recode.pl.
Converting table storage format to utf8mb4 (collate utf8mb4_unicode_520_ci). This may take a while.

This required some re-jigging of the bugzilla target health checks to give it enough time to do the conversions: