barryf / micropublish

A Micropub client that you can use to create, update, delete and undelete content on your Micropub-enabled website.

Home Page: https://micropublish.net

Emoji are being stripped (on updates)

jamietanna opened this issue

I've just done an update via Micropublish, and it looks like the emoji are being stripped:

-  - Massively threw off the plans, ruining a few days down in London with the family, and seeing other fam which was a shame, but the right call - even though on Friday Morph was basically fine again 🙄
+  - Massively threw off the plans, ruining a few days down in London with the family, and seeing other fam which was a shame, but the right call - even though on Friday Morph was basically fine again ����

Micropub update requests are currently made with a Content-Type of application/json. Would making a change to Micropublish to add charset=utf-8 help?

I think that should do it - I've raised #78 as a patch 🤞
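Roughly what declaring the charset means on the wire, sketched in Python rather than Micropublish's Ruby (this is not the code in #78): the update body is serialised as JSON, encoded to UTF-8 bytes, and sent with the charset stated in the Content-Type. The endpoint URL, token and post URL are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder values (hypothetical), not Micropublish's real configuration.
MICROPUB_ENDPOINT = "https://example.com/micropub"
TOKEN = "..."

def build_update_request(url: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a Micropub JSON update that replaces `content`."""
    payload = {"action": "update", "url": url, "replace": {"content": [content]}}
    # ensure_ascii=False keeps emoji as raw characters; encoding to UTF-8
    # then turns each one into a multi-byte sequence in the body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        MICROPUB_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            # Declare the charset explicitly, as proposed in this issue.
            "Content-Type": "application/json; charset=utf-8",
        },
    )

req = build_update_request("https://example.com/post", "basically fine again 🙄")
```

Sending it would then be a matter of `urllib.request.urlopen(req)`. Note that urllib stores header names via `str.capitalize()`, so it is read back with `req.get_header("Content-type")`.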

It looks like the emoji characters are being garbled when you edit posts. When your server receives an update request, is it UTF-8? The change in this PR updated the headers to include the UTF-8 charset, so I'm not sure where the problem lies.
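One way to see where a run of exactly four ���� can come from: 🙄 (U+1F644) is four bytes in UTF-8, and if something along the way decodes those bytes with a non-UTF-8 charset such as ASCII, each byte becomes its own U+FFFD replacement character. This illustrates the symptom only; it does not show which component in this setup is at fault.

```python
def decode_with(charset: str, data: bytes) -> str:
    """Decode bytes, substituting U+FFFD for anything the charset can't decode."""
    return data.decode(charset, errors="replace")

raw = "again 🙄".encode("utf-8")  # 🙄 is U+1F644: F0 9F 99 84 in UTF-8

correct = decode_with("utf-8", raw)
garbled = decode_with("ascii", raw)

print(correct)  # again 🙄
print(garbled)  # again ����
```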

From debugging my server, I can see that it receives the JSON update request correctly and maps it to a UTF-8 String, which would then be published to GitLab.
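A minimal sketch of that server-side step, assuming a handler that gets the raw body bytes plus the Content-Type header (the actual server code isn't shown in this thread, and `parse_json_body` is a hypothetical name): use the declared charset if there is one, and otherwise fall back to UTF-8, which RFC 8259 requires for JSON exchanged between systems.

```python
import json
from email.message import Message
from typing import Optional

def parse_json_body(body: bytes, content_type: Optional[str]) -> dict:
    """Decode a JSON request body using the declared charset, defaulting to UTF-8."""
    charset = None
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
    # RFC 8259: JSON exchanged between systems must be UTF-8, so that is the
    # safe default when no charset parameter is sent.
    return json.loads(body.decode(charset or "utf-8"))

body = '{"replace":{"content":["👀"]}}'.encode("utf-8")
print(parse_json_body(body, "application/json; charset=UTF-8"))  # {'replace': {'content': ['👀']}}
print(parse_json_body(body, "application/json"))                 # {'replace': {'content': ['👀']}}
```

Both calls give the same result, which is why adding the charset parameter alone should not change anything for a server that already defaults to UTF-8.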

I've also tried calling the GitLab API from my server, and that doesn't strip the emoji either. I wonder if it's something odd, like this PR not having been deployed to Micropublish.net yet?

Sorry this is still not working for you. The changes in this PR were deployed on 12 July, so Micropublish.net should be running the latest code. I also did a fresh deploy over the weekend to update dependencies, so it should definitely be up to date.

Looking at your headers I can see you're serving pages from S3. When you write the file to S3, are you specifying UTF-8 in the Content-Type?

You've got a meta tag that declares the charset in the HTML, but the Content-Type response header is just text/html. Is it possible to add charset=utf-8 there? I don't think it should matter, though.

Hmm, yeah, I just tried it and it still seems to be a problem. Interestingly, it does work when I run Micropublish and my Micropub server myself, but not when I go through the hosted versions 🤔

Writing to S3 is all done using hugo deploy, so if a locally pushed git commit works, I think I can rule out both that and the <meta> tags.

Very odd 🤔

Interestingly, this is still an issue on my end, so I'm looking into it again.

I'm seeing https://www-api.jvt.me/micropub?q=source&url=https://www.jvt.me/mf2/2022/04/vhdfr/:

{
  "properties": {
    "content": [
      "Today is my first official day of funemployment! I had a great few months at the Data Standards Authority in the Central Digital and Data Office in <span class=\"h-card\"><a class=\"u-url\" href=\"https://twitter.com/cabinetofficeuk\">@cabinetofficeuk</a></span>, but I'm looking forward to my next role, which I'll announce shortly \uD83D\uDC40\r\n\r\nWas nice having a week off last week [with the new pup](/tags/cookie/), and I'm looking forward to a couple more weeks off to reset before the new role!"
    ],
    "post-status": [
      "published"
    ],
    "published": [
      "2022-04-04T09:28:36.339947413Z"
    ],
    "syndication": [
      "https://twitter.com/JamieTanna/status/1510913918503469056"
    ]
  },
  "type": [
    "h-entry"
  ]
}

When run through a JSON parser or when viewed in Micropublish, this is correct:

[
  "Today is my first official day of funemployment! I had a great few months at the Data Standards Authority in the Central Digital and Data Office in <span class=\"h-card\"><a class=\"u-url\" href=\"https://twitter.com/cabinetofficeuk\">@cabinetofficeuk</a></span>, but I'm looking forward to my next role, which I'll announce shortly 👀\r\n\r\nWas nice having a week off last week [with the new pup](/tags/cookie/), and I'm looking forward to a couple more weeks off to reset before the new role!"
]
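The \uD83D\uDC40 in the raw response is just JSON's escape for a character outside the Basic Multilingual Plane, written as a UTF-16 surrogate pair, so any conforming parser turns it back into 👀. A quick check in Python against an excerpt of the response above:

```python
import json

# Excerpt of the raw source response, with the emoji escaped as a surrogate pair.
raw = r'{"properties":{"content":["which I will announce shortly \uD83D\uDC40"]}}'

content = json.loads(raw)["properties"]["content"][0]
print(content)  # which I will announce shortly 👀

# Serialising with the default ensure_ascii=True produces the escaped form again
# (Python emits lower-case hex digits).
print(json.dumps(content))  # "which I will announce shortly \ud83d\udc40"
```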

When an update of a page is sent, I see the following, which again looks correct:

POST /micropub
Authorization: [Bearer]               
Content-Type: [application/json; charset=UTF-8]
Accept-Encoding: [gzip;q=1.0,deflate;q=0.6,identity;q=0.3]
Accept: [*/*]                                                               
User-Agent: [Ruby]
Connection: [close]
Host: [localhost:8080]                                                      
Content-Length: [127]
{"action":"update","url":"...","replace":{"content":["👀\r\nthis is an update"]}} 
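When comparing these request dumps, it is worth remembering that Content-Length counts bytes of the encoded body, not characters. 👀 is one code point, but two UTF-16 code units (hence the \uD83D\uDC40 escape) and four UTF-8 bytes, so a layer that measures the body in characters rather than bytes would arrive at a different number. A quick illustration:

```python
emoji = "\U0001F440"  # 👀

code_points = len(emoji)                           # Python's len() counts code points
utf16_units = len(emoji.encode("utf-16-le")) // 2  # what Java's String.length() reports
utf8_bytes = len(emoji.encode("utf-8"))            # what counts towards Content-Length

print(code_points, utf16_units, utf8_bytes)  # 1 2 4
```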

I created a fresh draft post, whose source at https://www-api.jvt.me/micropub?q=source&url=https://www.jvt.me/mf2/2022/04/watpo/ is:

{"type":["h-entry"],"properties":{"syndication":["https://brid.gy/publish/twitter"],"published":["2022-04-13T10:59:56Z"],"post-status":["draft"],"content":["Draft post for emoji testing.. please ignore / let me know if you see it \uD83D\uDC40"]}}

Which, when parsed as JSON, gives:

{
  "properties": {
    "content": [
      "Draft post for emoji testing.. please ignore / let me know if you see it 👀"
    ],
    "post-status": [
      "draft"
    ],
    "published": [
      "2022-04-13T10:59:56Z"
    ],
    "syndication": [
      "https://brid.gy/publish/twitter"
    ]
  },
  "type": [
    "h-entry"
  ]
}

And committed as https://gitlab.com/jamietanna/jvt.me/-/commit/e042bf781c7fe706d00f452df1632c6a7d2f3b7c.

And when updated, it was committed as https://gitlab.com/jamietanna/jvt.me/-/commit/df51a3205509df9cf40eebcbf1e7fe44c40c84e0, with the emoji stripped 🤔

The update request for this looks like:

POST /micropub
Authorization: [Bearer]
Content-Type: [application/json; charset=UTF-8]
Accept-Encoding: [gzip;q=1.0,deflate;q=0.6,identity;q=0.3]
Accept: [*/*]
User-Agent: [Ruby]
Connection: [close]
Host: [localhost:8080]
Content-Length: [212]
{"action":"update","url":"...","replace":{"content":["Draft post for emoji testing.. please ignore / let me know if you see it 👀\r\n\r\nand when updated"]}}
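For comparison, the Micropub replace action itself is a plain value swap: every existing value of the named property is overwritten with the values from the request. A minimal sketch (`apply_replace` is a hypothetical helper, not the code of the server in this thread), showing that nothing in the operation itself should touch the emoji:

```python
import copy

def apply_replace(post: dict, request: dict) -> dict:
    """Return a copy of an mf2 post with the `replace` action of a Micropub update applied."""
    updated = copy.deepcopy(post)
    for prop, values in request.get("replace", {}).items():
        # Micropub replace: all existing values of the property are overwritten.
        updated["properties"][prop] = list(values)
    return updated

post = {
    "type": ["h-entry"],
    "properties": {
        "post-status": ["draft"],
        "content": ["Draft post for emoji testing.. please ignore / let me know if you see it 👀"],
    },
}
update = {
    "action": "update",
    "url": "https://www.jvt.me/mf2/2022/04/watpo/",
    "replace": {"content": ["Draft post for emoji testing.. please ignore / let me know if you see it 👀\r\n\r\nand when updated"]},
}

result = apply_replace(post, update)
print(result["properties"]["content"][0].endswith("it 👀\r\n\r\nand when updated"))  # True
```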

Doing the same with curl shows the same issue:

curl -H 'Authorization: Bearer ...' https://www-api.jvt.me/micropub -d '{"action":"update","url":"https://www.jvt.me/mf2/2022/04/watpo/","replace":{"content":["Draft post for emoji testing.. please ignore / let me know if you see it 👀\r\n\r\nand when updated"]}}' -H 'Content-Type: application/json; charset=UTF-8' -v
*   Trying 68.183.254.199:443...
* Connected to www-api.jvt.me (68.183.254.199) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=www-api.jvt.me
*  start date: Apr  4 15:00:23 2022 GMT
*  expire date: Jul  3 15:00:22 2022 GMT
*  subjectAltName: host "www-api.jvt.me" matched cert's "www-api.jvt.me"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* h2h3 [:method: POST]
* h2h3 [:path: /micropub]
* h2h3 [:scheme: https]
* h2h3 [:authority: www-api.jvt.me]
* h2h3 [user-agent: curl/7.82.0]
* h2h3 [accept: */*]
* h2h3 [authorization: Bearer ...]
* h2h3 [content-type: application/json; charset=UTF-8]
* h2h3 [content-length: 193]
* Using Stream ID: 1 (easy handle 0x558db95c9520)
> POST /micropub HTTP/2
> Host: www-api.jvt.me
> user-agent: curl/7.82.0
> accept: */*
> authorization: Bearer ...
> content-type: application/json; charset=UTF-8
> content-length: 193
> 
* We are completely uploaded and fine
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< date: Wed, 13 Apr 2022 11:12:46 GMT
< content-type: application/json
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< pragma: no-cache
< expires: 0
< strict-transport-security: max-age=15724800; includeSubDomains
< x-frame-options: DENY
< 
* Connection #0 to host www-api.jvt.me left intact
{"type":["h-entry"],"properties":{"syndication":["https://brid.gy/publish/twitter"],"published":["2022-04-13T10:59:56Z"],"post-status":["draft"],"content":["Draft post for emoji testing.. please ignore / let me know if you see it ����\r\n\r\nand when updated"]}}
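Because the broken response still parses as valid JSON, one possible regression check is to look for U+FFFD in the returned properties. This is a sketch, assuming the server really returns U+FFFD characters rather than raw invalid bytes that the terminal renders as ����. The helper name is hypothetical, and it is run here against the body above:

```python
import json

REPLACEMENT_CHAR = "\ufffd"

def properties_with_replacement_chars(mf2: dict) -> list:
    """Return the names of mf2 properties with a string value containing U+FFFD."""
    return sorted(
        name
        for name, values in mf2.get("properties", {}).items()
        if any(isinstance(v, str) and REPLACEMENT_CHAR in v for v in values)
    )

response = json.loads(
    '{"type":["h-entry"],"properties":{"post-status":["draft"],'
    '"content":["Draft post for emoji testing.. please ignore / let me know if you see it '
    '\ufffd\ufffd\ufffd\ufffd\\r\\n\\r\\nand when updated"]}}'
)
print(properties_with_replacement_chars(response))  # ['content']
```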