snarfed / granary

💬 The social web translator

Home Page: https://granary.io

HTML to Atom: title element has encoded HTML

gRegorLove opened this issue

Minor, but I noticed with this feed that one of my notes has encoded HTML getting into the <title>:

<title>I like this CSS image reset after watching Kevin Powell’s walkthrough. &lt;...</title>

The original post uses e-content and I'm guessing Granary is using the parsed html attribute for the title. A possible solution might be to use the value attribute instead, for posts that don't have a name of course:

    "content": [
        {
            "html": "<p>I like this CSS image reset after watching <a href=\"https://www.youtube.com/watch?v=345V2MU3E_w\">Kevin Powell&#x2019;s walkthrough</a>.</p>\n\n<p>Also intrigued by the post he linked, &#x201C;<a class=\"h-cite\" href=\"https://csswizardry.com/2023/09/the-ultimate-lqip-lcp-technique/\">The Ultimate Low-Quality Image Placeholder Technique</a>.&#x201D;</p>",
            "value": "I like this CSS image reset after watching Kevin Powell\u2019s walkthrough.\nAlso intrigued by the post he linked, \u201cThe Ultimate Low-Quality Image Placeholder Technique.\u201d",
            "lang": "en"
        }
    ],
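A minimal sketch of that fallback, assuming the post's properties arrive as a microformats2-style dict. The helper name `title_text` is hypothetical and is not part of Granary:

```python
def title_text(props):
    """Pick plain text for a title from mf2-style properties.

    Hypothetical helper for illustration only, not Granary's code.
    """
    # An explicit name always wins.
    names = props.get("name") or []
    if names:
        return names[0]

    for item in props.get("content") or []:
        if isinstance(item, dict):
            # Parsed e-content: prefer the plain-text "value" over "html".
            if item.get("value"):
                return item["value"]
        elif item:
            # Plain p-content is already a string.
            return item
    return ""


props = {
    "content": [
        {"html": '<p>Hi <a href="https://example.com/">there</a></p>',
         "value": "Hi there",
         "lang": "en"}
    ]
}
print(title_text(props))  # Hi there
```

Because "value" never contains markup, any later truncation of the title cannot land inside a tag.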

Oh, I just realized it's not the first YouTube <a href> getting encoded, but some later HTML element, so I'm not sure. Let me know if something is off with my HTML that's causing it.

Huh! Thanks for the nudge. This is an odd one! It doesn't reproduce locally for me at all, with either the /stream/ feed or the specific post; both have <title>I like this CSS image reset after watching Kevin Powell’s walkthrough. </title>, no &lt;.... I do see it on prod https://granary.io/ with both though. Hrm.

Looked at this again, and I'm now able to reproduce it locally. The output is still a bit different; I'm guessing that's because I'm using a different HTML parser locally than in prod.

I suspect we're generating title from HTML content, then ellipsizing, and we end up with just the opening < of a tag, which we then entity-encode.
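The suspected mechanism can be reproduced in a few lines. This is only a sketch of the idea, not Granary's actual code: `ellipsize`, `strip_tags`, `naive_title`, and `safe_title` are hypothetical names, and the 29-character limit is chosen so the cut lands exactly on a tag's opening `<`.

```python
import html
from html.parser import HTMLParser


def ellipsize(text, limit):
    """Truncate to `limit` characters, appending '...' when text is cut."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    """Return the fragment's text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def naive_title(markup, limit):
    # Suspected bug: truncate the raw HTML, then entity-encode the result.
    return html.escape(ellipsize(markup, limit), quote=False)


def safe_title(markup, limit):
    # One possible fix: reduce to plain text first, then truncate and encode.
    return html.escape(ellipsize(strip_tags(markup), limit), quote=False)


content = "Kevin Powell's walkthrough. <p>Also intrigued</p>"
print(naive_title(content, 29))  # Kevin Powell's walkthrough. &lt;...
print(safe_title(content, 29))   # Kevin Powell's walkthrough. A...
```

Stripping tags before ellipsizing means the truncation point can only fall inside text, so a stray `<` never reaches the encoder.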

Fixed!