HTML to Atom: title element has encoded HTML
gRegorLove opened this issue · comments
Minor, but I noticed with this feed that one of my notes has encoded HTML getting into the <title>:
<title>I like this CSS image reset after watching Kevin Powell’s walkthrough. <...</title>
The original post uses e-content, and I'm guessing Granary is using the parsed html attribute for the title. A possible solution might be to use the value attribute instead (for posts that don't have a name, of course):
"content": [
{
"html": "<p>I like this CSS image reset after watching <a href=\"https://www.youtube.com/watch?v=345V2MU3E_w\">Kevin Powell’s walkthrough</a>.</p>\n\n<p>Also intrigued by the post he linked, “<a class=\"h-cite\" href=\"https://csswizardry.com/2023/09/the-ultimate-lqip-lcp-technique/\">The Ultimate Low-Quality Image Placeholder Technique</a>.”</p>",
"value": "I like this CSS image reset after watching Kevin Powell\u2019s walkthrough.\nAlso intrigued by the post he linked, \u201cThe Ultimate Low-Quality Image Placeholder Technique.\u201d",
"lang": "en"
}
],
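For illustration, the suggestion above could look roughly like this. It's a minimal Python sketch that assumes the post's parsed mf2 properties are available as a dict; `title_from_props` is a hypothetical helper, not Granary's actual code.

```python
def title_from_props(props):
    """Pick a plain-text title from parsed mf2 properties (hypothetical helper).

    Prefers an explicit name; otherwise falls back to the content's
    plain-text "value" rather than its "html".
    """
    names = props.get("name") or []
    if names:
        return names[0]

    for content in props.get("content") or []:
        # e-* properties parse to {"html": ..., "value": ...};
        # p-* properties parse to a bare string.
        text = content.get("value", "") if isinstance(content, dict) else content
        if text:
            return text

    return ""


props = {
    "content": [
        {"html": '<p>Hi <a href="https://example.com/">there</a></p>', "value": "Hi there"}
    ]
}
print(title_from_props(props))
```

Since "value" has no markup in it, truncating it later can't cut a tag in half.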
Oh, I just realized it's not the first YouTube <a href> getting encoded, but some later HTML element. I'm not sure, then. Let me know if something is off with my HTML that's causing it.
Huh! Thanks for the nudge. This is an odd one! It doesn't reproduce locally for me at all, with either the /stream/ feed or the specific post; both have <title>I like this CSS image reset after watching Kevin Powell’s walkthrough. </title>, with no <... in it. I do see it on prod https://granary.io/ with both though. Hrm.
Looked at this again, and I'm now able to reproduce it locally. The output is still a bit different; I'm guessing that's because I'm using a different HTML parser locally vs. in prod.
I suspect we're generating the title from the HTML content, then ellipsizing, and we end up with just the opening < of a tag, which we then entity-encode.
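A rough Python sketch of that suspected failure mode, plus one possible way around it (strip tags before truncating). All of the function names here are hypothetical and the truncation length is arbitrary; this is not Granary's actual code or necessarily the fix that was applied.

```python
from html import escape
from html.parser import HTMLParser


def ellipsize(text, max_len):
    """Truncate to max_len characters, appending '...' if anything was cut."""
    return text if len(text) <= max_len else text[:max_len] + "..."


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def buggy_title(html, max_len):
    # Truncating raw HTML can cut a tag in half, leaving a stray "<"
    # that then gets entity-encoded into the title.
    return escape(ellipsize(html, max_len))


def safer_title(html, max_len):
    # Reduce to plain text first, so truncation never lands inside a tag.
    return escape(ellipsize(strip_tags(html), max_len))


html = "Hello. <p>More text</p>"
print(buggy_title(html, 8))
print(safer_title(html, 8))
```

With this input, the first call truncates right after the opening < of the <p> tag, so the title ends in an encoded bracket followed by the ellipsis; the second call never sees the tag at all.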
Fixed!