snarfed / granary

💬 The social web translator

Home Page:https://granary.io

Granary is generating "quotation-of" from content in the tweet that is not a quotation

aaronpk opened this issue

This tweet is an example: https://twitter.com/oktadev/status/1018179594367782913

I'm guessing it's picking up on the fact that the tweet ends in a URL. But there is no additional information about that URL available, so the information in the quotation-of property ends up being not useful.

I think it should generate the quotation-of property only when there is an actual quoted tweet.

Twitter doesn't include the quoted tweet URL in the parent tweet text anymore either https://twittercommunity.com/t/updating-how-urls-are-rendered-in-the-quote-tweet-payload/105473

Here's the HTML Granary is generating for the tweet above:

<article class="h-entry">
  <span class="p-uid">tag:twitter.com:1018179594367782913</span>
  
  <time class="dt-published" datetime="2018-07-14T17:05:06+00:00">2018-07-14T17:05:06+00:00</time>
  
  <span class="p-author h-card">
    <data class="p-uid" value="tag:twitter.com:oktadev"></data>
<data class="p-numeric-id" value="786323471877836800"></data>
    <a class="p-name u-url" href="http://developer.okta.com">OktaDev</a>
<a class="u-url" href="https://developer.okta.com/blog"></a>
<a class="u-url" href="https://devforum.okta.com"></a>
    <span class="p-nickname">oktadev</span>
    <img class="u-photo" src="https://pbs.twimg.com/profile_images/1006555705082384384/izi1LTo4.jpg" alt="" />
  </span>

  <a class="u-url" href="https://twitter.com/oktadev/status/1018179594367782913">https://twitter.com/oktadev/status/1018179594367782913</a>
  <div class="e-content p-name">
  
  Three Developer Tools I'm Thankful For <a href="https://developer.okta.com/blog/2017/11/22/three-developer-tools-im-thankful-for">developer.okta.com/blog/2017/11/2…</a>
  </div>





<article class="u-quotation-of h-cite">
  <span class="p-uid"></span>
  
  
  

  <a class="p-name u-url" href="https://developer.okta.com/blog/2017/11/22/three-developer-tools-im-thankful-for">developer.okta.com/blog/2017/11/2…</a>
  <div class="">
  
  
  </div>

</article>

makes sense! thanks for filing.

this is actually a bit opaque even on Twitter itself. you're right, only trailing tweet urls become quote tweets, but trailing web urls sometimes become similar quote-like cards if they have card markup...but that occasionally fails too, maybe due to timed out fetches etc.

regardless, this is all academic since granary doesn't fetch the url and generate its own card/preview. will fix.

It's pretty explicit in the API. Twitter doesn't treat the "cards" like they do quote tweets, so you should be able to just look for the quoted_status property and only generate the quotation-of then. Here's how XRay does it.
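
A minimal sketch of that check in Python, assuming the tweet has already been parsed from the API's JSON into a dict. The helper name `quoted_tweet_url` is hypothetical and is not granary's or XRay's actual code; `quoted_status`, `id_str`, and `user.screen_name` are fields of the Twitter API tweet object.

```python
def quoted_tweet_url(tweet):
    """Return the quoted tweet's URL, or None if this is not a quote tweet.

    Hypothetical helper for illustration. Only an explicit `quoted_status`
    object counts; a trailing URL in the tweet text is never enough.
    """
    quoted = tweet.get("quoted_status")
    if not quoted:
        return None
    screen_name = quoted.get("user", {}).get("screen_name")
    id_str = quoted.get("id_str")
    if not screen_name or not id_str:
        return None
    return f"https://twitter.com/{screen_name}/status/{id_str}"
```

A caller would emit quotation-of only when this returns a URL, and otherwise leave the trailing link as an ordinary link in the content.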

wow, this has been a rabbit hole. ok. here we go, mostly for my own record...

i'm on board with limiting quotation-of to explicit quote tweets. i'd ideally also like to do something better with trailing URLs that twitter renders as cards. for example, this tweet:

image

...has a trailing URL in its content, https://www.brainstuffshow.com/podcasts/what-is-the-museum-of-broken-relationships.htm. twitter promotes that URL to a card and hides it in the rendered content...but afaict there's no way to tell that explicitly from the API object. notably, display_text_range includes the trailing URL.

{
  "id_str" : "1021255310013427712",
  "in_reply_to_status_id_str" : null,
  "is_quote_status" : false,
  "full_text" : "\"For as long as there's been love, there's been heartbreak and pain. But perhaps...it's in the liminal space between love and loss that we find our shared humanity, and discover our capacity for empathy.\"\n- Brain Stuff, \"The Museum of Broken Relationships\" https://t.co/VNm0LYmXJO",
  "display_text_range" : [0, 280],
  "entities" : {
    "urls" : [{
      "url" : "https://t.co/VNm0LYmXJO",
      "indices" : [257, 280],
      "display_url" : "brainstuffshow.com/podcasts/what-…",
      "expanded_url" : "https://www.brainstuffshow.com/podcasts/what-is-the-museum-of-broken-relationships.htm"
    }]
  },
  "..."
}

tweets with trailing URLs that don't become cards, like your example here, https://twitter.com/oktadev/status/1018179594367782913, look the exact same in the API. display_text_range similarly includes the trailing URL.

(i could have sworn the twitter API put trailing card URLs outside display_text_range at some point during the extended tweet (etc) rollout, but i haven't found any record of that, so i'm probably misremembering.)
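
to make the display_text_range point concrete, here's a rough Python sketch that finds a URL entity ending exactly where the displayed text ends. the helper name `trailing_url_entity` is made up for illustration; the field names and values come from the API object above. it can tell that a URL is trailing, but, as described above, not whether twitter actually rendered it as a card.

```python
def trailing_url_entity(tweet):
    """Return the URL entity that ends at the end of display_text_range, if any.

    Hypothetical helper for illustration only.
    """
    start, end = tweet.get("display_text_range", [0, 0])
    for url in tweet.get("entities", {}).get("urls", []):
        url_start, url_end = url["indices"]
        # A trailing URL ends exactly where the displayed text ends.
        if url_end == end and url_start >= start:
            return url
    return None


# Values copied from the API object quoted in this thread.
tweet = {
    "display_text_range": [0, 280],
    "entities": {
        "urls": [{
            "url": "https://t.co/VNm0LYmXJO",
            "indices": [257, 280],
            "expanded_url": "https://www.brainstuffshow.com/podcasts/what-is-the-museum-of-broken-relationships.htm",
        }],
    },
}

entity = trailing_url_entity(tweet)
```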

anyway. i dropped the quotation-of property from non-quote-tweet URLs this morning. feel free to try.

however, @aaronpk i suspect you were hoping to lose the URL citation block at the bottom entirely, and not just that property?

not sure what you mean by the URL citation block at the bottom. Monocle renders a post by including the quoted post in the UI whenever quotation-of is present. That ends up incorrectly looking like a QT in cases like this.

screenshot 2018-08-14 11 10 26

discussed in #indieweb-dev. tldr, "this is fine," probably. 😆

I just noticed this, testing out my notes feed. For this post, granary generates:

<p><span class="p-summary">Twitter officially welcomes bigotry now.</span></p>

<blockquote class="u-quotation-of h-cite">
<p class="p-content">“We welcome everyone to express themselves on our service. Sometimes these expressions may be offensive, controversial, and/or bigoted. We prohibit targeted behavior that harasses, threatens, or uses fear to silence others and take action when they violate our policies.”</p>

<p><a class="p-author h-card" href="https://safety.twitter.com">@TwitterSafety</a>'s <a class="u-url" href="https://twitter.com/TwitterSafety/status/1026979628475248640">tweet</a></p>
</blockquote>

<p>I think I’m done letting them have my content and my clicks. I have feeds on my homepage for articles and one on my <a href="/notes/">notes page</a> that you can subscribe to in a feed reader. I am thinking about setting up an email newsletter if people want to subscribe and keep up with my posts that way. More details to come.</p>

<blockquote>
<a class="p-name u-url" href="https://safety.twitter.com">@TwitterSafety</a>: “We welcome everyone to express themselves on our service. Sometimes these expressions may be offensive, controversial, and/or bigoted. We prohibit targeted behavior that harasses, threatens, or uses fear to silence others and take action when they violate our policies.”
</blockquote>

It's entirely possible I'm misusing quotation-of since the post isn't primarily a quotation? Not a big issue for me, but it's another data point.

@gRegorLove thanks! i think that's a bit different. the entire quotation is inside your e-content, which is why it's duplicated: http://mf2.pin13.net/mf2/?url=https%3A%2F%2Fgregorlove.com%2F2018%2F08%2Ftwitter-officially-welcomes-bigotry-now%2F

detecting that kind of duplication is harder than it seems, so granary currently only tries for name, not other properties. feel free to file a(nother) feature request though!