metaformats: should we distinguish the parsed output item more explicitly?

Question

snarfed opened this issue 8 months ago

Right now, if https://microformats.org/wiki/metaformats finds eligible metaformats, it generates an h-entry and appends it to the returned items. There's no way to distinguish this item from real mf2 items, though, which is unfortunate. As an implementor, I could use one! Especially for interpreting home page metaformats as an h-card, eg #3, but also for non-homepage pages. Should we include a new property? New type? (I assume not.) Something else? cc @tantek

Sven · Answer 1 · Thu Nov 30 2023 20:46:02 GMT+0800 (China Standard Time)

Given that, as far as I can see, they don't really participate in the nesting of objects (i.e. a metaformats-parsed object is not going to be a child or property value of an mf2-parsed object, nor vice versa), they could be sorted into a separate list, e.g. metaformats-items. Alternatively, they could have an extra flag on the same level as type.
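To make the two alternatives concrete, here is a minimal sketch of both output shapes. The `metaformats-items` key is the name suggested above; the flag name `source` and the sample property values are assumptions for illustration only, not anything a parser emits today.

```python
# Option A (hypothetical): metaformats results live in their own top-level list.
separate_list = {
    "items": [{"type": ["h-card"], "properties": {"name": ["Alice"]}}],
    "metaformats-items": [
        {"type": ["h-entry"], "properties": {"name": ["Page title"]}},
    ],
}

# Option B (hypothetical): metaformats results stay in "items" and carry a
# flag next to "type". The key name "source" is an assumption.
flagged = {
    "items": [
        {"type": ["h-card"], "properties": {"name": ["Alice"]}},
        {
            "type": ["h-entry"],
            "source": "metaformats",
            "properties": {"name": ["Page title"]},
        },
    ],
}


def metaformats_items(parsed):
    """Return the metaformats-derived items from either shape."""
    if "metaformats-items" in parsed:
        return parsed["metaformats-items"]
    return [
        item for item in parsed.get("items", [])
        if item.get("source") == "metaformats"
    ]
```

Either shape lets a consumer pick out the generated item; they differ in whether code that only reads `items` ever sees it.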

Anthony Ciccarello · Answer 2 · Sat Dec 02 2023 08:14:07 GMT+0800 (China Standard Time)

I wondered about this too in microformats/microformats-parser#229.

Should there be a property identifying the mf as being parsed from metaformats, in case someone wants to clean up messy meta tag content?

I'd prefer not to put them in a separate list, so a consumer of the parsed output doesn't need to do anything extra. So far I haven't personally needed to know whether an output is from metaformats, but I could see a property identifying it being useful.

Angelo Gladding · Answer 3 · Mon Dec 04 2023 09:30:09 GMT+0800 (China Standard Time)

I think adding a new property meta-item keeps things clean and explicit. In Python:

if parsed["meta-item"]:

vs. eg.

if parsed["items"] and parsed["items"][-1].get("source") == "metaformats":

I believe mf2py can toggle metaformats parsing on by default immediately if we can keep items as is and use meta-item experimentally -- see microformats/mf2py#213 (comment)

Ryan Barrett · Answer 4 · Mon Dec 04 2023 10:42:12 GMT+0800 (China Standard Time)

As @aciccarello mentioned, the problem is that a separate list forces all consumers to be explicit. One of the benefits of the current metaformats spec is that it lets current mf2 consuming code (choose to) benefit from metaformats automatically, without any changes. A new top-level field preserves that; a separate list doesn't.
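A rough sketch of that point, assuming a consumer that knows nothing about metaformats and simply reads `items`. The function name, the `source` flag, the `metaformats-items` key, and the sample data are all hypothetical.

```python
def first_entry_name(parsed):
    """Metaformats-unaware consumer: name of the first h-entry, or None."""
    for item in parsed.get("items", []):
        if "h-entry" in item.get("type", []):
            names = item.get("properties", {}).get("name", [])
            if names:
                return names[0]
    return None


# Generated item kept in "items" with an extra (hypothetical) flag.
in_items = {
    "items": [
        {
            "type": ["h-entry"],
            "source": "metaformats",
            "properties": {"name": ["Page title"]},
        },
    ],
}

# Generated item moved to a separate (hypothetical) list.
separate = {
    "items": [],
    "metaformats-items": [
        {"type": ["h-entry"], "properties": {"name": ["Page title"]}},
    ],
}
```

With `in_items` the unchanged consumer picks up the fallback entry; with `separate` it finds nothing until it is rewritten to look at the second list.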

Angelo Gladding · Answer 5 · Tue Dec 05 2023 08:03:29 GMT+0800 (China Standard Time)

I do like automatic fallback for entries. Now I better see what you guys are talking about.

mf2util will need to be updated to look for the new top-level field and ignore it when interpreting a feed but everything else in that library should just work (again by simply operating on the first item).

>>> mf2json = mf2py.parse(url="https://zeldman.com", metaformats=True)
>>> homepage_feed = mf2util.interpret_feed(mf2json, "https://zeldman.com")
>>> homepage_feed["entries"][-1]["name"]
'Zeldman on Web and Interaction Design'

The fix will look something like this, which is perfectly fine:

# Drop the trailing metaformats fallback entry, if there is one.
if feed["entries"] and feed["entries"][-1].get("source") == "metaformats":
    feed["entries"].pop()

And you'll never actually need to look up the meta item so I was optimizing for a non-existent case with:

if parsed["meta-item"]:

So keeping it in items and adding a top-level field does make good sense.
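For reference, a slightly more defensive sketch of that feed cleanup, written as a standalone function rather than a change to mf2util. It assumes the interpreted entries would carry the hypothetical `source` key discussed in this thread; the function name and sample feed are made up for illustration.

```python
def drop_metaformats_entries(feed):
    """Return a copy of feed without entries flagged as metaformats-derived.

    Assumes a hypothetical "source" key on each entry; the input is not mutated.
    """
    kept = [
        entry for entry in feed.get("entries", [])
        if entry.get("source") != "metaformats"
    ]
    return {**feed, "entries": kept}


feed = {
    "name": "Example feed",
    "entries": [
        {"type": "entry", "name": "A real post"},
        {"type": "entry", "name": "Page title", "source": "metaformats"},
    ],
}
cleaned = drop_metaformats_entries(feed)
```

Filtering on the flag instead of popping the last element also covers an empty feed and does not depend on the generated item being appended last.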