jgm / commonmark-hs

Pure Haskell commonmark parsing library, designed to be flexible and extensible

Add a "Free" renderer implementation of IsBlock/IsInline

lucasdicioccio opened this issue

I've recently embedded this library in the tool that generates my blog, and I find it pretty cool as a user. So first of all, thanks.

Some context: the sort of thing I want to do is vanity analytics about my content (much like the article<>topic graph on the homepage).

For such analytics, I realize I will have to run a number of statistics on the structure of the parsed blocks/inlines. Choices I have found so far are either:

  • one renderer per 'statistic' (e.g., collect the links so that I can draw a graph between articles, tell which website I link to the most, etc.)
  • one generic data-structure collector that I can then consume and pattern-match on

For the 2nd case, I think I'm about to implement the 'most mundane' implementation: a free implementation of IsBlock/IsInline/HasAttributes with a datatype that has one constructor per typeclass function, wrapped in a newtyped list to get the Monoid for free as well.
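To make the idea concrete, here is a minimal sketch of what such a 'free' instance could look like. It does not use the real commonmark classes: InlineLike is a hypothetical, cut-down stand-in for IsInline (String instead of Text, only four methods, no Rangeable/HasAttributes superclasses), and linkTargets is just an illustrative statistic.

```haskell
-- Hypothetical, simplified stand-in for commonmark's IsInline class.
-- The real class uses Text and has more methods and superclasses.
class Monoid a => InlineLike a where
  str    :: String -> a
  emph   :: a -> a
  strong :: a -> a
  link   :: String -> String -> a -> a  -- target, title, description

-- One constructor per class method: the "free" representation.
data Inline
  = Str String
  | Emph Inlines
  | Strong Inlines
  | Link String String Inlines
  deriving (Show, Eq)

-- Newtyped list, so the Monoid is simply list concatenation.
newtype Inlines = Inlines [Inline]
  deriving (Show, Eq)

instance Semigroup Inlines where
  Inlines a <> Inlines b = Inlines (a <> b)

instance Monoid Inlines where
  mempty = Inlines []

-- Each method just records which constructor was requested.
instance InlineLike Inlines where
  str s           = Inlines [Str s]
  emph x          = Inlines [Emph x]
  strong x        = Inlines [Strong x]
  link tgt ttl ds = Inlines [Link tgt ttl ds]

-- Example statistic: collect every link target, in document order.
linkTargets :: Inlines -> [String]
linkTargets (Inlines xs) = concatMap go xs
  where
    go (Str _)        = []
    go (Emph i)       = linkTargets i
    go (Strong i)     = linkTargets i
    go (Link tgt _ d) = tgt : linkTargets d
```

Once the structure is collected this way, each statistic becomes an ordinary pattern-matching function rather than a separate renderer instance.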

Do you foresee any issue with this approach? At first glance I think it's definitely approachable. Would you mind a PR adding it to this directory if needed (I've heard ZuriHac 2022 is approaching and it seems like a good project to hack on)?

Yes, I'm definitely open to it! I haven't really worked with free monads, so I'll have to see what it looks like in the end before deciding whether to include it here. At first glance, it seems a good idea.

One question is how this will handle extensibility. The way we do extensions, each extension can define additional typeclasses. Can you handle extensions that introduce new types of block or inline elements?

In my intended use-case, extensibility doesn't seem to be a blocker, and working with the untouched core syntax (or supporting only the basic extensions) would be enough.

I think one way to get, with plain data types, the kind of extensibility that you get with typeclasses is to make a 'hole' in the structure upfront (e.g., data Block ext = ThematicBreak | CodeBlock Text Text | ... | Ext ext). Then someone can implement HasThing (Block Thing) with methods that build Ext values, where data Thing = Thing1 Text | Thing2 Int Text.

A bit handwavy at this point. I'll try something for my use-case over the coming weeks and let you know here.
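A rough sketch of that 'hole' idea follows, under stated assumptions: HasThing, thing1 and thing2 are hypothetical names standing in for a class that an extension would define, String replaces Text to keep the example dependency-free, and the elided constructors of Block are simply left out.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Block type with an extension 'hole': anything not known to the core
-- goes into the Ext constructor. Only two core constructors are shown.
data Block ext
  = ThematicBreak
  | CodeBlock String String  -- info string, code
  | Ext ext
  deriving (Show, Eq)

-- Hypothetical payload for one extension.
data Thing
  = Thing1 String
  | Thing2 Int String
  deriving (Show, Eq)

-- Hypothetical class that an extension would introduce.
class HasThing b where
  thing1 :: String -> b
  thing2 :: Int -> String -> b

-- The instance fills the hole with the extension's own data.
instance HasThing (Block Thing) where
  thing1 s   = Ext (Thing1 s)
  thing2 n s = Ext (Thing2 n s)

-- Consumers can still pattern-match, e.g. to count extension blocks.
countExt :: [Block ext] -> Int
countExt bs = length [() | Ext _ <- bs]
```

One limitation of this sketch: with several extensions active at once, the ext parameter would have to be a sum of all their payload types, so the set of extensions is fixed wherever the concrete Block type is chosen.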

> In my intended use-case, extensibility doesn't seem to be a blocker

It would be a blocker, though, for inclusion in this library.

Note: for your purposes, it might also work to just use the commonmark-pandoc classes which give you a pandoc AST. (You don't need to depend on pandoc for this, just the relatively lightweight pandoc-types.) This seems very similar to what you're looking for, though less tightly linked to commonmark's own categorizations.