ixmatus / orgmode-parse

Attoparsec parser combinators for parsing org-mode structured text!

How can I help with filling out more of orgmode-parse?

smurphy8 opened this issue

I would love to help fill out more of your markup renderer in orgmode-parse.
Do you have an idea of what you want the markup types to look like?
I see the readme says this is being worked on. Any pointers to where the work is?

Thanks

@smurphy8 to be honest, Real Life got in the way and I never even started it. I started thinking about it but did not initiate an implementation.

I was going to peek at pandoc's types to see how that project represents markup, because I think that would help with parity (they've also been thinking about document markup much longer than I have!).

If you want to contribute that, I would be interested in seeing some research done into how pandoc handles the AST for markup with a little friendly back and forth on general implementation as it may pertain to this project, and then I'd say go for it. How does that sound to you?

I think that sounds great...
I have dug through pandoc's universal document format a few times; the main parts are:

  • The Meta type
    Which is a map from String keys to MetaValue, wrapped in a newtype
data MetaValue = MetaMap (M.Map String MetaValue)
               | MetaList [MetaValue]
               | MetaBool Bool
               | MetaString String
               | MetaInlines [Inline]
               | MetaBlocks [Block]
               deriving (Eq, Ord, Show, Read, Typeable, Data, Generic)

That allows the pandoc parser to read different styles of markup differently.
The parts most relevant to org-mode are the Block and Inline types.

Here is a sample conversion from a simple orgmode heading with body, into pandoc...

* Some Value
Here is some *body* =text=

becomes

[Header 1 ("",[],[]) [Str "Some",Space,Str "Value"]
,Para [Str "Here",Space,Str "is",Space,Str "some",Space,Strong [Str "body"],Space,Code ("",[],[]) "text"]]

I could start by making something like this inline type?

data Inline
    = Str String            -- ^ Text (string)
    | Emph [Inline]         -- ^ Emphasized text (list of inlines)
    | Strong [Inline]       -- ^ Strongly emphasized text (list of inlines)
    | Strikeout [Inline]    -- ^ Strikeout text (list of inlines)
    | Superscript [Inline]  -- ^ Superscripted text (list of inlines)
    | Subscript [Inline]    -- ^ Subscripted text (list of inlines)
    | SmallCaps [Inline]    -- ^ Small caps text (list of inlines)
    | Quoted QuoteType [Inline] -- ^ Quoted text (list of inlines)
    | Cite [Citation]  [Inline] -- ^ Citation (list of inlines)
    | Code Attr String      -- ^ Inline code (literal)
    | Space                 -- ^ Inter-word space
    | SoftBreak             -- ^ Soft line break
    | LineBreak             -- ^ Hard line break
    | Math MathType String  -- ^ TeX math (literal)
    | RawInline Format String -- ^ Raw inline
    | Link Attr [Inline] Target  -- ^ Hyperlink: alt text (list of inlines), target
    | Image Attr [Inline] Target -- ^ Image:  alt text (list of inlines), target
    | Note [Block]          -- ^ Footnote or endnote
    | Span Attr [Inline]    -- ^ Generic inline container with attributes

Probably not with all of those constructors at first!

This looks pretty good to me in general; I would just make sure that Str uses Text instead of String.
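To make that suggestion concrete, here is a minimal sketch of a trimmed-down inline type with Text in place of String. The choice of constructors, the dropped Attr on Code, and the plainText helper are assumptions for illustration only; none of this is existing orgmode-parse or pandoc code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Data.Text    (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO

-- Hypothetical starting subset of pandoc's Inline, using Text.
data Inline
  = Str Text          -- plain text
  | Emph [Inline]     -- /emphasis/
  | Strong [Inline]   -- *strong*
  | Code Text         -- =code= (pandoc's Attr is dropped here)
  | Space             -- inter-word space
  deriving (Eq, Show)

-- Flatten a list of inlines back to unmarked text.
plainText :: [Inline] -> Text
plainText = T.concat . map go
  where
    go (Str t)     = t
    go (Emph xs)   = plainText xs
    go (Strong xs) = plainText xs
    go (Code t)    = t
    go Space       = " "

-- The body line from the example above: Here is some *body* =text=
sample :: [Inline]
sample =
  [ Str "Here", Space, Str "is", Space, Str "some", Space
  , Strong [Str "body"], Space, Code "text" ]

main :: IO ()
main = TIO.putStrLn (plainText sample)
```

Starting from a handful of constructors keeps the parser small, and more of pandoc's cases can be added later without changing the shape of the type.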

I was also thinking about using Free to represent the AST instead of a recursive sum-type, what are your thoughts on that?
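For reference, a rough sketch of what a Free-based encoding of inline markup could look like. Free is hand-rolled here so the example needs only base (the free package provides an equivalent type), and MarkupF, plain, bold, and render are hypothetical names, not anything that exists in orgmode-parse.

```haskell
{-# LANGUAGE DeriveFunctor #-}
module Main where

-- Hand-rolled free monad over a functor f.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- One layer of markup; `next` is the rest of the inline sequence.
data MarkupF next
  = Plain String next
  | Bold (Free MarkupF ()) next   -- nested markup, then the continuation
  deriving Functor

type Markup = Free MarkupF

plain :: String -> Markup ()
plain s = Free (Plain s (Pure ()))

bold :: Markup () -> Markup ()
bold inner = Free (Bold inner (Pure ()))

-- One possible interpreter: render back to org-mode syntax.
render :: Markup a -> String
render (Pure _)                 = ""
render (Free (Plain s next))    = s ++ render next
render (Free (Bold inner next)) = "*" ++ render inner ++ "*" ++ render next

sample :: Markup ()
sample = do
  plain "Here is some "
  bold (plain "body")
  plain " text"

main :: IO ()
main = putStrLn (render sample)
```

Even with two constructors, the nested case already needs a second Free value inside the functor, which hints at how the encoding grows once headings, lists, and blocks each bring their own nesting.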

Yeah, sure, I can write it like that. I'll work out the algebra a bit and get back to you.

I think I am going to write it straight ahead. It got really complicated really fast.

That sounds good to me, simple is better IMHO.

(Sent as an email reply by Parnell Springmeyer to smurphy8's comment of Thu, Mar 3, 2016, 11:19 AM.)

Yeah, the Heading falls out nicely, but I think we would have to unify and rework the parser in ways that might make parsing easier, but at the expense of usability.

Because there are multiple nesting terms, you have to either run a whole bunch of different free monads or extend some super object.

These guys:
article.gmane.org/gmane.emacs.orgmode/67871Obj

have such an object defined, but it loses exactly what I love about the orgmode representation you have, that is... It nests and feels like the text document.

Anyway, I will work on it more this weekend hopefully.

Closing this issue because it isn't tracking anything specific and is quite old at this point.