wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions

Home Page: https://docs.rs/markdown/1.0.0-alpha.17/markdown/

Serializing mdast to markdown

Enoumy opened this issue

commented

(Whoops, I accidentally hit enter before drafting the content for this question, my apologies for the noise!)

Hi! I have a perhaps newbie question! I can use markdown::to_mdast to go from &str -> Node. Is there a function to go back to a string (Node -> &str) in a way that roundtrips?

I came across Node::to_string, and it does convert nodes into a string, but it also drops the links, titles, and most other AST nodes, so re-parsing the result produces a different AST. I'm unsure whether this question is reasonable or within the scope of this crate, but is there an alternate function elsewhere that roundtrips between &str and Node? I am also happy to take a stab at implementing this "roundtrippable" unparser function myself, but was wondering if a function like it already existed.

For further clarification, by "roundtripping" I mean that I would write a property-based test asserting that markdown::to_mdast(to_string(node)) == node holds for all nodes.
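To make the shape of that property concrete, here is a minimal sketch on a toy inline grammar. Everything in it (`Inline`, `parse`, `serialize`, `roundtrips`) is hypothetical and stands in for the crate's real `Node` and `to_mdast`; it only knows plain text and `*emphasis*`, and it does no escaping.

```rust
// Toy AST, NOT the crate's mdast types: plain text and `*emphasis*` only.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Text(String),
    Emphasis(String),
}

// Append text, merging with a preceding text node so trees stay canonical.
fn push_text(nodes: &mut Vec<Inline>, text: &str) {
    if text.is_empty() {
        return;
    }
    match nodes.last_mut() {
        Some(Inline::Text(last)) => last.push_str(text),
        _ => nodes.push(Inline::Text(text.to_string())),
    }
}

// Hypothetical parser: `*x*` with non-empty `x` is emphasis, the rest is text.
fn parse(input: &str) -> Vec<Inline> {
    let mut nodes = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_open) = rest.strip_prefix('*') {
            match after_open.find('*') {
                Some(end) if end > 0 => {
                    nodes.push(Inline::Emphasis(after_open[..end].to_string()));
                    rest = &after_open[end + 1..];
                }
                // Unmatched or empty emphasis: keep the `*` as literal text.
                _ => {
                    push_text(&mut nodes, "*");
                    rest = after_open;
                }
            }
        } else {
            let end = rest.find('*').unwrap_or(rest.len());
            push_text(&mut nodes, &rest[..end]);
            rest = &rest[end..];
        }
    }
    nodes
}

// Hypothetical serializer: the inverse of `parse`, with no escaping at all.
fn serialize(nodes: &[Inline]) -> String {
    nodes
        .iter()
        .map(|node| match node {
            Inline::Text(value) => value.clone(),
            Inline::Emphasis(value) => format!("*{}*", value),
        })
        .collect()
}

// The property under discussion: parse(serialize(tree)) == tree.
fn roundtrips(tree: &[Inline]) -> bool {
    parse(&serialize(tree)).as_slice() == tree
}
```

In this sketch the property holds for trees that came out of `parse`, but it fails for a hand-built tree such as `[Inline::Text("*x*".to_string())]`, because the unescaped text re-parses as emphasis. A property over all nodes therefore also needs a serializer that escapes.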

Thanks!

commented

No, this is not yet possible, as mdast-util-to-markdown has not been implemented in Rust yet.

You can work on this. Though, it is involved work that takes a while. The good part is that everything has already been implemented in JavaScript.

Finally, “complete” roundtripping (toString(fromString(x)) == x) is impossible with ASTs. ASTs are abstract. They lose information. That is intentional. So the results will never be exact, but the results will be equivalent.
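A small illustration of why the string cannot always come back exactly, as a sketch with hypothetical names (`Emphasis`, `parse`, `serialize`) rather than the crate's API. In CommonMark both `*hi*` and `_hi_` are emphasis, and the tree here does not record which marker was used, so the serializer has to pick one. Writing `*` is simply the choice made for this example.

```rust
// Toy node: the marker character is deliberately not stored.
#[derive(Debug, PartialEq)]
struct Emphasis(String);

// Hypothetical parser: accepts `*x*` or `_x_` with non-empty `x`.
fn parse(input: &str) -> Option<Emphasis> {
    for marker in ['*', '_'] {
        let inner = input
            .strip_prefix(marker)
            .and_then(|s| s.strip_suffix(marker));
        if let Some(inner) = inner {
            if !inner.is_empty() {
                return Some(Emphasis(inner.to_string()));
            }
        }
    }
    None
}

// Hypothetical serializer: always writes `*`, whatever the source used.
fn serialize(node: &Emphasis) -> String {
    format!("*{}*", node.0)
}
```

Here `serialize(&parse("_hi_").unwrap())` yields `*hi*`, which differs from the input string but parses to an equal tree.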

Would this work: passing the serde_json-serialized tree on to mdast-util-to-markdown?

commented

perhaps

I wrote a likely crummy implementation of this for a personal project here, would something like this make sense as a PR or a new crate?

It passes a (much) weaker version of the proptest @Enoumy proposes, where string -> mdast -> string2 -> mdast -> string3 produces an equivalent string2 and string3 (assuming I understand how proptest works 😁 )
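For reference, that weaker check can be sketched generically as below. The helper `is_stable` and the toy `parse_paragraphs` / `serialize_paragraphs` pair are hypothetical stand-ins, not code from the linked project or from this crate. The toy formatter collapses whitespace and blank lines, so the first pass may change the input, but a second pass should change nothing.

```rust
// Weaker property: string -> tree -> string2 -> tree -> string3,
// then require string2 == string3 (the formatter has reached a fixed point).
fn is_stable<T>(
    input: &str,
    parse: impl Fn(&str) -> T,
    serialize: impl Fn(&T) -> String,
) -> bool {
    let string2 = serialize(&parse(input));
    let string3 = serialize(&parse(&string2));
    string2 == string3
}

// Toy "parser": paragraphs are separated by blank lines, and runs of
// whitespace inside a paragraph collapse to a single space.
fn parse_paragraphs(input: &str) -> Vec<String> {
    input
        .split("\n\n")
        .map(|p| p.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|p| !p.is_empty())
        .collect()
}

// Toy "serializer" with one opinionated format: exactly one blank line
// between paragraphs. Takes `&Vec<String>` so it matches `is_stable`'s `&T`.
fn serialize_paragraphs(paragraphs: &Vec<String>) -> String {
    paragraphs.join("\n\n")
}
```

For example, `is_stable("a  b\n\n\n\nc \n", parse_paragraphs, serialize_paragraphs)` holds even though the first pass rewrites the input to `"a b\n\nc"`. The check says nothing about whether the first pass preserved meaning, which is why it is much weaker than comparing trees.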

I don't think it covers all the possible nodes an mdast can include, and it applies some opinionated formatting. I also suspect this recursive approach is bad for performance. (I'm learning Rust through this project, so I wouldn't be surprised to learn that something about this code is very far from best practices.)

Nice start and welcome to rust :)

  • this project is no_std, and it looks like you’re using a bunch of std?
  • some potential bugs are fine, but it should be good from the start I think, have you looked at mdast-util-to-markdown? it’s battle tested and supports everything. Being mostly compatible across JS and Rust is also important to me!

I'll leave this code in my own project then. I found this issue when I was already mostly done with this implementation, so I couldn't look at mdast-util-to-markdown until it was too late. I'll take a look now, but I don't plan to write something new when I have something that works for me.

Edit: if nothing else I need to copy the unsafe character support...
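As a rough idea of what handling unsafe characters involves, here is a deliberately naive sketch. The function `escape_text` and its character set are assumptions made for illustration. mdast-util-to-markdown decides what to escape based on the surrounding context, whereas this version backslash-escapes a fixed handful of characters everywhere, which is safe but noisier than necessary.

```rust
// Hypothetical helper: backslash-escape a fixed set of characters that
// could otherwise be read as markdown syntax when a text node is written out.
fn escape_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        // CommonMark treats a backslash before ASCII punctuation as an escape.
        if matches!(ch, '\\' | '*' | '_' | '[' | ']' | '`') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

With this, a text node containing `2 * 3 [ok]` is written as `2 \* 3 \[ok\]`, so re-parsing yields text rather than emphasis or the start of a link.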

@wooorm, do you know why leveraging the ToString implementation for this wouldn't be a good idea? Or is the intention to have a separate method for this?

“to string” is already a thing in the mdast world, getting just the text out.
Formatting markdown is complex. And not always needed.
Yes, separate methods. See the first comment. https://github.com/syntax-tree/mdast-util-to-markdown
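The distinction between the two operations can be sketched on a toy tree. The `Node` enum below is a hypothetical two-variant stand-in, not the crate's `Node`, and `plain_text` / `to_markdown` are illustrative names: one collects just the text, the other writes markdown syntax back out (without any escaping here).

```rust
// Toy tree, NOT the crate's mdast `Node`.
enum Node {
    Text(String),
    Link { url: String, children: Vec<Node> },
}

// "To string" in the mdast sense: only the text content, no syntax.
fn plain_text(node: &Node) -> String {
    match node {
        Node::Text(value) => value.clone(),
        Node::Link { children, .. } => children.iter().map(plain_text).collect(),
    }
}

// Serializing to markdown: keeps the structure, so the link survives.
fn to_markdown(node: &Node) -> String {
    match node {
        Node::Text(value) => value.clone(),
        Node::Link { url, children } => {
            let label: String = children.iter().map(to_markdown).collect();
            format!("[{}]({})", label, url)
        }
    }
}
```

For a link to `https://example.com` labelled `docs`, `plain_text` returns `docs` while `to_markdown` returns `[docs](https://example.com)`. Only the second can be re-parsed into the same tree.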

I see. I will try to work on a PR then 🙂.