edn-format / edn

Extensible Data Notation

Alternative string quote syntax

mohkale opened this issue · comments

Evidently there was a discussion about this in Clojure almost six years ago, but there is no issue for it against the edn spec, so here's one.

So far the only syntax for strings is with speech marks " as delimiters. This makes writing strings that contain " a chore because you have to escape them at every occurrence. It also makes it harder to tell where one argument begins and another ends because no white space is needed to separate arguments.

(
  ;; needs escaping
  {:cmd "echo \"foo ${bar}\""}

  ;; There are three values in the map below, but can you tell where the second
  ;; one finishes?
  {:cmd "foo bar \"baz\"""\"bag\" bam boom"}

  ;; What if the string ends up spanning multiple lines?
  {:cmd "conf_file=\"c:/tools/msys64/msys2_shell.cmd\"
         if ! [ -f \"$conf_file\" ]; then
           echo 'failed to set PATH inheritance, conf file doesn't exist' >&2
           exit 1
         fi
  
         sed -i -e 's/rem \\(set MSYS2_PATH_TYPE=inherit\\)/\\1/' \"$conf_file\""}
)

Suffice it to say escaping quotes hurts readability when quotes are used in abundance. This also affects JSON. I recently started using edn to configure my dotfiles so I've been writing quite a bit of shell script in edn and this is the sort of stuff I keep having to deal with.

The discussion I've linked to above describes three alternatives. Hopefully this issue opens a dialogue and gets the ball rolling on how best to tackle this problem.

Personally I quite like the triple quote python approach.
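For reference, here is a minimal Python sketch of what the triple-quote style buys you. The variable names are illustrative only, and nothing here is proposed edn syntax; it just shows the behaviour being borrowed.

```python
# The same shell fragment written with escaped quotes and with triple quotes.
escaped = "if ! [ -f \"$conf_file\" ]; then echo \"missing\" >&2; fi"
triple = """if ! [ -f "$conf_file" ]; then echo "missing" >&2; fi"""
assert escaped == triple

# Multi-line text keeps its inner quotes as written. The backslash right
# after the opening delimiter suppresses the leading newline.
script = """\
conf_file="c:/tools/msys64/msys2_shell.cmd"
if ! [ -f "$conf_file" ]; then
  exit 1
fi
"""
print(script)
```

One wart carries over: a value whose last character is a double quote still needs an escape (or a different delimiter), because it would otherwise run into the closing triple quote.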

An example of the Chicken Scheme heredoc syntax:

(define msg #<<END
 "Hello, world!", she said.
END
)

Personally, I've never liked how the closer for a heredoc has to span the whole line. You end up having to push any closing brackets or other constructs to the line after, and it always looks unnatural to me 😞.

I mean, you could theoretically specify that the closer does not have to span the whole line, of course. "The closer is everything up to the first closing parenthesis" or something.
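One way to read that rule, as a rough Python sketch: the body ends at the first line that begins with the tag, and the reader resumes immediately after the tag instead of at the next line. The function name `read_heredoc` and the exact matching rule are assumptions for illustration, not Chicken Scheme's actual behaviour and not anything in the edn spec.

```python
from typing import Tuple


def read_heredoc(text: str, start: int = 0) -> Tuple[str, int]:
    """Parse a hypothetical `#<<TAG` heredoc beginning at text[start].

    Returns (body, index just past the closing tag). The closer does not
    have to occupy a whole line: anything after the tag is left unread.
    """
    opener = "#<<"
    if not text.startswith(opener, start):
        raise ValueError("not a heredoc")
    eol = text.find("\n", start)
    if eol == -1:
        raise ValueError("heredoc opener must be followed by a newline")
    tag = text[start + len(opener):eol].strip()
    if not tag:
        raise ValueError("empty heredoc tag")

    pos = eol + 1
    body_lines = []
    while pos <= len(text):
        # A line that begins with the tag closes the heredoc.
        if text.startswith(tag, pos):
            return "\n".join(body_lines), pos + len(tag)
        nxt = text.find("\n", pos)
        if nxt == -1:
            break
        body_lines.append(text[pos:nxt])
        pos = nxt + 1
    raise ValueError("unterminated heredoc")


src = '(define msg #<<END\n "Hello, world!", she said.\nEND)'
body, end = read_heredoc(src, src.index("#<<"))
print(repr(body), repr(src[end:]))
```

The obvious cost of this reading is that a body line which happens to start with the tag (say, `ENDING`) closes the string early, so the tag has to be chosen with the content in mind.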

@zilti You're correct. Strangely enough, I've never encountered a language that allowed that. Neither bash nor ruby seems to allow it; php does, however. I see no issue with heredocs if we're going down that route.

True. And I guess in a way, XML's CDATA sections take this one step further, at the cost of not letting you customize the closer: they open with <![CDATA[ and close with ]]>, no matter where on a line that falls or how much other stuff there is on that line.

After spending considerable time comparing string literal syntaxes across languages, here are my assessments:

  1. Rust's string literals strike a nice balance between (a) ease-of-mechanical-parsing and (b) human-readability.

  2. YAML's many kinds of string literals result in a format that is (a) difficult to parse mechanically and (b) painfully complex for humans to use, much less remember. As a result, the YAML spec is unnecessarily complicated, and implementations vary in quality. Please learn from YAML's choices here; they are a cautionary tale.

Note: I could have phrased my comment more neutrally, but that would be hiding my bias. (My bias may or may not be useful to you, so interpret it accordingly.) My assessment is informed by wrestling with tradeoffs in this space while designing a new human-readable interchange format.
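To make the point about Rust's raw strings concrete: a raw literal is `r`, then N `#` characters, then a quote, and it ends at the first quote followed by the same N `#` characters, with no escape processing in between. Here is a small Python sketch of a reader for that shape. The function name `read_raw_string` is made up for illustration, and this is neither Rust's lexer nor a proposal for edn.

```python
from typing import Tuple


def read_raw_string(text: str, start: int = 0) -> Tuple[str, int]:
    """Read a Rust-style raw string (r"...", r#"..."#, ...) at text[start].

    Returns (body, index just past the closing delimiter).
    """
    if not text.startswith("r", start):
        raise ValueError("raw string must start with 'r'")
    pos = start + 1

    # Count the '#' characters between 'r' and the opening quote.
    hashes = 0
    while pos < len(text) and text[pos] == "#":
        hashes += 1
        pos += 1
    if pos >= len(text) or text[pos] != '"':
        raise ValueError("expected opening quote")
    pos += 1

    # The closer is a quote followed by the same number of '#'.
    closer = '"' + "#" * hashes
    end = text.find(closer, pos)
    if end == -1:
        raise ValueError("unterminated raw string")
    return text[pos:end], end + len(closer)


body, end = read_raw_string('r#"echo "foo ${bar}""#')
print(body)
```

The whole reader is a counter and one substring search, which is the "ease of mechanical parsing" half of the tradeoff; the writer only has to pick a hash count that does not collide with the content.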

I would like to chime in with an idea I have recently had.

A multi-line string might begin with a backslash character, followed by any blank space and then a line terminator. Each subsequent line would have some blank space at the beginning, then a backslash character; the rest of the line, including the newline, would be part of the multi-line string. The first line that doesn't start with a backslash character is not included, and parsing continues as normal. The first and last newline characters in the multi-line string are always removed.

Examples:

{
  :foo \
          \Bar
          \Baz
          \
  :Quxx "hi"
}

Becomes

{
  :foo "Bar\nBaz\n"
  :Quxx "hi"
}
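The reading rule can be sketched in a few lines of Python. This is one interpretation, under stated assumptions: the marker character is a parameter (defaulting to the backslash used in the example), the opening line is taken to contribute its own newline, and the function name `read_block_string` is hypothetical. With those assumptions the example decodes to the value shown.

```python
from typing import List, Tuple


def read_block_string(lines: List[str], first: int,
                      marker: str = "\\") -> Tuple[str, int]:
    """Decode a marker-prefixed multi-line string.

    `lines` holds source lines without terminators; lines[first] must end
    with the opening marker. Returns (value, index of first unread line).
    """
    if not lines[first].rstrip().endswith(marker):
        raise ValueError("line does not open a block string")

    parts = ["\n"]  # assumption: the opening line contributes its newline
    i = first + 1
    while i < len(lines):
        stripped = lines[i].lstrip()
        if not stripped.startswith(marker):
            break  # first non-marker line ends the string and is not consumed
        parts.append(stripped[len(marker):] + "\n")
        i += 1

    raw = "".join(parts)
    raw = raw[1:]              # drop the first newline
    if raw.endswith("\n"):
        raw = raw[:-1]         # drop the last newline
    return raw, i


source = [
    "{",
    "  :foo \\",
    "          \\Bar",
    "          \\Baz",
    "          \\",
    '  :Quxx "hi"',
    "}",
]
value, next_line = read_block_string(source, 1)
assert value == "Bar\nBaz\n"
assert next_line == 5
```

Because only leading blank space and one marker are stripped, everything after the marker is taken verbatim, including quotes and further backslashes.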

If alternative interpretations of the multi-line string are needed, they can simply be dispatches. For example, #prose before a multi-line string might result in a multi-line string with all of the newlines collapsed into spaces, like YAML's >.
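A minimal sketch of that dispatch idea in Python, assuming the block string has already been decoded to an ordinary string. The handler table, `apply_tag`, and the exact folding rule are all hypothetical; in particular this version simply drops blank lines, whereas YAML's > keeps them as paragraph breaks.

```python
def fold_prose(block: str) -> str:
    """Join the non-blank lines of a block string with single spaces."""
    return " ".join(line.strip() for line in block.splitlines() if line.strip())


# Hypothetical table of tag handlers, keyed by tag name without the '#'.
TAG_HANDLERS = {"prose": fold_prose}


def apply_tag(tag: str, value: str) -> str:
    """Post-process a decoded string according to its tag."""
    handler = TAG_HANDLERS.get(tag)
    if handler is None:
        raise KeyError("no handler registered for #" + tag)
    return handler(value)


print(apply_tag("prose", "Bar\nBaz\n"))
```

The string syntax itself stays the same for every variant; only the handler changes.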

This makes several different types of multiline strings possible with a simple syntax that is easy to understand and remember.

It also allows you to embed one document inside another without escaping anything inside that document. All you have to do is prefix every line of the document with some space and a backslash. The ability to embed documents inside other documents is a killer feature for configuration languages. Being able to add it to edn this simply would be amazing.
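The writing side is correspondingly small. Here is a Python sketch of an encoder, assuming the rule that the first and last newlines are dropped on reading; `embed_block` is a made-up name, and the marker is a parameter that defaults to the backslash used in the example.

```python
def embed_block(document: str, indent: str = "  ", marker: str = "\\") -> str:
    """Render `document` as a marker-prefixed block string.

    The first output line is the bare opening marker; every line of the
    document follows, prefixed with `indent` and the marker. Nothing inside
    the document is escaped.
    """
    out = [marker]
    for line in document.split("\n"):
        out.append(indent + marker + line)
    return "\n".join(out)


print(embed_block('echo "foo ${bar}"\nexit 1\n'))
```

A document that ends in a newline produces a final line holding only the marker, which matches the trailing bare marker in the example.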