Argument commands

Question

larsgw opened this issue 4 years ago

The Citation.js parser needs to support more kinds of commands, mostly commands that take arguments. Arguments always seem to be treated the same way: a command takes either a braced group or the first character of the following text. Math is the exception: \url takes the dollar sign verbatim, while \emph does not.
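To make that behaviour concrete, here is a minimal sketch of reading one argument as described: a braced group if the next non-whitespace character is `{`, otherwise a single character. The name `readArgument` is hypothetical and not part of Citation.js; it works on plain strings, ignores escaped braces such as `\{`, and simplifies TeX, which reads the next token rather than the next character.

```javascript
// Hypothetical helper: read one command argument starting at `pos`.
// Returns the argument text and the position just after it.
function readArgument (input, pos) {
  // Skip whitespace between the command and its argument
  while (pos < input.length && /\s/.test(input[pos])) pos++
  if (pos >= input.length) throw new SyntaxError('Missing argument')

  // No brace: the argument is the next single character, e.g. "\emph x"
  if (input[pos] !== '{') return { value: input[pos], end: pos + 1 }

  // Braced group: track nesting until the matching closing brace
  let depth = 0
  for (let i = pos; i < input.length; i++) {
    if (input[i] === '{') depth++
    else if (input[i] === '}') depth--
    if (depth === 0) return { value: input.slice(pos + 1, i), end: i + 1 }
  }
  throw new SyntaxError('Unbalanced braces')
}

console.log(readArgument(' {a {b} c} rest', 0).value) // a {b} c
console.log(readArgument('xyz', 0).value)             // x
```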

Emiliano Heyns · Answer 1 · Mon Nov 16 2020 23:52:52 GMT+0800 (China Standard Time)

That's more a question of whether a command parses its argument in verbatim mode. \url expects one argument and parses it in verbatim mode; \href expects two arguments, parsing the first verbatim and the second normally. \begin{verbatim} ... \end{verbatim} parses everything in that environment verbatim, and \verb parses everything verbatim until the end of the block it's in.

There's simply no math in verbatim environments, because the $ is just a character there.
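One way to model this is a per-command list of argument modes. This is a hypothetical illustration (the names `argumentModes`, `parseNormal` and `processArguments` are not from Citation.js or Better BibTeX): it assumes the raw argument strings have already been extracted, and its toy "normal mode" only recognises `$...$`, just enough to show the difference between \url and \emph.

```javascript
// Hypothetical per-command table: one entry per argument, 'verbatim' or
// 'normal', mirroring the \url, \href and \emph examples above.
const argumentModes = {
  url: ['verbatim'],
  href: ['verbatim', 'normal'],
  emph: ['normal']
}

// Toy normal-mode processing: only $...$ is interpreted, and it is just
// wrapped in a marker to make the difference visible.
function parseNormal (text) {
  return text.replace(/\$([^$]*)\$/g, (_, math) => `<math>${math}</math>`)
}

// Apply each argument's mode to the raw (already extracted) argument strings.
function processArguments (command, rawArgs) {
  if (!Object.prototype.hasOwnProperty.call(argumentModes, command)) {
    throw new Error(`Unknown command: \\${command}`)
  }
  const modes = argumentModes[command]
  if (rawArgs.length !== modes.length) {
    throw new Error(`\\${command} expects ${modes.length} argument(s)`)
  }
  // In verbatim mode "$" is just a character, so the raw text is returned as-is
  return rawArgs.map((raw, i) => modes[i] === 'verbatim' ? raw : parseNormal(raw))
}

console.log(processArguments('url', ['https://example.org/$x$']))
console.log(processArguments('emph', ['costs $x$']))
```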

Lars Willighagen · Answer 2 · Tue Nov 17 2020 01:38:54 GMT+0800 (China Standard Time)

That's a bit annoying; I was planning to do something like this:

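```javascript
// constants.js
export const argumentCommands = {
  // \href{url}{text}: show the URL after the text unless they are identical
  href (url, text) { return text === url ? text : `${text} (${url})` }
}

// value.js (grammar)
import * as constants from './constants.js'
// `Grammar` is the parser's grammar class; its import is omitted here

const grammar = new Grammar({
  // ...

  Command () {
    const command = this.consumeToken('command').value

    // Own-property check, so names like "constructor" or "toString" are not
    // picked up from Object.prototype (which a plain `in` check would do)
    if (Object.prototype.hasOwnProperty.call(constants.argumentCommands, command)) {
      const func = constants.argumentCommands[command]
      const args = []
      let arity = func.length // fun thing: number of declared parameters

      // Read one argument per parameter, allowing whitespace before each
      while (arity-- > 0) {
        this.consumeToken('whitespace', /* optional: */ true)
        args.push(this.consumeRule('Argument'))
      }

      return func(...args)
    } // else...
  },

  // ...
})
```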
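The arity trick in `Command` depends on `Function.prototype.length`, which counts only the parameters before the first one that has a default value or is a rest parameter. A small standalone check of that behaviour; the handlers other than `href` are made up for illustration:

```javascript
// Function.prototype.length stops counting at the first default or rest
// parameter, so it is only a reliable arity for plain parameter lists.
const handlers = {
  href (url, text) { return text === url ? text : `${text} (${url})` },
  // A default value stops the count: length is 1, not 2
  withDefault (url, text = url) { return `${text} (${url})` },
  // A rest parameter is not counted at all: length is 0
  variadic (...args) { return args.join(' ') }
}

for (const [name, fn] of Object.entries(handlers)) {
  console.log(name, fn.length)
}
// href 2
// withDefault 1
// variadic 0
```

If handlers ever need optional or variadic arguments, an explicit arity (for example a hypothetical `arity` property on each handler) would be more robust than relying on `length`.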

Emiliano Heyns · Answer 3 · Sat Nov 21 2020 00:06:13 GMT+0800 (China Standard Time)

If you keep the original input text attached to each token while tokenizing, you can decide later, during parsing, how to handle it. Basically, in normal mode you process the tokens according to their semantic meaning, and in verbatim mode you take the original text attached to the tokens and string it together.
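A minimal sketch of that idea, assuming a toy tokenizer rather than either project's actual lexer: each token records its interpreted `value` and the exact source text `raw`, so verbatim mode can rebuild the input by joining `raw`, while normal mode works on `value`. The names `tokenize`, `joinVerbatim` and `joinNormal` are hypothetical.

```javascript
// Toy tokenizer: every token keeps its interpreted `value` and the exact
// source text it came from (`raw`). The alternatives cover every character,
// so the tokens are contiguous and together reproduce the input.
function tokenize (input) {
  const pattern = /\\[a-zA-Z]+|\\[\s\S]|\\|\$|[{}]|\s+|[^\\${}\s]+/g
  return Array.from(input.matchAll(pattern), ([raw]) => {
    if (raw.startsWith('\\')) return { type: 'command', value: raw.slice(1), raw }
    if (raw === '$') return { type: 'math', value: raw, raw }
    if (raw === '{' || raw === '}') return { type: raw, value: raw, raw }
    if (/^\s+$/.test(raw)) return { type: 'whitespace', value: ' ', raw }
    return { type: 'text', value: raw, raw }
  })
}

// Verbatim mode: ignore the interpretation and concatenate the source text.
function joinVerbatim (tokens) {
  return tokens.map(token => token.raw).join('')
}

// Normal mode (toy): whitespace is already collapsed in `value`, grouping
// braces are dropped, and commands are only marked; a real parser would
// dispatch on the command name instead.
function joinNormal (tokens) {
  return tokens.map(token => {
    switch (token.type) {
      case 'command': return `<${token.value}>`
      case '{': case '}': return ''
      default: return token.value
    }
  }).join('')
}

const example = 'a  {b}\\textbf{c}'
const tokens = tokenize(example)
console.log(joinVerbatim(tokens)) // a  {b}\textbf{c}
console.log(joinNormal(tokens))   // a b<textbf>c
```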

Don't forget that commands can also take optional arguments in square brackets. I simply ignore them, but I still have to parse them to do that.
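Parsing an optional argument just to skip it could look like this. `skipOptionalArgument` is a hypothetical helper working on a plain string: it returns the position after the closing `]`, or the unchanged position if there is no optional argument. It assumes that a `]` inside a braced group (as in `[{a]b}]`) does not end the argument and that a backslash escapes the next character.

```javascript
// Hypothetical helper: skip an optional [...] argument starting at `pos`.
function skipOptionalArgument (input, pos) {
  let i = pos
  while (i < input.length && /\s/.test(input[i])) i++
  if (input[i] !== '[') return pos // no optional argument: consume nothing

  let braceDepth = 0
  for (i++; i < input.length; i++) {
    const char = input[i]
    if (char === '\\') { i++; continue } // skip escaped character, e.g. \]
    if (char === '{') braceDepth++
    else if (char === '}') braceDepth--
    else if (char === ']' && braceDepth === 0) return i + 1
  }
  throw new SyntaxError('Unterminated optional argument')
}

const source = '\\cite[p.~5]{key}'
const next = skipOptionalArgument(source, '\\cite'.length)
console.log(source.slice(next)) // {key}
```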

Lars Willighagen · Answer 4 · Sun Nov 22 2020 09:44:05 GMT+0800 (China Standard Time)

I think I might just let the command functions be called as if they were grammar rules, so that each one can decide for itself how to parse its arguments. Based on what I've seen, that may be somewhat similar to what you're doing. It feels a bit odd to make it that customisable, but I don't think it can lead to code injection or anything like that.
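A minimal sketch of commands acting like grammar rules, under stated assumptions: the `Parser` class, its `readVerbatim`/`readParsed` methods and the `commands` table are hypothetical stand-ins, not Citation.js's `Grammar` API. Each handler pulls its own arguments from the parser, so \url and the first argument of \href are read verbatim while \emph parses its contents, and `$...$` only becomes (toy) math outside verbatim arguments.

```javascript
// Hypothetical command table: each handler receives the parser and reads
// its own arguments, choosing verbatim or normal parsing per argument.
const commands = {
  // \url{...}: one verbatim argument
  url (parser) { return parser.readVerbatim() },
  // \href{url}{text}: verbatim URL, normally parsed link text
  href (parser) {
    const url = parser.readVerbatim()
    const text = parser.readParsed()
    return text === url ? text : `${text} (${url})`
  },
  // \emph{...}: one normally parsed argument
  emph (parser) { return parser.readParsed() }
}

class Parser {
  constructor (input) {
    this.input = input
    this.pos = 0
  }

  // Raw argument text: a braced group (nested braces balanced) or one character
  readVerbatim () {
    const { input } = this
    while (this.pos < input.length && /\s/.test(input[this.pos])) this.pos++
    if (this.pos >= input.length) throw new SyntaxError('Missing argument')
    if (input[this.pos] !== '{') return input[this.pos++]

    let depth = 0
    for (let i = this.pos; i < input.length; i++) {
      if (input[i] === '{') {
        depth++
      } else if (input[i] === '}' && --depth === 0) {
        const raw = input.slice(this.pos + 1, i)
        this.pos = i + 1
        return raw
      }
    }
    throw new SyntaxError('Unbalanced braces')
  }

  // Normal argument: read the raw text, then parse it recursively
  readParsed () {
    return new Parser(this.readVerbatim()).parse()
  }

  // Known commands dispatch to their handler, $...$ becomes toy math markup,
  // bare braces are dropped, everything else is copied through
  parse () {
    let out = ''
    while (this.pos < this.input.length) {
      const rest = this.input.slice(this.pos)
      const match = /^\\([a-zA-Z]+)/.exec(rest)
      if (match && Object.prototype.hasOwnProperty.call(commands, match[1])) {
        this.pos += match[0].length
        out += commands[match[1]](this)
      } else if (rest[0] === '$') {
        const end = this.input.indexOf('$', this.pos + 1)
        if (end === -1) throw new SyntaxError('Unterminated math')
        out += `<math>${this.input.slice(this.pos + 1, end)}</math>`
        this.pos = end + 1
      } else {
        if (rest[0] !== '{' && rest[0] !== '}') out += rest[0]
        this.pos++
      }
    }
    return out
  }
}

console.log(new Parser('\\href{https://example.org/$x$}{see \\emph{$y$}}').parse())
// see <math>y</math> (https://example.org/$x$)
```

In this sketch, handlers can only call the parser's reading methods; the input text itself is never evaluated as code.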

By the way, I'm working on a prototype plugin for @citation-js/plugin-bibtex that extends Unicode support using your unicode2latex tables. I don't want to add another 400KB to the default browser bundle, so I think an optional plugin to the plugin could work well. I'm still working out how to support things like {\\'{}I}, but the changes mentioned above might help with that.

Emiliano Heyns · Answer 5 · Sun Nov 22 2020 20:51:07 GMT+0800 (China Standard Time)

From my point of view, you're making astounding progress.