abhabongse / paxter

Document-first text pre-processing mini-language loosely inspired by @-expressions in Racket

Make Paxter option list more flexible by allowing recursive @-expressions

abhabongse opened this issue

New syntax proposal:

start ::= non_greedy_fragments                      /* FragmentList */
non_greedy_fragments ::= fragment*?
fragment ::= command | NON_GREEDY_TEXT
command ::=
    | "@" IDENTIFIER options wrapped_main_arg       /* PaxterApply */
    | "@" IDENTIFIER wrapped_main_arg               /* PaxterApply */
    | "@" IDENTIFIER options                        /* PaxterApply */
    | "@" IDENTIFIER                                /* PaxterPhrase (special case) */
    | "@" wrapped_phrase                            /* PaxterPhrase */
    | "@" wrapped_main_arg                          /* FragmentList or Text */
    | "@" wrapped_quoted_text                       /* Text */
wrapped_main_arg ::=
    | wrapped_fragments                             /* FragmentList */
    | "!" wrapped_banged_text                       /* Text */
wrapped_fragments ::=
    | "#" wrapped_fragments "#"
    | "<" wrapped_fragments ">"
    | "{" non_greedy_fragments "}"
wrapped_banged_text ::=
    | "#" wrapped_banged_text "#"
    | "<" wrapped_banged_text ">"
    | "{" NON_GREEDY_TEXT "}"
wrapped_quoted_text ::=
    | "#" wrapped_quoted_text "#"
    | "<" wrapped_quoted_text ">"
    | "\"" NON_GREEDY_TEXT "\""
wrapped_phrase ::=
    | "#" wrapped_phrase "#"
    | "<" wrapped_phrase ">"
    | "(" NON_GREEDY_TEXT ")"
options ::= "[" [ arg ( "," arg )* [ "," ] ] "]"    /* OptionList */
arg ::= [ IDENTIFIER "=" ] val
val ::=
    | command
    | wrapped_quoted_text                           /* Text */
    | JSON_NUMBER                                   /* Number */
    | IDENTIFIER                                    /* Identifier */

NON_GREEDY_TEXT ::= /.*?/
IDENTIFIER ::= ID_START ID_CONT*
  • ID_START is the set of characters (expressed as a regular expression character class) allowed as the first character of an identifier: the underscore (_) plus the Unicode general categories Lu, Ll, Lt, Lm, Lo, and Nl.
  • ID_CONT is the set of characters (expressed as a regular expression character class) allowed as the subsequent characters of an identifier: every character in ID_START plus the Unicode general categories Mn, Mc, Nd, and Pc.
  • JSON_NUMBER tokens strictly follow the number syntax of the JSON specification, and the value stored in the parse tree is obtained by passing the token to the json.loads function.
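
The terminal rules above can be sketched in Python as follows. This is only an illustration and not Paxter's actual implementation: the helper names (is_identifier, parse_json_number) are hypothetical, the identifier check uses unicodedata.category because the standard re module has no syntax for Unicode general categories, and the number pattern is transcribed from the JSON number grammar.

```python
import json
import re
import unicodedata

# Unicode general categories named in the proposal.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONT_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

# JSON number syntax: optional minus, integer part without leading zeros,
# optional fraction, optional exponent.
JSON_NUMBER_RE = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_id_start(ch: str) -> bool:
    """True if ch may begin an identifier (hypothetical helper)."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_cont(ch: str) -> bool:
    """True if ch may appear after the first character of an identifier."""
    return ch == "_" or unicodedata.category(ch) in ID_CONT_CATEGORIES


def is_identifier(text: str) -> bool:
    """Check text against IDENTIFIER ::= ID_START ID_CONT*."""
    return (
        bool(text)
        and is_id_start(text[0])
        and all(is_id_cont(ch) for ch in text[1:])
    )


def parse_json_number(token: str):
    """Validate token as a JSON number, then convert it with json.loads."""
    if JSON_NUMBER_RE.fullmatch(token) is None:
        raise ValueError(f"not a JSON number: {token!r}")
    return json.loads(token)
```

Under these assumptions, is_identifier("_x1") is accepted while is_identifier("1abc") is rejected, and parse_json_number("01") raises ValueError because JSON forbids leading zeros.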

Addition:

Allow non-identifier symbols to follow @, and treat only the single character immediately following the @ as the content of a PaxterPhrase. This allows @, or @; to be used analogously to LaTeX's \, and \;, etc.

Another Suggestion:

For wrapped_phrase, instead of a pair of parentheses ( and ), perhaps we could use a pair of bars | and |. For example:

  • 1@hello.2 would be equivalent to 1@|hello|.2
  • Instead of writing @(1 + 1), we would write @|1 + 1|.
  • If the addition above (a single symbol following @) is adopted, then a@,b would be equivalent to a@|,|b.
  • Of course, in order to put a single bar within the PaxterPhrase, the user would be required to add delimiters around it, such as @#|||#.
  • Similarly, to put other symbols such as {, ", #, or < right after the @ of a command, the user would need to wrap them with extra delimiters, such as @#|...|#.
  • Finally, @@ would mean the same as @|@|, which is a PaxterPhrase containing the string "@".
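
As a rough illustration of how the bar-delimited wrapped_phrase could be scanned, here is a minimal Python sketch. It assumes the rule becomes wrapped_phrase ::= "#" wrapped_phrase "#" | "<" wrapped_phrase ">" | "|" NON_GREEDY_TEXT "|"; the function name parse_bar_phrase and its (phrase, end position) return value are hypothetical and not part of Paxter.

```python
import re

# Opening pattern: any run of '#' and '<' followed by a single bar.
OPEN_RE = re.compile(r"[#<]*\|")

# Each opening character and the closing character that mirrors it.
MIRROR = {"#": "#", "<": ">", "|": "|"}


def parse_bar_phrase(text: str, pos: int = 0):
    """Parse a bar-delimited phrase starting at text[pos] (just after '@').

    Returns a tuple (phrase, end), where end is the index right after
    the closing delimiter. Hypothetical helper for illustration only.
    """
    match = OPEN_RE.match(text, pos)
    if match is None:
        raise ValueError("expected an opening delimiter ending with '|'")
    opening = match.group()
    # The closing delimiter is the opening one mirrored and reversed,
    # for example '<#|' closes with '|#>'.
    closing = "".join(MIRROR[ch] for ch in reversed(opening))
    # Non-greedy text: stop at the first occurrence of the closing delimiter.
    end = text.find(closing, match.end())
    if end == -1:
        raise ValueError(f"missing closing delimiter {closing!r}")
    return text[match.end():end], end + len(closing)
```

With this sketch, parse_bar_phrase("#|||#") yields the phrase "|", and parse_bar_phrase("1@|hello|.2", 2) yields "hello", leaving ".2" as ordinary text.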

Questions:

  • Do I need the [...options...] part of a command? One argument for it is that I would like a mechanism for passing multiple pieces of text into a single command (think string replace).
  • Do I wish to provide assistance for macros? This might not be strictly necessary, since the input will already have been transformed into a parse tree, which means that macro expansion comes naturally.