sogaiu / tree-sitter-clojure

Clojure(Script) grammar for tree-sitter

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Does not handle metadata that is a tagged literal

sogaiu opened this issue · comments

As discovered here, it appears our grammar (421546c) doesn't handle a construct which seems to be used by ClojureDart folks.

I've only managed to find a bit under 150 .cljd files so far, but I found 8 files where a certain construct isn't handled (and the construct appears more than once in multiple files).

To repeat a bit here (new info follows the quote)...

Begin Quote

I found something our grammar doesn't seem to handle:

^#/(w/State ~fx-class #_m/SnackBar)
(...)

Assuming for the moment that that is valid, I presume the ^ indicates metadata. IIUC, #/(w/State ~fx-class #_m/SnackBar) is a tagged literal. So there is some metadata which is on (...) (the ... is meant to stand in for more code here).

I guess that's not a case I anticipated:

meta_lit: $ =>
seq(field('marker', "^"),
repeat($._gap),
field('value', choice($.read_cond_lit,
$.map_lit,
$.str_lit,
$.kwd_lit,
$.sym_lit))),

If we're lucky a fix might be a matter of attending to that last choice there somehow.

Adding something in the last choice there (and I guess may be in old_meta_lit) is one option.

This brings up the topic of whether that choice should be there restricting things...possibly it could be replaced with $._form.

End Quote

I've tried both of the approaches mentioned above and they seem to work (though it may be that one is a bit slower than the other -- not sure yet). I think they do ok on our clojars tests as well but I'll check that again after some sleep perhaps.

Below are sort of the raw error results that turned up:

Adding something in the last choice there (and I guess may be in old_meta_lit) is one option.

This brings up the topic of whether that choice should be there restricting things...possibly it could be replaced with $._form.

I'm not quite sure how to interpret this new type of metadata since I don't work with clojure dart, but it seems like opening up the metadata grammar rules to be just any form might not solve this situation. Would ^ followed by #/( ... ) be considered a single form? I think it would be parsed as two forms, where #/ is a tagged ctor literal then a list literal. I think a special rule will be needed for this.

Thanks for your comments.

Would ^ followed by #/( ... ) be considered a single form?

^#/(<some-stuff>) is metadata and it is on something -- which in the original example above I wrote as (...).

My impression is that tagged literals are often written with a space after the symbol portion, so another way to express things is:

^#/ (<some-stuff>) (...)

Here there is a space after the / (which is a symbol).

This way of articulating things is closer to the more commonly encountered (at least for me) uuid case:

#uuid "00000000-0000-0000-0000-000000000000"

Note the space after the uuid and preceding the first ".

FWIW, the first tweak I tried with the grammar was:

    meta_lit: $ =>
    seq(field('marker', "^"),
        repeat($._gap),
        field('value', choice($.read_cond_lit,
                              $.tagged_or_ctor_lit,
                              $.map_lit,
                              $.str_lit,
                              $.kwd_lit,
                              $.sym_lit))),

With that change, using ahlinc's alpha, I get:

$ echo "^#/(data) (thing)" | tree-sitter.ahlinc parse -
(source [0, 0] - [1, 0]
  (list_lit [0, 0] - [0, 17]
    meta: (meta_lit [0, 0] - [0, 9]
      value: (tagged_or_ctor_lit [0, 1] - [0, 9]
        tag: (sym_lit [0, 2] - [0, 3]
          name: (sym_name [0, 2] - [0, 3]))
        value: (list_lit [0, 3] - [0, 9]
          value: (sym_lit [0, 4] - [0, 8]
            name: (sym_name [0, 4] - [0, 8])))))
    value: (sym_lit [0, 11] - [0, 16]
      name: (sym_name [0, 11] - [0, 16]))))

So far that looks ok to me.

Now it should be possible to test things out by using the dev branch at this commit [1].

If you have an appropriate version of babashka, it should be possible to use babashka tasks for fetching and testing. One task will fetch ClojureDart code from some repositories and the other will run the parse subcommand on just the fetched .cljd files.

To see a list of tasks try bb tasks -- that should show the things listed here:

fetch-code {:task fetch-code/-main}
parse-tree {:task parse-tree/-main}

Invocations are bb fetch-code and bb parse-tree respectively.

The corresponding tasks are defined in files that live here.

On a side note, there is also bash completion stuff here. I went for putting a bb file under ~/.local/share/bash-completion/completions instead of the mentioned addition to .bashrc, but whatever works...there are definitions for other shells too.


[1] Note: I force-pushed after the initial posting so the commit mentioned before (0f4e064) is not valid any more.

The change is just allowing metadata to be defined with $.tagged_or_ctor_lit then, that seems reasonable to fix this with, no new constructs are needed thankfully.

That syntax is so strange I can't begin to understand why it is necessary in clojure dart. I will try to pull down and test later, maybe tomorrow? We will see and I will report back as soon as I have the chance.

commented

Regarding necessity, I found the following text:

Generics

Unlike Java, Dart generics are not erased — it means that on the JVM at runtime a List<String> is just a List but that in Dart at runtime it's still a List<String>. This creates two problems: expressing parametrized types and dealing with the mismatch between strong typing of collections items and Clojure's collections.

Parametrized types

#/(List String) is the ClojureDart pendant of Dart List<String> and is in fact a tagged literal producing ^{:type-params [String]} List. Thus parametrized types are symbols as usual.

via: https://github.com/Tensegritics/ClojureDart/blob/main/doc/differences.md#generics

So may be that's saying:

#/(List String)

is an abbreviation for:

^{:type-params [String]} List

I understand, thanks for explaining. Seems like CLR clojure does something different as well (CLR also has real runtime generics) as per our discussion on

|System.Collections.Generic.IList`1[System.Int32]|

If only we had a specification for this language. At least this fix seems a little more straight forward than that of the CLR types. FWIW I tried it yesterday and it worked fine with my simple tests, I just forgot to report back.

commented

As we are splitting off some of the dev-related bits into another repository and some of those bits were on the dev branch, I've made a separate pre-0.0.12 branch to preserve the bits that are remaining here.

The fix can now be seen at 7df2726.

Added a test at 7de8f06.

commented

As mentioned above, addressed in 7df2726.