josefs / Gradualizer

A Gradual type system for Erlang

"syntax error" reported in gradualizer_highlight:parse_into_list_of_forms/1

jesperes opened this issue · comments

I'm getting a weird "syntax error" in one of our beams:

_build/test/lib/opsgenie/ebin/opsgenie.beam: escript: exception throw: {error,"4:28: syntax error before: '('"}
  in function  merl:fail/2 (merl.erl, line 1080)
  in call from merl:quote_1/3 (merl.erl, line 491)
  in call from gradualizer_highlight:parse_into_list_of_forms/1 (src/gradualizer_highlight.erl, line 141)
  in call from gradualizer_highlight:recreate_source/2 (src/gradualizer_highlight.erl, line 54)
  in call from gradualizer_highlight:prettyprint_and_highlight/3 (src/gradualizer_highlight.erl, line 36)
  in call from gradualizer_fmt:highlight_in_context/2 (src/gradualizer_fmt.erl, line 439)
  in call from gradualizer_fmt:try_highlight_in_context/2 (src/gradualizer_fmt.erl, line 409)
  in call from gradualizer_fmt:format_expr_type_error/4 (src/gradualizer_fmt.erl, line 387)

When I redbug into the gradualizer_highlight call, I get:

% 10:09:46 <0.110.0>(dead)
% gradualizer_highlight:parse_into_list_of_forms("encode_uri(Value) when is_list(Value) ->\n    encode_uri(list_to_binary(Value));\nencode_uri(Value) when is_binary(Value) ->\n    << uri_encode_path_byte(Byte)  || <<Byte>> <= Value >>.\n")

The string it seems to be complaining about is:

encode_uri(Value) when is_list(Value) ->
    encode_uri(list_to_binary(Value));
encode_uri(Value) when is_binary(Value) ->
    << uri_encode_path_byte(Byte)  || <<Byte>> <= Value >>.

The left side of the bitstring comprehension needs to be parenthesised:

<< (uri_encode_path_byte(Byte))  || <<Byte>> <= Value >>

The actual source code is parenthesized correctly:

%% @doc Encode URI into a percent-encoding string.
-spec encode_uri(list() | binary()) -> binary().
encode_uri(Value) when is_list(Value) ->
  encode_uri(list_to_binary(Value));
encode_uri(Value) when is_binary(Value) ->
  << (uri_encode_path_byte(Byte)) || <<Byte>> <= Value >>.

That's weird indeed. Was the abstract code embedded in the beam transformed by the compiler? Perhaps you found an OTP bug?

I can report a bug, but I'm a bit unsure how to create a reproducible example without involving Gradualizer.

I tried this on OTP 22, 24, and 25, but with all of them I get the same, parenthesised printout:

> Form = merl:quote({1,1}, "<< (uri_encode_path_byte(Byte))  || <<Byte>> <= Value >>").
> io:format("~s\n", [erl_pp:expr(Form, [])]).
<<
  (uri_encode_path_byte(Byte)) ||
      <<Byte>> <= Value
>>
ok

I expected erl_pp to skip the parentheses, but it doesn't seem to be the case 🤔

The culprit clearly is merl being unable to parse the code with missing parens:

3> merl:quote({1,1}, "<< uri_encode_path_byte(Byte)  || <<Byte>> <= Value >>").
** exception throw: {error,"1:24: syntax error before: '('"}
     in function  merl:fail/2 (merl.erl, line 1080)
     in call from merl:quote_1/3 (merl.erl, line 491)

Did you simply compile the beams using erlc +debug_info x.erl?

What our highlighter does when it doesn't have the original source code is to pretty-print it using erl_prettypr and then parse it again using merl:quote. This comment above prettyprint_and_highlight is supposed to explain what it does:

%% Pretty-prints and highlights a node in an AST. `AstNode' must be a node
%% existing in the list `AstContext` (forms).
%%
%% To highlight a node in an AST without having the original source code:
%% Pretty-print the AST and parse again. Then, find the corresponding node
%% in the new AST. Find its location. Find its length by pretty-printing,
%% tokenizing and then checking the location and length of the last token.
%% Then print it all in a fancy way with the node highlighted.
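
The first two steps of that comment (pretty-print the forms, then parse the text again) can be tried outside Gradualizer. The following is only a rough sketch, not Gradualizer's actual code: the module name `roundtrip_sketch` and the function `roundtrip/1` are hypothetical, and it assumes the forms come from a beam's abstract code chunk as in the `beam_lib:chunks/2` call used elsewhere in this thread.

```erlang
-module(roundtrip_sketch).  % hypothetical module name, for illustration only
-export([roundtrip/1]).

%% Pretty-print a list of abstract forms with erl_prettypr and try to parse
%% the resulting text again with merl:quote/2.
%% Returns {ok, Text, Reparsed} or {error, Text, Reason}, so the offending
%% text can be inspected when the second parse fails.
roundtrip(Forms) ->
    Text = lists:flatten(erl_prettypr:format(erl_syntax:form_list(Forms))),
    try merl:quote({1,1}, Text) of
        Reparsed -> {ok, Text, Reparsed}
    catch
        %% merl reports syntax errors by throwing {error, Message}
        throw:{error, Reason} -> {error, Text, Reason}
    end.
```

Calling `roundtrip_sketch:roundtrip(AC)` with the abstract code of the failing beam should show whether the printed text already lacks the parentheses before merl sees it.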

Maybe this can be used to isolate the problem and report an OTP issue... Perhaps just parse and pretty-print (and parse again, etc.) the binary comprehension and check whether the parentheses get lost somewhere along the way.
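
One way to run that check on a single expression, without Gradualizer or merl, might look like the sketch below. It is an assumption-laden illustration: the module `bc_roundtrip` and its functions are hypothetical names, and it uses `erl_pp` rather than `erl_prettypr`, so it only exercises the standard parser and printer.

```erlang
-module(bc_roundtrip).  % hypothetical module name, for illustration only
-export([parse_expr/1, roundtrip/1]).

%% Parse exactly one expression from source text (given without a final dot).
parse_expr(Text) ->
    case erl_scan:string(Text ++ ".") of
        {ok, Tokens, _} ->
            case erl_parse:parse_exprs(Tokens) of
                {ok, [Expr]} -> {ok, Expr};
                {ok, _Other} -> {error, not_a_single_expression};
                {error, Info} -> {error, Info}
            end;
        {error, Info, _} ->
            {error, Info}
    end.

%% Parse, pretty-print with erl_pp, then parse the printed text again.
%% {ok, Printed}                     - the printed text is still valid Erlang
%% {reparse_failed, Printed, Reason} - the printer produced unparsable text
%% {parse_failed, Reason}            - the input itself did not parse
roundtrip(Text) ->
    case parse_expr(Text) of
        {ok, Expr} ->
            Printed = lists:flatten(erl_pp:expr(Expr)),
            case parse_expr(Printed) of
                {ok, _} -> {ok, Printed};
                {error, Reason} -> {reparse_failed, Printed, Reason}
            end;
        {error, Reason} ->
            {parse_failed, Reason}
    end.
```

A `reparse_failed` result for the parenthesised comprehension would point at the printer; a `parse_failed` result for the unparenthesised one matches the merl error shown earlier in the thread.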

I did a similar exercise with erl_prettypr on OTP 24:

$ cat z.erl
-module(z).

uri_encode_path_byte(_) -> <<>>.

f(Value) ->
    << (uri_encode_path_byte(Byte))  || <<Byte>> <= Value >>.
$ erl
...
6> c("z.erl", [debug_info]).
z.erl:3:1: Warning: function uri_encode_path_byte/1 is unused
%    3| uri_encode_path_byte(_) -> <<>>.
%     | ^

z.erl:5:1: Warning: function f/1 is unused
%    5| f(Value) ->
%     | ^

{ok,z}
7> {ok,{_,[{abstract_code,{_,AC}}]}} = beam_lib:chunks(code:which(z),[abstract_code]).
{ok,{z,[{abstract_code,
            {raw_abstract_v1,
                [{attribute,{1,1},file,{"z.erl",1}},
                 {attribute,{1,2},module,z},
                 {function,
                     {3,1},
                     uri_encode_path_byte,1,
                     [{clause,{3,1},[{var,{3,22},'_'}],[],[{bin,{3,28},[]}]}]},
                 {function,
                     {5,1},
                     f,1,
                     [{clause,
                          {5,1},
                          [{var,{5,3},'Value'}],
                          [],
                          [{bc,{6,5},{call,...},[...]}]}]},
                 {eof,{7,1}}]}}]}}
8> io:fwrite("~s~n", [erl_prettypr:format(erl_syntax:form_list(AC))]).
-file("z.erl", 1).

-module(z).

uri_encode_path_byte(_) -> <<>>.

f(Value) ->
    << (uri_encode_path_byte(Byte))
        || <<Byte>> <= Value >>.


ok
9>

Not sure what's going on - everything looks correct above.

Is there a difference between merl:quote and erl_syntax:form_list...?