Discuss how to approach macro-heavy libraries

Question

wende opened this issue 6 years ago · comments

In Elixir a lot of libraries rely on macros. While macros used inside functions can be easily mimicked and wrapped, macros used as top-scope definitions (right inside modules, for instance) quickly become quite unwieldy.

One of the simplest examples is how to write a test. Let's say we've got a standard ExUnit.Case macro.

defmodule MyTest do
  use ExUnit.Case

  test "I want to test something" do 
    assert 1 == 1
  end
end

Right now there is no way to express that in Elchemy.
It is possible to work around this with tricks like

module MyTest do

{- ex 
  use ExUnit.Case
  test "my_test" do
     assert execute_test()
  end
-}

executeTest : Bool
executeTest = 
  1 == 1

end

While this is correct code that would work, I'd say it's quite far from what we would call an elegant solution.
This issue's purpose is to discuss all the possibilities and come up with something that solves most problems without causing too much over-complication.

Krzysztof Wende · Answer 1 · Wed Feb 14 2018 21:26:35 GMT+0800 (China Standard Time)

I'm currently working on a proof of concept implementation of a Plugin approach to this.

The entire idea is that instead of adding macros to modify the Elchemy AST (which would just end up being a type-safety hazardous mess) you get a toolset to create new modules that manipulate the Elixir AST.

So, from the user's point of view, you would write

module SomeTestSuite exposing (suite)
import Elchemy.ExUnitPlugin as ExUnitPlugin

suite = 
  -- This will create a module using ExUnit.Case
  ExUnitPlugin.suite {someField = "mycontext"} -- <- this is a starting context passed further
  -- This will tell it to define a setup inside
  |> ExUnitPlugin.setup (\ {someField} -> {ctx = someField} )
  -- This will tell it to define a test inside
  |> ExUnitPlugin.test "1 + 1 equals 2" (\context -> 1 + 1 == 2 )

Which could then be used normally wherever desired

defmodule MyProjectTest do
  use ExUnit.Case

  SomeTestSuite.suite()
end

Which is already proven to work

Under the hood, however, it needs to create a module and INJECT an anonymous function into its body.
So our example would actually resolve to (AST manipulation apart):

defmodule SomeTestSuite do

  # It's actually using `Module.create` but using defmodule for better readability here
  def suite() do 
    defmodule ExUnitPlugin.H(ContentHash) do
      use ExUnit.Case
      
      test "1 + 1 equals 2", context do
        assert unquote(f).()
      end
    end
  end
end

The entire problem however, is how to smuggle the anonymous function inside the created module.
We can't unquote it because anonymous functions can't be unquoted. I obviously want to avoid polluting the global state by setting up some Processes/Agents/Whatever at all costs.
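One way to see the limitation is a minimal sketch using `Macro.escape/1`, which converts a runtime value into AST. The variable names here are illustrative only, and the exact error message depends on the Elixir version.

```elixir
# A zero-arity closure, similar to what a plugin would receive from Elchemy.
f = fn -> 1 + 1 == 2 end

# Macro.escape/1 turns a runtime value into AST.
# It rejects anonymous functions with an ArgumentError.
escape_result =
  try do
    {:ok, Macro.escape(f)}
  rescue
    e in ArgumentError -> {:error, Exception.message(e)}
  end

IO.inspect(escape_result)
```

The closure has no AST representation, so it cannot simply be pasted into a quoted module body.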

Reading the Elixir forum I found this (which I wasn't aware of before)

@OvermindDL1
An anonymous function is actually a public function in the module that the anonymous function is defined in, and a call to it 'essentially' just {ModuleName, #, [args]} (a bit different in reality, but this gets the point across).

But I wasn't able to find a way to call such a function.

@OvermindDL1 Can you elaborate on how that works and whether it's possible to call an anonymous function just by having its value passed (somehow) to the nested module?

Krzysztof Wende · Answer 2 · Wed Feb 14 2018 21:59:35 GMT+0800 (China Standard Time)

Ok. Found :erlang.fun_info, which looks like exactly what I need

:erlang.fun_info(Test.test)
[
  pid: #PID<0.98.0>,
  module: Test,
  new_index: 0,
  new_uniq: <<64, 160, 30, 242, 89, 33, 1, 230, 243, 33, 207, 54, 132, 229, 36,
    142>>,
  index: 0,
  uniq: 33882359,
  name: :"-test/0-fun-0-",
  arity: 1,
  env: [],
  type: :local
]

Krzysztof Wende · Answer 3 · Wed Feb 14 2018 22:43:54 GMT+0800 (China Standard Time)

Hmm. It might be no use after all. The functions are obviously local so I don't think it's possible to pass them further

OvermindDL1 · Answer 4 · Wed Feb 14 2018 23:30:28 GMT+0800 (China Standard Time)

Heh, you can pass them around just fine. An anonymous function is just, essentially internally, a tuple of the module name, the function index in the module function array, and a set of bindings to pass in to it as its environment (which become part of its called arguments).

To 'pass it in', if you don't need an environment, is easy, you can just remotely call it, otherwise maybe unwrap the environment in to it.
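The module and environment parts of that description can be inspected with `:erlang.fun_info/2`. This is only a sketch: `ClosureDemo` and `adder/1` are made-up names for illustration.

```elixir
defmodule ClosureDemo do
  # The returned closure captures `n` in its environment.
  def adder(n), do: fn x -> x + n end
end

add_five = ClosureDemo.adder(5)

# The defining module and the captured bindings travel with the fun value.
{:module, mod} = :erlang.fun_info(add_five, :module)
{:env, env} = :erlang.fun_info(add_five, :env)

IO.inspect({mod, env})
IO.inspect(add_five.(1))
```

A closure with a non-empty environment is the case where a bare module and function name would not be enough to call it.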

However, I'm not quite understanding what you are trying to do, let me read in more detail...

Are you talking about the |> ExUnitPlugin.test "1 + 1 equals 2" (\context -> 1 + 1 == 2 ) bit needing to generate assert unquote(f).()? If so then you could do it if the f is the AST of the function wrapped in an anonymous function?

OvermindDL1 · Answer 5 · Wed Feb 14 2018 23:31:20 GMT+0800 (China Standard Time)

And don't forget, even macro-heavy libraries are still just function calls, everything is function calls. :-)

If you approach it more like Lisp instead of metaprogramming like in other languages, then that is much closer to how Elixir really is. :-)

Krzysztof Wende · Answer 6 · Wed Feb 14 2018 23:34:09 GMT+0800 (China Standard Time)

@OvermindDL1 Obviously. But I want to give a [relatively] simple tool for people from outside to be able to write wrappers for common libraries like they would normally do it.
With macros:

        Test name body ->
            Application (atom "test")
                [ Value (String name)
                , Variable "context"
                , Do
                    -- A Quotation here is a place we will paste our function in
                    [ dotApplication Quotation [ Variable "context" ] 
                    ]
                ]
                |> unquote body

Instead of having to understand all of the internals of how ExUnit works and just do all of the macro work manually.

Krzysztof Wende · Answer 7 · Wed Feb 14 2018 23:40:11 GMT+0800 (China Standard Time)

What I would want to do is to be able to build AST "around" anonymous functions.

So for instance with a simpler example than ExUnit.
Let's say we want to define a module implementing a behaviour.

So that would resolve to something like

defmodule MyCallingModule do
  def behaviour_implementation(f_to_call) do
    defmodule Whatever do
      @behaviour MyBehaviour
      
      def some_callback() do
        # Here somehow we need to call any function inside our parent, but since in Elm a global function and a local function are virtually the same I need to be able to pass an anonymous function here
        f_to_call.()
      end
    end
  end
end

Krzysztof Wende · Answer 8 · Wed Feb 14 2018 23:48:40 GMT+0800 (China Standard Time)

@OvermindDL1 What do you mean remotely call it?

Can you give an example of how to serialize (into AST) an anonymous function and to be able to execute it after rebuilding from serialized information about it?

OvermindDL1 · Answer 9 · Thu Feb 15 2018 00:14:58 GMT+0800 (China Standard Time)

Can you give an example of how to serialize (into AST) an anonymous function and to be able to execute it after rebuilding from serialized information about it?

What format is the anonymous function in, is it the AST of it, like fn x -> x end or so?

If you have no environment that needs to be included then just pack it as, say, {modulename, functionname, [args...]} and then apply that where needed?
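As a rough sketch of that suggestion: a `{module, function, args}` tuple consists only of atoms and a list, so it can be escaped into AST and applied later. `Checks` and `sums_to?/3` are hypothetical names used only for this example.

```elixir
defmodule Checks do
  # A named, public function that can be referenced by module and name.
  def sums_to?(a, b, expected), do: a + b == expected
end

mfa = {Checks, :sums_to?, [1, 1, 2]}

# Unlike a closure, the tuple can be escaped and embedded in quoted code.
quoted =
  quote do
    {m, f, a} = unquote(Macro.escape(mfa))
    apply(m, f, a)
  end

{result, _binding} = Code.eval_quoted(quoted)
IO.inspect(result)
```

This only works for functions that are exported under a known name, which is the limitation discussed in the replies that follow.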

Krzysztof Wende · Answer 10 · Thu Feb 15 2018 00:20:16 GMT+0800 (China Standard Time)

@OvermindDL1
The function is being passed from Elchemy so we can't be sure it is specified as a function body when called and not a variable. Although it can be

ExUnitPlugin.test (\_ -> 10 + 1 == 11)

it might as well be

let
  f = (\_ -> 10 + 1 == 11)
in
  ExUnitPlugin.test f

So quoting it on call inside Elchemy isn't really an option. Especially since it wouldn't behave consistently with other functions due to the "laziness" of macros.

The problem with using apply here is that even if there is an anonymous function,
let's say

defmodule Test do
  def myfunction(), do: fn x -> x + 1 end
end

Even though I can get the name of the anonymous function by doing

Test.myfunction() |> :erlang.fun_info()
[
  pid: #PID<0.98.0>,
  module: Test,
  new_index: 0,
  new_uniq: <<64, 160, 30, 242, 89, 33, 1, 230, 243, 33, 207, 54, 132, 229, 36,
    142>>,
  index: 0,
  uniq: 33882359,
  name: :"-test/0-fun-0-",
  arity: 1,
  env: [],
  type: :local
]

That would still make this function local so I couldn't just call it with

apply(Test, :"-test/0-fun-0-", [1])
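A small sketch of that failure mode, using a hypothetical `FunDemo` module so it does not clash with the `Test` module above. It assumes the generated function is not exported, which is what `type: :local` indicates.

```elixir
defmodule FunDemo do
  def myfunction(), do: fn x -> x + 1 end
end

f = FunDemo.myfunction()
{:type, type} = :erlang.fun_info(f, :type)
{:name, name} = :erlang.fun_info(f, :name)

# Calling the generated function by name fails because it is not exported.
by_name =
  try do
    apply(FunDemo, name, [1])
  rescue
    UndefinedFunctionError -> :not_exported
  end

# Calling through the fun value itself works as usual.
by_value = f.(1)

IO.inspect({type, by_name, by_value})
```

So the name reported by `fun_info` is informational; the fun value still has to be passed around to make the call.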

OvermindDL1 · Answer 11 · Thu Feb 15 2018 00:37:32 GMT+0800 (China Standard Time)

Actually if it might be an anon function or it might be a binding to an anon function, there is an easy way to handle that. :-)

# All of these do the same thing:

apply(fn -> 42 end, []) # returns 42

f = fn -> 42 end
apply(f, []) # returns 42

defmodule SomewhereElse do
  def bloop(), do: 42
end
apply(&SomewhereElse.bloop/0, []) # returns 42

Apply is an 'indirect' call, not super fast (but doesn't matter most of time).
A remote call is something like Blah.bloop(), and a local call is something like bloop(), and for note a binding call like f.() is an indirect call too (though with sometimes slightly more optimized VM assembly compared to apply, though not always).

OvermindDL1 · Answer 12 · Thu Feb 15 2018 00:38:36 GMT+0800 (China Standard Time)

And yes, the apply/3 is 'slightly' faster than apply/2 since there is no environment for it to worry about, but you can't always use that one unlike apply/2, which you can always use.

Krzysztof Wende · Answer 13 · Thu Feb 15 2018 00:51:19 GMT+0800 (China Standard Time)

@OvermindDL1 I don't think we're on the same page :-)

What I need is to be able to call an anonymous function inside a dynamically declared module, where the function is passed in from outside that module, not as an argument but at compile time (which is fortunately run time too, because it's a dynamically declared module)

Example

defmodule OuterScope do
  def declare_module(my_fun) do
    defmodule InnerScope do
      def execute(), do: my_fun.()
    end
  end
end

Krzysztof Wende · Answer 14 · Thu Feb 15 2018 00:54:47 GMT+0800 (China Standard Time)

But the module inside is a quoted expression. And you can't pass anonymous functions to quoted expressions
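For contrast, a sketch of the one case that can be embedded: a capture of a named remote function (`&Mod.fun/arity`) with no environment. It assumes `Macro.escape/1` accepts such captures and that the compiler is available at run time; `OuterDemo` and `InnerDemo` are hypothetical names.

```elixir
defmodule OuterDemo do
  def truth(), do: 1 + 1 == 2

  # Builds a module at run time whose execute/0 calls the given capture.
  def declare_module(name, fun) do
    body =
      quote do
        def execute(), do: unquote(Macro.escape(fun)).()
      end

    Module.create(name, body, Macro.Env.location(__ENV__))
  end
end

# A remote capture has no environment, so it can be escaped into AST.
OuterDemo.declare_module(InnerDemo, &OuterDemo.truth/0)
IO.inspect(apply(InnerDemo, :execute, []))
```

This does not help with arbitrary closures coming from Elchemy, which is the case being asked about here.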

OvermindDL1 · Answer 15 · Thu Feb 15 2018 01:16:50 GMT+0800 (China Standard Time)

Hmm, that code I don't think will do what you expect, you can't really declare a module inside a function, and doing so will work when the compiler exists but will fail in releases.

Krzysztof Wende · Answer 16 · Thu Feb 15 2018 01:25:42 GMT+0800 (China Standard Time)

@OvermindDL1
Oh. I haven't thought about that. Releases definitely would be a problem here...

I guess that kills my proof of concept here.

Have you got any other idea on how to use module level macros without polluting the scope of the modules?

For instance how one might go about writing an Elchemy wrapper for this:

defmodule Hello.User do
  use Ecto.Schema

  schema "users" do
    field :bio, :string
    field :email, :string
    field :name, :string
    field :number_of_pets, :integer

    timestamps()
  end
end

I thought that defining dynamically a module implementing those might be the best idea, but now I see it's probably not.

Krzysztof Wende · Answer 17 · Thu Feb 15 2018 01:44:06 GMT+0800 (China Standard Time)

I'm slowly starting to reconsider bringing back the meta function definition in Elchemy modules, to be able to add code to the module

OvermindDL1 · Answer 18 · Thu Feb 15 2018 01:46:45 GMT+0800 (China Standard Time)

For instance how one might go about writing an Elchemy wrapper for this:

I'd probably just keep them as function calls.

I don't think that elm has top-level calling like better languages like OCaml and such, so let's give the 'top level' calling scope a special name, how about __top__, then maybe this?

__top__ =
  let schema = remote("Ecto.Schema.schema", 2) in
  let field = remote("Ecto.Schema.field", 2) in
  schema "users" {do: [
    field Bio String,
    field Email String,
    field Name String,
    field Number_of_pets Integer,

    remote("Ecto.Schema.timestamps", 0)
  ]}

Or whatever you want, the important bit is not necessarily treating it as calls directly, but recognizing that the AST is what is important to macros, and a do body is really a __block__ of a list of AST elements as one example. :-)
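To make the `__block__` remark concrete, here is a sketch that builds the AST of a `schema`-style call by hand and compares it with what `quote` produces. Nothing here calls Ecto; `schema` and `field` are just unexpanded call nodes.

```elixir
# What the Elixir parser produces for a call with a do block.
quoted =
  quote do
    schema "users" do
      field :bio, :string
      field :email, :string
    end
  end

# The same shape assembled by hand: a do body is a keyword entry holding a __block__.
fields =
  for {name, type} <- [bio: :string, email: :string] do
    {:field, [], [name, type]}
  end

built = {:schema, [], ["users", [do: {:__block__, [], fields}]]}

IO.puts(Macro.to_string(built))
```

A code generator that emits this tuple shape would be handing the macro the same input as hand-written Elixir source.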

Krzysztof Wende · Answer 19 · Thu Feb 15 2018 01:52:39 GMT+0800 (China Standard Time)

@OvermindDL1
Yeah. So basically what meta function used to do, but I've gotten many negative opinions about this approach from elm-core team.
Thank you for suggestions. Will have to take a moment to think about it then 👍

OvermindDL1 · Answer 20 · Thu Feb 15 2018 02:04:13 GMT+0800 (China Standard Time)

The elm-core team are also very haskell'y, they like their purity even if it means being less productive, look at Ports after all. ;-)

OvermindDL1 · Answer 21 · Thu Feb 15 2018 02:04:45 GMT+0800 (China Standard Time)

/me still thinks wende should instead compile OCaml down to Elixir instead... almost identical syntax, but has 'more' and far far better designed

Krzysztof Wende · Answer 22 · Thu Feb 15 2018 02:06:39 GMT+0800 (China Standard Time)

@OvermindDL1 First of all - sunk cost fallacy 😉
Second of all, OCaml has one problem. It's hard to read. People like Elm because it's stupid simple (until it isn't, but well)
Plus I despise lowercase types 😄

OvermindDL1 · Answer 23 · Thu Feb 15 2018 02:09:19 GMT+0800 (China Standard Time)

OCaml is hard to read? I'd love for you to show examples in Elm that I could translate into OCaml that is hard to read. ;-)

Ah but lowercase types matches Elixir/Erlang too! ;-)

Krzysztof Wende · Answer 24 · Thu Feb 15 2018 02:12:31 GMT+0800 (China Standard Time)

@OvermindDL1 Honestly it might be just me not being used to the syntax.
Nevertheless together with community we can fix Elm haha ;-)

Plus I don't want to get into competition with guys from Alpaca Lang. They're doing a great job

OvermindDL1 · Answer 25 · Thu Feb 15 2018 02:17:54 GMT+0800 (China Standard Time)

@OvermindDL1 Honestly it might be just me not being used to the syntax.

Remember, Elm's syntax was based on OCaml's, in fact it is so close to a direct translation that changing a couple ,'s into ; and changing the haskell'y ambiguous multi-let's into unambiguous single let's gets it to be valid OCaml. ^.^

Nevertheless together with community we can fix Elm haha ;-)

Heh, good luck there, I've seen how extremely hostile they are to suggestions (without unemotional reasoning as to their stances), it very much reminds me of the hostile and antagonistic Haskell community...

Plus I don't want to get into competition with guys from Alpaca Lang. They're doing a great job

A couple things though:

I think they are going about it the wrong way by not black boxing messages. By typing processes they make it much harder to upgrade and reason about things.
They are lacking a massive amount of capabilities that OCaml just has out of the box (first class modules, which the BEAM VM supports just fine, GADT's, Polymorphic Variants, extensible variants, etc... etc...).
Lacking the macro capabilities that OCaml has (not 'quite' as easy to 'make' as Elixir macro's, but just as powerful and easy to use).
And it lacks the immense tooling that the OCaml ecosystem has (seriously, it puts Elm to absolute shame).

As well as OCaml has the fastest optimizing compiler of any compiled language in existence. :-)

OvermindDL1 · Answer 26 · Thu Feb 15 2018 02:19:18 GMT+0800 (China Standard Time)

As well as OCaml is designed to be extended. It even has 2 backends to compile to javascript (one is more readable, one is lower level but more powerful). So you could use the same language for the BEAM, javascript, and native compilation. ;-)

Krzysztof Wende · Answer 27 · Thu Feb 15 2018 02:22:36 GMT+0800 (China Standard Time)

I've seen how extremely hostile they are to suggestions

Oh yes. Trust me, I'm aware

And it lacks the immense tooling that the OCaml ecosystem has

My impression was always the exact opposite. Before Reason came out I felt incredibly intimidated by OCaml's errors and general tooling. Definitely not less than Haskell's though, that's true.

It's been a long time since I last tried it. Might be I should give it another try sometime

OvermindDL1 · Answer 28 · Thu Feb 15 2018 02:34:25 GMT+0800 (China Standard Time)

My impression was always the exact opposite. Before Reason came out I felt incredibly intimidated by OCaml's errors and general tooling. Definitely not less than Haskells though, that's true.

Reason has to date not created anything 'new', but has packaged some well known things together, like BetterErrors (a common OCaml package to make errors more readable), merlin (OCaml's language server), etc... These are usually all added in most ocaml packages straight out anyway. ;-)

It doesn't include the indenter of ocp-indent because they just convert back and forth between reason and ocaml anyway (oh, and their reformatter is in the OCaml packaging system opam too). I'm not terribly a fan of full formatters, I tend to prefer indenters as then it doesn't lose context-sensitive formatting cues that formatters lose.

OCaml has a couple build systems, ocamlbuild is the usual 'default' old one, but most people use JBuilder as it's just better, and in fact so many people use JBuilder that the next OCaml version is having it be the default build system under a new name of Dune.

It's been a long time since I last tried it. Might be I should give it another try sometime

If you aren't on Windows it's trivial to set up and install (I just sudo apt install opam then use opam, just like Rust's Cargo, to choose whatever versions and styles of the compiler I want, packages, etc...). On Windows you have to actually manually download opam and such but there are pre-setup full packages already that include cygwin as well for handling native stuff (though they are fixing up opam to run native on Windows currently, might be in next version actually). :-)

And all of these tools it has have existed for years. ^.^

Krzysztof Wende · Answer 29 · Tue Feb 20 2018 09:16:48 GMT+0800 (China Standard Time)

So after experimenting with it a little, here's a summary of the pitfalls of using a top/meta function:

Meta definition resolving to code injected into the top scope

Pitfalls:

Problem: No such thing as do blocks
- Solution: Use a Do (List x)
  - Problem: Lists in Elm can't contain two different types in them
- Solution: Don't use blocks at all (wrap them anywhere outside)
- Solution: Add a special opaque type Macro that can be put only into the meta tag.

Problem: Can't use other functions in the module
This is a big one. I'm afraid it's going to be too hard to explain why you can't just do:

meta =
  something myfun -- error

myfun = 1

And you can't because

something(myfun()) # No such thing as myfun yet

defp myfun(), do: 1

And that feels really bad, unless explicitly explained but that's an additional layer of complexity I'd really prefer to avoid.
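The ordering problem can be reproduced in plain Elixir. This sketch compiles a module from a string so the failure can be caught; `MetaOrderDemo` is a made-up name and the exact exception text depends on the Elixir version (the compiler may also print diagnostics).

```elixir
source = """
defmodule MetaOrderDemo do
  # The module body runs while the module is still being compiled,
  # so functions of the same module cannot be called from it.
  @value myfun()

  defp myfun(), do: 1
end
"""

outcome =
  try do
    Code.eval_string(source)
    :compiled
  rescue
    # Typically a CompileError about an undefined function.
    e -> {:error, e.__struct__}
  end

IO.inspect(outcome)
```

Moving the `defp` above the call does not help, since the function only becomes callable once the whole module is compiled.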

A good example of how to present it would be a PoC showing a test suite using ExUnit, with tests (most elegantly generated from an existing Elm suite).

Which would mean a simple and clean non-compiler-solution to turning this

module UnitTests exposing (..)

import Test exposing (..)
import Expect

all : Test
all =
    describe "Test"
        [ test "Test truth" <|
            \() ->
                Expect.equal "a" "a"
        ]

and this

defmodule MyAppTest do
  use ExUnit.Case

  UnitTests.all()
end

Into something that behaves like this:

defmodule MyAppTest do
  use ExUnit.Case

  describe "Test" do
    test "Test truth" do
      assert "a" == "a"
    end
  end

end

Because the first solution that comes to my mind would be:

module UnitTests exposing (..)

import Test exposing (..)
import Expect

meta = 
  all |> turnToMacro

all : Test
all =
    describe "Test"
        [ test "Test truth" <|
            \() ->
                Expect.equal "a" "a"
        ]

But this won't work because all is not known at the point when meta is invoked

Krzysztof Wende · Answer 30 · Sun Apr 29 2018 00:58:08 GMT+0800 (China Standard Time)

Much closer to the solution, however there is one readability caveat:
When calling a macro in Elixir, you can't really tell just by looking at the code whether it's a macro or a function. This raises the problem of code as a value versus the value of the code.
Looking at this example:

b = a + 1
function(a + 1) == function(b)

We can be sure the output will be always true, however

b = a + 1
macro(a + 1) == macro(b)

Here we cannot. Because a + 1 and b are not the same code-wise (they produce different AST)
Which causes a state of a slight impurity.
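The difference is easy to see by quoting both expressions, since a macro receives its arguments in exactly this quoted form. A minimal sketch:

```elixir
# What a macro would receive for `macro(a + 1)` and for `macro(b)`.
inline = quote do: a + 1
bound = quote do: b

IO.puts(Macro.to_string(inline))
IO.puts(Macro.to_string(bound))

# The two arguments are different pieces of code, even if they evaluate to the same value.
IO.inspect(inline == bound)
```

A function only ever sees the evaluated value, so it cannot tell the two calls apart.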

That leads to a problem that:

test "MyTest" \() -> assert True

would work, while

myTest () = assert True
test "MyTest" myTest

wouldn't.

The only solution I can think of is introducing a new keyword/type/function to Elchemy that would explicitly denote passing the code as an AST rather than its value (basically what quote does in Elixir)

It could be for example Do
So that when we see:

Do <| 10 + 20

It's not yet evaluated, but passed as an AST
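For comparison, this is how the Elixir counterpart behaves. The sketch only illustrates `quote` itself; how the proposed Do would map onto it is an assumption.

```elixir
# quote keeps the expression as data instead of evaluating it.
deferred = quote do: 10 + 20

# The AST is a call to + with the two literal operands.
{:+, _meta, [10, 20]} = deferred

# Evaluation happens only when explicitly requested.
{value, _binding} = Code.eval_quoted(deferred)
IO.inspect(value)
```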

This way we could know straight away that

function_or_macro(a + 1) -- This is a function
function_or_macro(Do <| a + 1) -- this is a macro

The problem is it introduces a certain layer of complexity that I'd like to avoid.
All suggestions welcome. Otherwise I'll release the Do solution onto the dev branch and see it work for a while

OvermindDL1 · Answer 31 · Tue May 01 2018 22:39:39 GMT+0800 (China Standard Time)

Instead of a Do like thing, what about using a special prefix operator to 'call'/inject a macro result? :-)