Concrete syntax for GHC Core

Question

ayberkt opened this issue 7 years ago

@lucaspena and I have been working on defining some form of concrete representation for Haskell Core, i.e., the Expr type from CoreSyn.hs. We need this so that we can define the language in K.

Previously I was working on the K definition for External Core, which was intended as an independent representation to ease communication between Core (a language defined for implementation purposes and not initially intended to see daylight) and other tools. However, External Core is no longer maintained, and the -fext-core flag of GHC that used to generate External Core from Haskell is no longer available.

I was initially using an ABT (abstract binding tree) notation as output, to make binding in Core a bit more notationally uniform, but we have now decided that this would be harder to work with in K. @bmmoore suggested that we could print the Core AST in K's kast format with klabels, removing the need to re-parse another concrete syntax. This seems to introduce quite a bit of complexity, though, and @lucaspena doubts that we could get it to work easily. So for now we have decided to keep our output format as close as possible to kast and figure out the details later: we will either write a trivial parser in K, or choose the details carefully so that K can consume our output directly without any parsing.

Another thing that's been suggested is to use the GHC -ddump-ds flag, whose output does not seem to be a well-defined language, so we do not think we would be able to write a parser for it whose correctness we can convince ourselves of. It has also been suggested that we tweak the Outputable instance of Expr to get a language we want; we feel that this is not significantly simpler than what we are doing and also quite similar, in a sense.

Defining our own format from scratch also has the added benefit that we know exactly what's going on and we can easily talk about it on paper, should we need to.

I am opening up this issue for future reference (as I had to explain what constitutes a problem in this situation for us a few times) and to get everyone on the same page. If you have any suggestions or concerns about what we began doing, this would be the time to voice them before we are heavily invested in it!

/cc: @ehildenb @grosu

Ömer Sinan Ağacan · Answer 1 · Sun May 21 2017 16:27:36 GMT+0800 (China Standard Time)

Hi,

> Another thing that's been suggested is to use the GHC -ddump-ds flag, whose output does not seem to be a well-defined language, so we do not think we would be able to write a parser for it whose correctness we can convince ourselves of.

-ddump-ds outputs Core, the same language output by -ddump-simpl.

> It has also been suggested that we tweak the Outputable instance of Expr to get a language we want; we feel that this is not significantly simpler than what we are doing and also quite similar, in a sense.

You would have to modify GHC for that. There's a simpler solution: just use the GHC Core plugin API, which gives you access to the complete Core of a module, and you can use GHC modules to access details that Outputable won't give you (e.g. data type definitions). This way you also avoid using yet another concrete intermediate form just to move the AST from one point to another (namely from GHC to your compiler), because you can directly generate the AST for your compiler (possibly in a syntax that K can read directly).

I had a somewhat complex Core plugin here if you're interested. It's not documented but feel free to ask any questions you might have.
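
For readers unfamiliar with the plugin API, here is a minimal sketch of what a Core plugin can look like. It assumes the GHC 8.x plugin interface (the GhcPlugins module; GHC 9 exposes it as GHC.Plugins). The module name DumpCore and the pass name "dump-core" are illustrative and are not taken from the plugin mentioned above. The pass only pretty-prints the bindings; a real tool would traverse mg_binds and emit its own format instead.

```haskell
-- Minimal Core plugin sketch (assumes the GHC 8.x API; names are illustrative).
module DumpCore (plugin) where

import GhcPlugins

-- GHC looks for an exported value named `plugin` in the plugin module.
plugin :: Plugin
plugin = defaultPlugin { installCoreToDos = install }

-- Prepend our pass so it sees the desugared Core before other Core-to-Core passes.
install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
install _opts todos = return (CoreDoPluginPass "dump-core" dumpPass : todos)

-- Print the module's top-level bindings and return the module unchanged.
dumpPass :: ModGuts -> CoreM ModGuts
dumpPass guts = do
  putMsg (ppr (mg_binds guts))
  return guts
```

Once the module is in a package visible to GHC, it would be enabled with something like ghc -fplugin=DumpCore Foo.hs.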

Ayberk Tosun · Answer 2 · Mon May 22 2017 00:10:38 GMT+0800 (China Standard Time)

@osa1 thank you so much for the suggestions!

So first, to clarify: we are not doing anything with the Outputable instance, as we have decided that it would not significantly simplify things for us. What we are currently doing is using the GHC API to get the Core bindings (and thus the Expr values) of a module, as follows:

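-- Imports assume a GHC 8.x-era API; later releases renamed CoreSyn and HscTypes.
import Control.Monad ((<=<))
import CoreSyn (CoreBind)
import GHC
import GHC.Paths (libdir)
import HscTypes (mg_binds)

-- Load modName.hs, run it through parsing, typechecking, and desugaring, and return its Core bindings.
compileToCore :: String -> IO [CoreBind]
compileToCore modName = runGhc (Just libdir) $ do
  _ <- setSessionDynFlags =<< getSessionDynFlags
  target <- guessTarget (modName ++ ".hs") Nothing
  setTargets [target]
  _ <- load LoadAllTargets
  ds <- desugarModule <=< typecheckModule <=< parseModule <=< getModSummary $ mkModuleName modName
  return $ mg_binds . coreModule $ ds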

and then we go through each CoreBind and compile it to the format we want. We have also now concluded (after I opened the issue) that we want KORE as our precise output format (defined here), which removes the need for another concrete syntax, as you suggested. I might have given the impression that we are doing something with Outputable; that's not the case.
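
To illustrate the "go through the bindings and emit a format" step, here is a small self-contained sketch. Everything in it is an assumption made for illustration: the Expr and Bind types are a simplified stand-in for GHC's (Case, Cast, Tick, Type, and Coercion are omitted, and binders are plain strings), and the output is a generic label-applied-to-arguments term syntax, not actual KORE or kast.

```haskell
import Data.List (intercalate)

-- Hypothetical, simplified stand-in for GHC's Core Expr (illustrative only).
data Expr
  = Var String
  | Lit Integer
  | App Expr Expr
  | Lam String Expr
  | Let Bind Expr

-- Simplified stand-in for GHC's Bind: one binding or a recursive group.
data Bind
  = NonRec String Expr
  | Rec [(String, Expr)]

-- Render a constructor label applied to already-rendered arguments.
node :: String -> [String] -> String
node label args = label ++ "(" ++ intercalate ", " args ++ ")"

-- Quote a binder name so it cannot be confused with a label.
atom :: String -> String
atom = show

-- One clause per constructor; every binder appears as an explicit argument.
renderExpr :: Expr -> String
renderExpr (Var x)   = node "Var" [atom x]
renderExpr (Lit n)   = node "Lit" [show n]
renderExpr (App f a) = node "App" [renderExpr f, renderExpr a]
renderExpr (Lam x b) = node "Lam" [atom x, renderExpr b]
renderExpr (Let b e) = node "Let" [renderBind b, renderExpr e]

renderBind :: Bind -> String
renderBind (NonRec x e) = node "NonRec" [atom x, renderExpr e]
renderBind (Rec bs)     = node "Rec" [node "Pair" [atom x, renderExpr e] | (x, e) <- bs]
```

For example, renderExpr (Lam "x" (App (Var "f") (Var "x"))) yields Lam("x", App(Var("f"), Var("x"))). Because the output is fully parenthesised and every node carries its label, a consumer needs only a trivial parser, which is the property the thread is after.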

If I understand your suggestion correctly, using GHC Core plugins will allow us to use our existing code that transforms the Expr type to KORE as a GHC plugin with the -fplugin flag; is this correct? Does this have any significant benefit compared to having a standalone program that reads in Haskell as mentioned and outputs KORE? It seems to me that what we are currently doing is quite similar to what you suggested with the exception that it is not a plugin. I am not experienced with the internals of GHC, so please feel free to point out if I am wrong.

Ömer Sinan Ağacan · Answer 3 · Mon May 22 2017 13:40:59 GMT+0800 (China Standard Time)

> If I understand your suggestion correctly, using GHC Core plugins will allow us to use our existing code that transforms the Expr type to KORE as a GHC plugin with the -fplugin flag; is this correct?

Yes.

> Does this have any significant benefit compared to having a standalone program that reads in Haskell as mentioned and outputs KORE?

It's somewhat easier to use, as you leave command-line argument parsing and handling, compiling dependencies, etc. to GHC, but it seems you have already figured out how to drive GHC, so you're probably better off using your own code.

Ayberk Tosun · Answer 4 · Fri Jun 02 2017 23:59:44 GMT+0800 (China Standard Time)

Closing this as the compile-to-core tool is mostly complete now.