leegao / Yuck

A Compiler Tutorial for a contemporary dynamic language.

Yuck

Learning to implement an imperative language in the twenty-first century.

Imperative languages? What is this, 1996?

Why are you using Java for the Frontend? Are you a masochist?

Roadmap

  • Specification of the yuck language.
  • Turning text into trees.
  • Turning trees into "ycode" programs.
  • Interpreting "ycode" directly.
  • A few linters and static analyzers.
  • Simple peep-hole optimizations.
  • Gradually typing yucky code.
  • Smell-proofing yucky code: contracts and type refinements.
  • Instrumentation and profiling support, or how I stopped worrying about the smell and learned to love yuck.
  • JITing machine code directly from yuck-code: a crash course on contemporary tracing dynamic compilers.

Formal Specification

Yuck Grammar

Yuck is a simple imperative language. It is obnoxiously intuitive and natural for those who are familiar with the mainstream dynamic imperative languages of the twenty-first century.

At first glance, Yuck has statements and expressions. Expressions are computations that output some value, whereas statements do not.

Within Yuck expressions, you'll find your usual binary operations, such as the arithmetic operators +, -, *, /, mod, and **, the logical operators and and or, and the comparisons <, >, ==, !=, etc, as well as a builtin range construct a to b. Additionally, you have other compound expressions like the unary operators - and not, function calls f(e, e), object instantiations new Foo(e), attribute selection foo.bar, list construction [a, b, c], table construction {k : v}, anonymous functions function(x, y) {...}, and table/list indexing a[e]. As primitives, you have the booleans true and false, floats and ints, and strings like "Hello World". Finally, you also have variables like x, y, foo_bAr133.
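To make the expression syntax above a little more concrete, here is a minimal tokenizer sketch in Python. It is not the project's actual lexer: the token kinds (FLOAT, INT, STRING, NAME, OP, KEYWORD), the KEYWORDS set, and the tokenize function are all hypothetical names chosen for illustration, and string literals are assumed to contain no escape sequences.

```python
import re

# Hypothetical keyword set, collected from the operators and primitives
# listed in the prose above. Not taken from the Yuck sources.
KEYWORDS = {"and", "or", "not", "mod", "to", "new", "function", "true", "false"}

# One alternative per token kind. FLOAT must come before INT so that
# "2.5" is not split into "2", ".", "5".
TOKEN_RE = re.compile(r"""
    (?P<FLOAT>\d+\.\d+)
  | (?P<INT>\d+)
  | (?P<STRING>"[^"\n]*")
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OP>\*\*|==|!=|[-+*/<>.,:()\[\]{}])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Split an expression string into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WS":
            continue
        if kind == "NAME" and text in KEYWORDS:
            kind = "KEYWORD"  # word-like operators and literals are keywords
        tokens.append((kind, text))
    return tokens
```

For example, tokenize("a to b") gives a NAME, a KEYWORD, and another NAME, which is enough for a parser to recognize the range construct.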

Every Yuck expression can serve as a statement as well, regardless of whether it has any effect. In addition, you can have variable declarations (either var id; or var id = e;), function declarations function foo() {...}, while statements while e {...}, for loops for x in e {...}, if statements (if e {...} or if e {...} else {...}), empty statements ;, and class declarations of the form

class foo {
  var x;
  var y = bar;
  function meh() {
    ...
  }
}

While this is natural, as programming language developers, we should be a bit less wishy-washy about all of this. Let's formalize this grammar in an extension of BNF. In particular, we will allow constructs like p*, p+, and p? (which denote a production p repeated 0 or more times, 1 or more times, and 0 or 1 times, respectively).

LL(1) Grammar

Here, the grammar we've specified is mostly free of 1-lookahead conflicts, so it's amenable to an LL(1) parser with explicit conflict resolution. In particular, you will need to resolve conflicts for

  • Expressions at [, since the parser doesn't know whether you want [] or [...]. You can resolve this by looking at the next token and shifting to [] if it's a ], and to [expr (, expr)*] otherwise.
  • Expressions at {, which is the same problem as above for {} versus {...}.
  • Statements at the token function. Here, we're not sure if we want to shift to an expression-statement function(){ ... }; or a function declaration function id() {...}. While it's perfectly fine to just ignore the first form (since it's effectively a NOP), we can resolve this easily by just looking at the next token, and shifting to the expression-statement production iff it's an open parenthesis (.
  • Within if statements. For the token {, it's not entirely clear whether we should shift to the expression-statement for a table or continue the else {...} clause. Here, we'll just always shift to the else clause.
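The four resolutions above all boil down to peeking one token past the conflicting one. A minimal Python sketch of that decision follows. The function name choose_production, the production names it returns, and the in_else flag (set when the parser has just consumed an else) are hypothetical and not taken from the Yuck frontend; tokens are modelled as plain strings.

```python
def choose_production(tokens, pos, in_else=False):
    """Pick a production for tokens[pos] by peeking at tokens[pos + 1].

    Returns a hypothetical production name. `in_else` is an assumed flag
    meaning that the parser is positioned right after an `else`.
    """
    current = tokens[pos]
    following = tokens[pos + 1] if pos + 1 < len(tokens) else None

    if current == "[":
        # [] versus [expr (, expr)*]
        return "empty_list" if following == "]" else "list_literal"
    if current == "{":
        if in_else:
            # Always continue the else {...} clause, never a table expression.
            return "else_block"
        # {} versus {...}
        return "empty_table" if following == "}" else "table_literal"
    if current == "function":
        # function( ... is an anonymous function used as an expression-statement;
        # function id( ... is a declaration.
        if following == "(":
            return "function_expression_statement"
        return "function_declaration"
    return "no_conflict"
```

A generated LL(1) parser would usually express the same choice as a lookahead predicate on the conflicting production instead of a standalone function.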

Natural Grammar

For the sake of analysis, it's often easier to give a grammar specification that, while ambiguous, captures just the structure of our language. Here, we will give the specification of our language as an inductive class over the set of expressions and the set of statements.

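The expressions are given inductively over three families of binary operators, alongside the other expression forms described under Yuck Grammar: binary arithmetic operators (+, -, *, /, mod, **), logical binary operators (and, or), and binary comparison operators (<, >, ==, !=, etc).

Similarly, the statements are given inductively over the statement forms described there.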
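One way to write these two inductive definitions down, assembled from the prose description under Yuck Grammar, is sketched here. The metavariables are assumptions made for this sketch: $x$, $f$ and $C$ range over identifiers, $n$ over numeric literals, $\textit{str}$ over string literals, $m$ over class members, and the operator symbols $\oplus$, $\otimes$ and $\bowtie$ stand for the arithmetic, logical and comparison families. Treating juxtaposition $s_1\ s_2$ as sequencing is also an assumption.

$$
\begin{aligned}
e ::=\ & x \mid \texttt{true} \mid \texttt{false} \mid n \mid \textit{str} \\
\mid\ & e_1 \oplus e_2 \mid e_1 \otimes e_2 \mid e_1 \bowtie e_2 \mid e_1\ \texttt{to}\ e_2 \\
\mid\ & -e \mid \texttt{not}\ e \mid e_0(e_1, \ldots, e_n) \mid \texttt{new}\ C(e_1, \ldots, e_n) \\
\mid\ & e.x \mid e_1[e_2] \mid [e_1, \ldots, e_n] \mid \{e_1 : e_1', \ldots, e_n : e_n'\} \\
\mid\ & \texttt{function}(x_1, \ldots, x_n)\ \{ s \}
\end{aligned}
$$

$$
\oplus \in \{+, -, *, /, \texttt{mod}, **\} \qquad
\otimes \in \{\texttt{and}, \texttt{or}\} \qquad
\bowtie\ \in \{<, >, ==, \mathrel{!=}, \ldots\}
$$

$$
\begin{aligned}
s ::=\ & e \mid \texttt{;} \mid s_1\ s_2 \mid \texttt{var}\ x \mid \texttt{var}\ x = e \\
\mid\ & \texttt{function}\ f(x_1, \ldots, x_n)\ \{ s \} \\
\mid\ & \texttt{while}\ e\ \{ s \} \mid \texttt{for}\ x\ \texttt{in}\ e\ \{ s \} \\
\mid\ & \texttt{if}\ e\ \{ s \} \mid \texttt{if}\ e\ \{ s_1 \}\ \texttt{else}\ \{ s_2 \} \\
\mid\ & \texttt{class}\ C\ \{ m_1 \ldots m_n \} \\
m ::=\ & \texttt{var}\ x \mid \texttt{var}\ x = e \mid \texttt{function}\ f(x_1, \ldots, x_n)\ \{ s \}
\end{aligned}
$$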

While this grammar may not be easily implementable using your everyday flavor of parser generators, it does have the advantage that it is compact and it gives you an inductive construction. We can take the structure defined here and use it to construct an operational semantics for this language, which reveals the kinds of information that we will have to carry around in order to fully execute a program.
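To illustrate what "an inductive construction" buys you, here is a small Python sketch that models a fragment of the grammar as a tree of dataclasses and defines a function by structural induction over it. The class names (Num, Var, BinOp, VarDecl, While) and the size function are hypothetical and cover only a handful of the forms. The actual Yuck frontend is written in Java and may represent trees quite differently.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str          # e.g. "+", "and", "<"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class VarDecl:
    name: str
    init: "Expr"     # var name = init;


@dataclass(frozen=True)
class While:
    cond: "Expr"
    body: tuple      # tuple of statements


Expr = Union[Num, Var, BinOp]
Stmt = Union[VarDecl, While]


def size(node):
    """Count tree nodes by structural induction: one case per constructor."""
    if isinstance(node, (Num, Var)):
        return 1
    if isinstance(node, BinOp):
        return 1 + size(node.left) + size(node.right)
    if isinstance(node, VarDecl):
        return 1 + size(node.init)
    if isinstance(node, While):
        return 1 + size(node.cond) + sum(size(stmt) for stmt in node.body)
    raise TypeError(f"not a known node: {node!r}")
```

Every analysis or evaluation pass over the language follows this same shape: one case per production, with recursive calls on the sub-terms.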

Semantics

Simple Operational Semantics (Big Step)

We will give the operational semantics in terms of inference rules. Here, the rule

$$\frac{P_1 \quad P_2 \quad \cdots \quad P_n}{Q}$$

says that if $P_1, P_2, \ldots, P_n$ all hold, then we can deduce $Q$. As we will see, it's very natural to specify the semantics of a language in terms of these inference rules.

Consider the "execution" of a Yuck expression in two contexts: one for local variables and one for the heap of objects. Since expressions may, in general, have side effects, this execution also has to output the potentially altered contexts alongside the resulting value. Their semantics are given by inference rules of this form.
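As an illustration of what such rules can look like, here is a sketch for three expression forms. The notation is an assumption made for this sketch: $\sigma$ stands for the local-variable context, $h$ for the heap, and $\langle e, \sigma, h \rangle \Downarrow \langle v, \sigma', h' \rangle$ reads "executing $e$ in contexts $\sigma$ and $h$ produces the value $v$ and the possibly altered contexts $\sigma'$ and $h'$". Evaluating the left operand before the right one is also an assumption.

$$
\frac{}{\langle n, \sigma, h \rangle \Downarrow \langle n, \sigma, h \rangle}
\qquad
\frac{\sigma(x) = v}{\langle x, \sigma, h \rangle \Downarrow \langle v, \sigma, h \rangle}
$$

$$
\frac{\langle e_1, \sigma, h \rangle \Downarrow \langle v_1, \sigma_1, h_1 \rangle
      \quad
      \langle e_2, \sigma_1, h_1 \rangle \Downarrow \langle v_2, \sigma_2, h_2 \rangle}
     {\langle e_1 + e_2, \sigma, h \rangle \Downarrow \langle v_1 + v_2, \sigma_2, h_2 \rangle}
$$

Note how the contexts produced by the first premise are the ones consumed by the second: this threading is what lets side effects in $e_1$ be visible to $e_2$.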

For statements, we also have a similar reduction which outputs the next set of contexts for the next instruction.
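The context-threading described above can be sketched as a tiny big-step evaluator in Python. Everything here is an illustrative assumption, not the Yuck implementation: terms are encoded as tuples such as ("binop", "+", e1, e2), the local context env and the heap are plain dictionaries, a "new" expression allocates an empty object at a fresh integer address, and variables declared inside an if body stay visible afterwards for simplicity.

```python
import operator

# Hypothetical subset of the binary operators.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
    "==": operator.eq,
}


def eval_expr(expr, env, heap):
    """Big-step evaluation of an expression: returns (value, env', heap')."""
    kind = expr[0]
    if kind == "num":
        return expr[1], env, heap
    if kind == "var":
        return env[expr[1]], env, heap
    if kind == "binop":
        _, op, left, right = expr
        # Thread the contexts left to right, as in the premises of a rule.
        v1, env1, heap1 = eval_expr(left, env, heap)
        v2, env2, heap2 = eval_expr(right, env1, heap1)
        return BINARY_OPS[op](v1, v2), env2, heap2
    if kind == "new":
        # Side effect on the heap: allocate an empty object at a fresh address.
        address = len(heap)
        return ("ref", address), env, {**heap, address: {}}
    raise ValueError(f"unknown expression form: {kind!r}")


def exec_stmt(stmt, env, heap):
    """Execution of a statement: returns only the next contexts (env', heap')."""
    kind = stmt[0]
    if kind == "expr":
        _, env1, heap1 = eval_expr(stmt[1], env, heap)  # value is discarded
        return env1, heap1
    if kind == "var":
        _, name, init = stmt
        value, env1, heap1 = eval_expr(init, env, heap)
        return {**env1, name: value}, heap1
    if kind == "if":
        _, cond, then_body, else_body = stmt
        value, env1, heap1 = eval_expr(cond, env, heap)
        return exec_block(then_body if value else else_body, env1, heap1)
    raise ValueError(f"unknown statement form: {kind!r}")


def exec_block(stmts, env, heap):
    """Run statements in order, feeding each one the contexts of the previous."""
    for stmt in stmts:
        env, heap = exec_stmt(stmt, env, heap)
    return env, heap
```

The difference between the two judgments shows up directly in the return types: eval_expr yields a value plus contexts, while exec_stmt yields contexts only.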

Languages

Java 86.3%, TeX 9.2%, Lex 3.0%, Python 1.4%, Shell 0.1%