To compile the project, make sure the following are installed:
- OCaml, version 4.02.1 or later
- Menhir, version 20151005 or later
To compile the whole project, execute `make`.
To run the compiler, execute `./Main.native file [-fopoff]`, for example `./Main.native main.txt -fopoff`. The compiler will generate an assembly file in the same directory.
- `file` is the path of the source code.
- `-fopoff` turns off front-end optimisation. If this flag is not given, the compiler performs front-end optimisation by default.
Another way to run the compiler is to execute `./Main.native [-fopoff]` without specifying a file. This lets you type your code in the terminal, and an assembly file named `test.s` will be created in the same directory.
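The command line above could be handled with the OCaml standard library's `Arg` module. The following is a minimal sketch only, not the project's actual `Main` code: the `config` record and `parse_command_line` function are hypothetical names, and the sketch covers argument parsing, not compilation.

```ocaml
(* Hypothetical sketch: parsing "./Main.native [file] [-fopoff]" with the
   standard library's Arg module. All names here are illustrative. *)
type config = {
  source : string option; (* None: read the program from the terminal *)
  optimise : bool;        (* false when -fopoff is given *)
}

let parse_command_line (argv : string array) : config =
  let source = ref None in
  let optimise = ref true in
  let specs =
    [ ("-fopoff", Arg.Clear optimise, " turn off front-end optimisation") ]
  in
  let usage = "usage: ./Main.native [file] [-fopoff]" in
  (* A fresh ~current counter (0 = index of the program name) lets the
     function be called more than once, e.g. from tests. *)
  Arg.parse_argv ~current:(ref 0) argv specs
    (fun file -> source := Some file)
    usage;
  { source = !source; optimise = !optimise }
```

With this sketch, `parse_command_line [| "Main.native"; "main.txt"; "-fopoff" |]` yields `{ source = Some "main.txt"; optimise = false }`, and omitting the file yields `source = None`, the case in which the program is read from the terminal.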
To run the tests, execute `sh testbench.sh`.
The compiler reports where an error occurred and what exactly went wrong. For instance, given the input program
int a = ;
the compiler reports
Parse error in Line 1, Column 9. expression was expected but I got this token: ;
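As a rough illustration of where the line and column numbers can come from, here is a hypothetical sketch that formats the message above from a standard `Lexing.position`, such as the start position of the offending token. It assumes the lexer calls `Lexing.new_line` on every newline so that `pos_lnum` stays accurate; the function name `parse_error_message` is illustrative, not taken from the project.

```ocaml
(* Hypothetical sketch: build the parse-error message from the start
   position of the offending token. *)
let parse_error_message (pos : Lexing.position) (expected : string)
    (token : string) : string =
  (* pos_cnum is the absolute offset of the token and pos_bol the offset at
     which its line begins, so their difference is a 0-based column. *)
  let column = pos.Lexing.pos_cnum - pos.Lexing.pos_bol + 1 in
  Printf.sprintf
    "Parse error in Line %d, Column %d. %s was expected but I got this token: %s"
    pos.Lexing.pos_lnum column expected token
```

For `int a = ;` the `;` sits at offset 8 of line 1 (which begins at offset 0), giving Column 9 as in the message above.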
The grammar describes a C-like imperative programming language. At the top level you can declare global variables and functions. Here is a first taste of a valid program:
int a = 3;
int double(int x) {
return x + x;
}
main() {
a = 1;
int b = <<;
if (b > a) {
>> double(b);
} else {
>> double(a);
}
}
The syntax is described in more detail below.
An identifier is the name of a variable or function. The name is restricted by the following rules:
- An identifier must begin with a lowercase letter (a-z) or an underscore (`_`).
- It can be followed by any number of lowercase letters (a-z), uppercase letters (A-Z), digits (0-9) or underscores.
- An identifier cannot be a predefined keyword.
- Examples of valid identifiers: `a`, `_a`, `_a123`
- Examples of invalid identifiers: `A`, `A123`, `123A`
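These rules could be expressed as the ocamllex pattern `['a'-'z' '_'] ['a'-'z' 'A'-'Z' '0'-'9' '_']*` plus a keyword check. Below is a hypothetical OCaml sketch of the same check. The README does not list the keywords, so the `keywords` list is an assumption based on the syntax shown in this document.

```ocaml
(* Assumed keyword list: the README does not enumerate the reserved words,
   so these are taken from the syntax used in this document. *)
let keywords =
  [ "int"; "real"; "char"; "string"; "bool"; "true"; "false"; "if"; "else";
    "while"; "do"; "for"; "break"; "continue"; "return" ]

let is_lower c = c >= 'a' && c <= 'z'
let is_upper c = c >= 'A' && c <= 'Z'
let is_digit c = c >= '0' && c <= '9'

(* Rule 1: the first character is a-z or '_'.
   Rule 2: every later character is a-z, A-Z, 0-9 or '_'.
   Rule 3: the whole name is not a keyword. *)
let is_valid_identifier (s : string) : bool =
  let n = String.length s in
  let rec rest_ok i =
    i >= n
    || ((is_lower s.[i] || is_upper s.[i] || is_digit s.[i] || s.[i] = '_')
        && rest_ok (i + 1))
  in
  n > 0
  && (is_lower s.[0] || s.[0] = '_')
  && rest_ok 1
  && not (List.mem s keywords)
```

The `n > 0` test comes first so that `s.[0]` is never read on an empty string.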
There are five primitive data types:
- `int` corresponds to a 32-bit integer
- `real` corresponds to a double-precision floating-point number (`double` in C)
- `char` is a single character
- `string` is a string (it cannot contain `"`)
- `bool` is either `true` or `false`
You can either just declare a variable, or declare it and assign a value to it at the same time. For instance, both `int a;` and `int a = 1;` are valid.
When declaring global variables, be aware that the right-hand side can only be a simple expression (i.e. one without function calls). For instance, `int a = 3 + 5;` is valid, but `int a = f(2);` is invalid.
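A sketch of how such a restriction might be enforced on the parse tree, assuming a simplified, hypothetical `expr` type (the project's own AST is not shown in this README): the initialiser is walked recursively and rejected as soon as a function call is found.

```ocaml
(* Hypothetical, simplified AST; the project's own types may differ. *)
type expr =
  | Int of int
  | Var of string
  | Binop of string * expr * expr  (* operator, left operand, right operand *)
  | Call of string * expr list     (* function name, arguments *)

(* A global initialiser is accepted only if no function call occurs
   anywhere inside it. *)
let rec is_simple (e : expr) : bool =
  match e with
  | Int _ | Var _ -> true
  | Binop (_, l, r) -> is_simple l && is_simple r
  | Call _ -> false
```

Here `int a = 3 + 5;` would become `Binop ("+", Int 3, Int 5)` and pass, while `int a = f(2);` would become `Call ("f", [Int 2])` and be rejected.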
Inside a function, the right-hand side of a declaration can be any expression, including a function call. Be aware that variables declared inside a function are local to that function: if a global variable and a local variable have the same name, the local one is used inside the function.
int a = 1;
main() {
int a = 1;
a = a + 1;
}
In the line `a = a + 1;`, the global variable `a` is not changed.
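Shadowing of this kind is commonly implemented by searching a stack of scopes from the innermost outwards. The sketch below assumes that representation (a hypothetical `env` type holding mutable cells) and replays the example above; it is an illustration, not the compiler's actual symbol table.

```ocaml
(* Hypothetical environment: a list of scopes, innermost first. Each scope
   maps a variable name to a mutable cell holding its value. *)
type env = (string * int ref) list list

(* Search the innermost scope first, so a local shadows a global. *)
let rec lookup (env : env) (name : string) : int ref =
  match env with
  | [] -> failwith ("unbound variable " ^ name)
  | scope :: outer ->
    (match List.assoc name scope with
     | cell -> cell
     | exception Not_found -> lookup outer name)

(* Replays the example: main declares its own a, then runs a = a + 1.
   Returns (global a, local a) afterwards. *)
let run_example () : int * int =
  let global_scope = [ ("a", ref 1) ] in
  let local_scope = [ ("a", ref 1) ] in
  let env = [ local_scope; global_scope ] in
  let cell = lookup env "a" in
  cell := !cell + 1;
  (!(lookup [ global_scope ] "a"), !(lookup env "a"))
```

`run_example ()` returns `(1, 2)`: only the local `a` is incremented.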
You can use `<<` for basic input and `>>` for output. `<<` is an expression that returns a string by default; for example, you can assign its return value to a variable with `a = <<;`. `>>` is followed by an expression, which the compiler evaluates before printing the result. For instance, `>> 3 + 5;` prints 8.
An operator is a nullary, unary or binary operator. Here is the precedence of the operators (from lowest to highest):
- `>>` (non-associative)
- `=` (right-associative)
- `&&`
- `||`
- `!`
- `==`
- `!=`
- `>`
- `>=`
- `<`
- `<=`
- `+`
- `-`
- `*`
- `/`
All operators are left-associative unless otherwise specified. For instance, the statement
>> 3 + 4 * 5 >= 1;
is parsed as `>> ((3 + (4 * 5)) >= 1)`.
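To see why the expression groups this way, here is a hypothetical precedence-climbing sketch. It is not the project's parser (which is generated by Menhir) and only handles the three operators in the example, ranked in the order given in the list above and all treated as left-associative.

```ocaml
(* Hypothetical token type for the sketch. *)
type token = Num of int | Op of string

(* Ranks follow the list above: >= lowest, then +, then * highest. *)
let precedence = function
  | ">=" -> 1
  | "+" -> 2
  | "*" -> 3
  | op -> invalid_arg ("unknown operator " ^ op)

(* Returns the fully parenthesised expression and the unconsumed tokens. *)
let rec parse_expr (min_prec : int) (tokens : token list) : string * token list =
  match tokens with
  | Num n :: rest -> climb min_prec (string_of_int n) rest
  | _ -> failwith "expression was expected"

and climb min_prec lhs tokens =
  match tokens with
  | Op op :: rest when precedence op >= min_prec ->
    (* Left associativity: the right operand may only contain operators
       that bind strictly tighter than op. *)
    let rhs, rest' = parse_expr (precedence op + 1) rest in
    climb min_prec (Printf.sprintf "(%s %s %s)" lhs op rhs) rest'
  | _ -> (lhs, tokens)

let parenthesise tokens =
  match parse_expr 1 tokens with
  | s, [] -> s
  | _ -> failwith "unexpected trailing tokens"
```

For the tokens of `3 + 4 * 5 >= 1`, `parenthesise` returns `"((3 + (4 * 5)) >= 1)"`, and `1 + 2 + 3` groups to the left as `"((1 + 2) + 3)"`.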
The traditional control-flow statements `if`, `if else`, `while`, `do-while` and `for` are supported.
Here are the rules for each statement:
if (expr) {statements}
if (expr) {statements} else {statements}
while (expr) {statements}
do {statements} while (expr)
for(int var = integer;expr;expr) {statements}
`break` and `continue` are also supported. `break;` escapes from the closest enclosing loop and `continue;` skips to the next iteration of the closest enclosing loop. You can also declare labels on a loop as follows:
while lbl: (expr) {statements}
do lbl: {statements} while (expr)
for lbl: (int var = integer;expr;expr) {statements}
With labels, you can also write `break lbl;` or `continue lbl;` to escape from, or skip to the next iteration of, the labeled loop. For instance, in
main(){
int a = 0;
while lbl1: (a < 10) {
int b = 0;
while (b < 10) {
if (b > 5) {
break lbl1;
}
b = b + 1;
}
}
}
`break lbl1;` escapes the outer loop.
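One way to model these semantics, sketched here under the assumption of an interpreter rather than the actual assembly-generating compiler, is to raise an exception for `break`/`continue` and let each loop catch only the jumps aimed at it. The exception and function names are hypothetical; this only illustrates which loop a jump targets.

```ocaml
(* Hypothetical model of the semantics: None means an unlabelled jump. *)
exception Break of string option
exception Continue of string option

(* An unlabelled jump stops at the innermost loop; a labelled one only at
   the loop carrying that label. *)
let stops_here (label : string option) (target : string option) : bool =
  match target with
  | None -> true
  | Some _ -> target = label

(* while [label:] (cond) {body} *)
let rec while_loop ?label (cond : unit -> bool) (body : unit -> unit) : unit =
  if cond () then begin
    let keep_going =
      try body (); true with
      | Continue t when stops_here label t -> true
      | Break t when stops_here label t -> false
    in
    if keep_going then while_loop ?label cond body
  end

(* The labeled example above; returns (a, number of inner iterations). *)
let example () : int * int =
  let a = ref 0 in
  let steps = ref 0 in
  while_loop ~label:"lbl1" (fun () -> !a < 10) (fun () ->
      let b = ref 0 in
      while_loop (fun () -> !b < 10) (fun () ->
          incr steps;
          if !b > 5 then raise (Break (Some "lbl1"));
          incr b));
  (!a, !steps)
```

`example ()` returns `(0, 7)`: the inner loop runs for `b` = 0 to 6, and the labeled break then leaves both loops before `a` is ever changed. Any jump a loop does not own simply propagates outwards to the enclosing loop.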
Function declarations are also allowed at the top level. Here is an example:
int a = 5;
int double(int x) {
return 2 * x;
}
In the above, we define a function called `double` which takes a single `int` parameter `x` and returns an `int`.
Currently, the compiler performs constant folding. For instance, `int a = 3 + 6;` is transformed to `int a = 9;` in the parse tree.
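A minimal sketch of constant folding over a hypothetical expression type: operands are folded first, and an operator node is replaced by a literal once both operands are literals. It uses OCaml's native `int` for brevity, so unlike the language's 32-bit `int` it does not model overflow wrap-around.

```ocaml
(* Hypothetical expression type for the sketch. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* Fold bottom-up: simplify the operands first, then combine them if both
   became literals. Anything involving a variable is left in place. *)
let rec fold (e : expr) : expr =
  match e with
  | Int _ | Var _ -> e
  | Add (l, r) ->
    (match fold l, fold r with
     | Int a, Int b -> Int (a + b)
     | l', r' -> Add (l', r'))
  | Mul (l, r) ->
    (match fold l, fold r with
     | Int a, Int b -> Int (a * b)
     | l', r' -> Mul (l', r'))
```

`fold (Add (Int 3, Int 6))` gives `Int 9`, matching the example, while `x + 2 * 3` only partially folds to `Add (Var "x", Int 6)`.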
The compiler also performs constant propagation and function inlining on the parse tree. For instance:
int a = 1;
int b = a + 1;
int double(int x) {
return x + x;
}
main() {
int c = double(a);
}
will be transformed to
int a = 1;
int b = 2;
int double(int x) {
return x + x;
}
main() {
int c = 2;
}
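The sketch below illustrates both transformations on a hypothetical, simplified AST. Known constants are substituted for variables, and calls to single-parameter functions whose body is one return expression are inlined by substituting the argument into the body and folding the result. It assumes non-recursive functions whose body mentions only their parameter and, for brevity, treats the global and local declarations of the example as one flat list.

```ocaml
(* Hypothetical, simplified AST for the sketch. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Call of string * expr  (* single-argument call, e.g. double(a) *)

(* A function whose body is a single return expression over one parameter,
   e.g. int double(int x) { return x + x; } *)
type func = { param : string; body : expr }

(* Replace every occurrence of variable [name] in [e] by [value]. *)
let rec subst name value e =
  match e with
  | Var v when v = name -> value
  | Int _ | Var _ -> e
  | Add (l, r) -> Add (subst name value l, subst name value r)
  | Call (f, arg) -> Call (f, subst name value arg)

(* consts: variables known to be constant; funcs: inlinable functions. *)
let rec optimise consts funcs e =
  match e with
  | Int _ -> e
  | Var v ->
    (* constant propagation *)
    (match List.assoc v consts with
     | c -> Int c
     | exception Not_found -> e)
  | Add (l, r) ->
    (* constant folding *)
    (match optimise consts funcs l, optimise consts funcs r with
     | Int a, Int b -> Int (a + b)
     | l', r' -> Add (l', r'))
  | Call (f, arg) ->
    let arg' = optimise consts funcs arg in
    (* inlining: substitute the argument into the body, then optimise *)
    (match List.assoc f funcs with
     | { param; body } -> optimise consts funcs (subst param arg' body)
     | exception Not_found -> Call (f, arg'))

(* Optimise declarations in order, remembering each one that became a
   constant so that later declarations can use it. *)
let optimise_decls funcs decls =
  let rec go consts = function
    | [] -> []
    | (name, e) :: rest ->
      let e' = optimise consts funcs e in
      let consts' = match e' with Int c -> (name, c) :: consts | _ -> consts in
      (name, e') :: go consts' rest
  in
  go [] decls
```

Given `double` as `{ param = "x"; body = Add (Var "x", Var "x") }`, the declarations `a = 1`, `b = a + 1` and `c = double(a)` become `a = 1`, `b = 2` and `c = 2`, as in the transformed program above.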
However, optimisation within a block of code stops once it reaches a statement that has side effects (i.e. printing, prompting, or assigning to non-local variables), although it still continues in any inner blocks. For instance:
int a = 1;
int c = a + 2;
int f(int x) {
a = a + 1;
return 1;
}
main() {
int d = 1;
int f = d;
int e = f(a);
int g = d;
}
will be transformed to
int a = 1;
int c = 3;
int f(int x) {
a = a + 1;
return 1;
}
main() {
int d = 1;
int f = 1;
int e = f(a);
int g = d;
}
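A sketch of this cut-off, again on a hypothetical AST: declarations are optimised in order until the first one whose initialiser has a side effect, and that declaration and everything after it are left untouched. Which functions count as impure is passed in explicitly here (an assumption; a real pass would have to analyse function bodies), and nested blocks, where optimisation resumes, are not modelled.

```ocaml
(* Hypothetical, simplified AST for the sketch. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Call of string * expr list

type stmt = Decl of string * expr  (* int name = expr; *)

(* A call has a side effect if the callee is listed as impure
   (e.g. f, which assigns the global a) or an argument has one. *)
let rec has_side_effect impure = function
  | Int _ | Var _ -> false
  | Add (l, r) -> has_side_effect impure l || has_side_effect impure r
  | Call (f, args) -> List.mem f impure || List.exists (has_side_effect impure) args

(* Constant propagation plus folding of additions. *)
let rec propagate consts = function
  | Int _ as e -> e
  | Var v as e ->
    (match List.assoc v consts with c -> Int c | exception Not_found -> e)
  | Add (l, r) ->
    (match propagate consts l, propagate consts r with
     | Int a, Int b -> Int (a + b)
     | l', r' -> Add (l', r'))
  | Call (f, args) -> Call (f, List.map (propagate consts) args)

(* Optimise a block until the first statement with a side effect; that
   statement and everything after it are returned unchanged. *)
let optimise_block impure consts block =
  let rec go consts = function
    | [] -> []
    | (Decl (_, e) :: _) as rest when has_side_effect impure e -> rest
    | Decl (name, e) :: rest ->
      let e' = propagate consts e in
      let consts' = match e' with Int c -> (name, c) :: consts | _ -> consts in
      Decl (name, e') :: go consts' rest
  in
  go consts block
```

On the body of `main` above, with `f` marked impure, this keeps `int d = 1;`, rewrites `int f = d;` to `int f = 1;`, and leaves `int e = f(a);` and `int g = d;` as they are.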
For now, the compiler can generate code for:
- int arithmetic and comparison
- char comparison
- boolean operations
- prompting for an int
- printing int, char and boolean values
- control statements
- declaring and assigning global and local variables
- functions and function calls
Language | Time elapsed |
---|---|
c | 0.03s |
java | 0.63s |
compiler | 0.00s |
Language | Time elapsed |
---|---|
c | 5.35s |
java | 4.86s |
compiler | 13.24s |
To run the benchmark script, execute `sh benchmark.sh`. The code used for benchmarking can be found in the `benchmark` folder.