cwtsteven / C-like-Compiler

1 Dependencies

In order to compile the project correctly, please make sure you have the following installed.

  1. OCaml, minimum version 4.02.1
  2. Menhir, minimum version 20151005

2 How to compile

To compile the whole project, execute make.

3 How to run

3.1 Run the compiler

To run the compiler, execute ./Main.native file [-fopoff]. For example: ./Main.native main.txt -fopoff. The compiler will generate an assembly file in the same directory.

  1. file is the path of the source code.
  2. -fopoff turns off front-end optimisation. If this flag is absent, the compiler performs front-end optimisation by default.

Another way to run the compiler is to execute ./Main.native [-fopoff] without specifying a file. This lets you type your code in the terminal. An assembly file named "test.s" will be created in the same directory.

3.2 Run the tests

To run the tests, execute sh testbench.sh.

4 Error Reporting

The compiler reports where an error occurred and what went wrong. For instance, given this input program:

int a = ;

we will get

Parse error in Line 1, Column 9. expression was expected but I got this token: ;

5 Syntax

The grammar describes a C-like imperative programming language. At the top level, you can declare global variables and functions. The following gives a first taste of a valid program:

int a = 3;

int double(int x) {
  return x + x;
}

main() {
  a = 1;
  int b = <<;
  if (b > a) {
    >> double(b);
  } else {
    >> double(a);
  }
}

The syntax is described in more detail in the sections below.

5.1 Identifiers

An identifier is the name of a variable or function. The name is restricted by the following rules:

  • An identifier must begin with a lowercase letter (a - z) or an underscore _.
  • It can be followed by any number of letters (a - z or A - Z), digits (0 - 9), or underscores.
  • An identifier cannot be a predefined keyword.
  • Examples of valid identifiers: a _a _a123
  • Examples of invalid identifiers: A A123 123A
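
The rules above can be sketched as a small checker. This is only an illustration in Python, not the compiler's actual lexer (which is written in OCaml). The keyword set is an assumption assembled from words used elsewhere in this README; the real list may differ.

```python
import re

# Assumed keyword set, collected from this README for illustration only.
ASSUMED_KEYWORDS = {
    "int", "real", "char", "string", "bool", "true", "false",
    "if", "else", "while", "do", "for", "break", "continue", "return",
}

# First character: lowercase letter or underscore; rest: letters, digits, underscores.
IDENT_RE = re.compile(r"[a-z_][a-zA-Z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Check a name against the identifier rules listed above."""
    return IDENT_RE.fullmatch(name) is not None and name not in ASSUMED_KEYWORDS
```

With this sketch, a, _a and _a123 are accepted, while A, A123 and 123A are rejected.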

5.2 Primitive Data types

There are five primitive data types.

  • int corresponds to a 32-bit integer
  • real corresponds to double precision floating point (double in C)
  • char a single character
  • string a string (cannot contain ")
  • bool either true or false

5.3 Declaring variables

You can either just declare a variable or declare it and assign a value to it at the same time. For instance, int a; and int a = 1; are both valid.

5.3.1 Global variables

When declaring global variables, be aware that the right-hand side of the statement can only be a simple expression (i.e. one without function calls). For instance, int a = 3 + 5; is valid, but int a = f(2); is invalid.

5.3.2 Local variables

For local variables, the right-hand side can be any expression, including a function call. Be aware that variables declared inside a function are local to that function. For instance, if we declare a global variable and a local variable with the same name, the local one is used inside the function.

int a = 1;
main() {
  int a = 1;
  a = a + 1;
}

The line a = a + 1; changes the local variable a, not the global variable a.

5.4 Basic I/O

You can use << for basic input and >> for output. << is an expression that returns a string by default; for example, you can assign its value to a variable with a = <<;. >> is followed by an expression, which is evaluated before its value is printed. For instance, >> 3 + 5; prints 8.

5.5 Operators

An operator is nullary, unary, or binary. Operator precedence, from lowest to highest, is:

  • >> (non-associative)
  • = (right-associative)
  • && || !
  • == != > >= < <=
  • + -
  • * /

All operators are left-associative unless otherwise specified. For instance, the statement >> 3 + 4 * 5 >= 1; is parsed as >> ((3 + (4 * 5)) >= 1).
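
To see how the precedence levels interact, here is a minimal precedence-climbing sketch in Python. It is not the project's Menhir grammar. It only handles binary operators, parentheses, integer literals, identifiers and a leading >>, and it leaves out ! and <<. The function names tokenize, parse_expr, parse_atom and parenthesize are made up for this illustration.

```python
import re

# Precedence levels from the list above (a higher number binds tighter).
PRECEDENCE = {
    "=": 1,
    "&&": 2, "||": 2,
    "==": 3, "!=": 3, ">": 3, ">=": 3, "<": 3, "<=": 3,
    "+": 4, "-": 4,
    "*": 5, "/": 5,
}
RIGHT_ASSOC = {"="}

TOKEN_RE = re.compile(
    r"\s*(>>|&&|\|\||==|!=|>=|<=|[-+*/=<>()]|\d+|[a-z_][a-zA-Z0-9_]*)"
)


def tokenize(src):
    """Split an expression (without the trailing ';') into tokens."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN_RE.match(src, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens):
    """Parse a literal, an identifier, or a parenthesised expression."""
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse_expr(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("expected ')'")
        return inner
    return tok


def parse_expr(tokens, min_prec=1):
    """Precedence climbing: return a fully parenthesised string."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        prec = PRECEDENCE[op]
        # Right-associative operators let the same level recurse on the right.
        next_min = prec if op in RIGHT_ASSOC else prec + 1
        right = parse_expr(tokens, next_min)
        left = f"({left} {op} {right})"
    return left


def parenthesize(src):
    """Make the grouping of an expression explicit; '>>' binds loosest."""
    tokens = tokenize(src)
    if tokens and tokens[0] == ">>":
        tokens.pop(0)
        return ">> " + parse_expr(tokens)
    return parse_expr(tokens)
```

Calling parenthesize(">> 3 + 4 * 5 >= 1") reproduces the grouping given above, and parenthesize("a = b = 1") shows the right-associativity of =.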

5.6 Control

The traditional control-flow statements if, if-else, while, do-while and for are supported. Here are the rules for each statement:

  • if (expr) {statements}
  • if (expr) {statements} else {statements}
  • while (expr) {statements}
  • do {statements} while (expr)
  • for(int var = integer;expr;expr) {statements}

5.6.1 Label, Break, Continue

break and continue are also supported. break; exits the closest enclosing loop, and continue; skips to the next iteration of the closest enclosing loop. You can also declare labels on a loop as follows:

  • while lbl: (expr) {statements}
  • do lbl: {statements} while (expr)
  • for lbl: (int var = integer;expr;expr) {statements}

With labels, you can also write break lbl; or continue lbl; to exit or repeat the labelled loop. For instance, in

main(){
  int a = 0;
  while lbl1: (a < 10) {
    int b = 0;
    while (b < 10) {
      if (b > 5) {
        break lbl1;
      }
    }
  }
}

break lbl1; will escape the outer loop.
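
One way to picture how a break or continue finds its target is a stack of the enclosing loops' labels. The Python sketch below only illustrates the rule described above; it is not taken from the compiler's source, and resolve_jump_target is a hypothetical name.

```python
from typing import List, Optional


def resolve_jump_target(enclosing_labels: List[Optional[str]],
                        label: Optional[str] = None) -> int:
    """Return the index (0 = outermost) of the loop a break/continue targets.

    enclosing_labels lists the labels of the loops around the statement,
    outermost first; None stands for an unlabelled loop.
    """
    if not enclosing_labels:
        raise ValueError("break/continue outside of a loop")
    if label is None:
        # An unlabelled break/continue targets the closest loop.
        return len(enclosing_labels) - 1
    # A labelled one targets the innermost enclosing loop with that label.
    for index in range(len(enclosing_labels) - 1, -1, -1):
        if enclosing_labels[index] == label:
            return index
    raise ValueError(f"unknown label: {label}")
```

For the program above, the loops around break lbl1; are ["lbl1", None], so the call resolves to index 0, the outer loop. A plain break; would resolve to index 1, the inner loop.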

5.7 Function

Function declarations are also allowed at the top level. Here is an example:

int a = 5;
int double(int x) {
  return 2 * x;
}

In the above, we defined a function called double, which takes a single int parameter x and returns an int.

6 Syntactic Optimisation

6.1 Constant Folding

The compiler currently performs constant folding. For instance, int a = 3 + 6; is transformed into int a = 9; in the parse tree.
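
As a rough illustration of the idea, the Python sketch below folds constants in a tiny expression tree. The tree shape is an assumption made for this example: integers are literals, strings are variable names, and (op, left, right) tuples are binary operations. The compiler's real parse tree is defined in OCaml and will look different. Division is left out to avoid guessing its semantics.

```python
import operator

# Operators this sketch knows how to fold.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Recursively replace constant sub-expressions with their value."""
    if not isinstance(expr, tuple):
        return expr  # literal or variable name: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)
    return (op, left, right)
```

Here fold(("+", 3, 6)) gives 9, matching the example above, while an expression that mentions a variable, such as ("+", "a", ("*", 2, 3)), is only partly folded, to ("+", "a", 6).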

6.2 Constant Propagation and Function Inlining

The compiler will also perform constant propagation and function inlining on the parse tree. For instance:

int a = 1;
int b = a + 1;

int double(int x) {
  return x + x;
}

main() {
  int c = double(a);
}

will be transformed to

int a = 1;
int b = 2;

int double(int x) {
  return x + x;
}

main() {
  int c = 2;
}

However, optimisation within a block of code stops once we reach a statement that has side effects (i.e. printing, prompting, or assigning to non-local variables). Any inner blocks are still optimised. For instance:

int a = 1;
int c = a + 2;

int f(int x) {
  a = a + 1;
  return 1;
}

main() {
  int d = 1;
  int f = d;
  int e = f(a);
  int g = d;
}

will be transformed to

int a = 1;
int c = 3;

int f(int x) {
  a = a + 1;
  return 1;
}

main() {
  int d = 1;
  int f = 1;
  int e = f(a);
  int g = d;
}
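
The stopping rule can be sketched in Python as follows. This is a simplified model, not the compiler's algorithm: a block is a list of (name, expr) declarations, expressions use the same tuple shape as in the folding sketch, and every function call is assumed to have side effects, whereas the real compiler inlines calls when it can. Inner blocks are not modelled.

```python
def has_side_effect(expr):
    """Assumption for this sketch: any call counts as a side effect."""
    if isinstance(expr, tuple):
        if expr[0] == "call":
            return True
        return has_side_effect(expr[1]) or has_side_effect(expr[2])
    return False


def substitute(expr, env):
    """Replace variables whose constant value is known."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple) and expr[0] != "call":
        op, left, right = expr
        return (op, substitute(left, env), substitute(right, env))
    return expr


def propagate(block):
    """Propagate constants through declarations until a side effect occurs."""
    env, out, stopped = {}, [], False
    for name, expr in block:
        if not stopped and has_side_effect(expr):
            stopped = True  # leave this and all later statements untouched
        if stopped:
            out.append((name, expr))
            continue
        value = substitute(expr, env)
        if isinstance(value, int):
            env[name] = value
        else:
            env.pop(name, None)
        out.append((name, value))
    return out
```

Modelling the body of main above as [("d", 1), ("f", "d"), ("e", ("call", "f", ["a"])), ("g", "d")], the sketch rewrites f to 1 but leaves e and g unchanged, as in the transformed program.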

7 Code Generation

For now, the compiler can generate code for

  1. int arithmetic and comparison
  2. char comparison
  3. boolean operations
  4. prompting int
  5. printing int, char and boolean
  6. control statements
  7. declaring and assigning global and local variables
  8. functions and function calls

8 Benchmark

Compile time

  Language   Time elapsed
  c          0.03s
  java       0.63s
  compiler   0.00s

Run time

  Language   Time elapsed
  c          5.35s
  java       4.86s
  compiler   13.24s

In order to run the benchmark script, please execute sh benchmark.sh. The code that was used in benchmarking can also be found inside the benchmark folder.
