rendoir / feup-comp

ANTLR Compiler for YAL

YAL Compiler G43

This project was developed in the Compilers course of MIEIC at FEUP. Its goal is to compile the yal language into Java bytecode, to be executed by a JVM.

Dependencies

It is recommended to add the ANTLR4 jar file as an alias. We are not sure how this works on Windows 😕

The parser code is in the .g4 file.

Usage

  • Generate the parser using 'antlr4 -Dlanguage=Python3 yal.g4'

  • Run the parser using:

    Usage:
      python3 main.py <file_name> [options]
     
    Arguments:
      file_name  - The absolute or relative path to the file to compile.
     
    Options:
      --quiet         (-q)     - Runs the compiler silently, without any print to the console. Used for the testing script
      --optimized     (-o)     - Optimizes the code generated
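
The command line described above could be parsed roughly as follows. This is only a sketch using the standard argparse module: the function name build_arg_parser is hypothetical, and the actual main.py may handle its arguments differently.

```python
import argparse


def build_arg_parser():
    """Hypothetical reconstruction of the CLI documented in the usage text."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Compile a yal file into Java bytecode.",
    )
    # Positional argument: absolute or relative path to the file to compile.
    parser.add_argument("file_name", help="path to the file to compile")
    # --quiet / -q: no console output (used by the testing script).
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="run the compiler silently")
    # --optimized / -o: enable the optimizations.
    parser.add_argument("-o", "--optimized", action="store_true",
                        help="optimize the generated code")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_arg_parser().parse_args(["example.yal", "--optimized"])
print(args.file_name, args.quiet, args.optimized)
```
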
    

Compiler Status

Syntactic Analysis

Most of the work here is done by the ANTLR tool that we used. Its main limitation is that it skips a whole rule in case of failure. This is fine in most cases; however, should there be a semantic error in the header of a function, no other syntactic error will be reported for that function until that error is corrected.

  • ✅ ANTLR handles all syntactic errors
  • ✅ ANTLR handles reporting more than one syntactic error
  • ✅ Error message when no input given
  • ✅ Improve overall readability of error messages.

Semantic Analysis

  • ✅ Variables not defined in the current scope

  • ✅ Variables used before being initialized

  • ✅ Return variable of a function not being initialized before the function ends

  • ✅ Undefined function in current module

  • ✅ Comparison between arrays is impossible.

    • In this case the compiler suggests comparing the sizes of both arrays.
  • ✅ Applying operators to arrays.

    • Suggests the use of array.size
  • ✅ Assigning size directly.

    • Suggests the use of = []
  • ✅ Different assignment types.

  • ✅ Variable NaN as array size.

  • ✅ Accessing non-array variables by index

  • ✅ Out of bounds array accesses. (When possible)

  • ✅ Accessing size property of non-array variable

  • ✅ Positive array size numbers.

    • Suggests the number must be positive
  • ✅ Function redeclaration

  • ✅ Using an array variable as the index of an array access.

    • Suggests using the size property of the array.
  • ✅ Applying an operator between different variable types

  • ✅ Wrong function argument list.

    • Shows what was expected and what it got.
  • ✅ Variable declared in a single if branch, but used after the if

    • Suggests declaring the variable in all possible code branches
  • ✅ Variable already defined

    • This is merely a warning that pops up in the following case:

          module mod {
          a;
          b = 2;
          a; // Variable already defined, ignoring this line
          }
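
As an illustration of the "variable declared in a single if branch" check listed above, here is a minimal sketch of how the sets of defined variables could be merged after an if/else. The function names and the set-based representation are assumptions made for this example, not the project's actual symbol table.

```python
def merge_branches(defined_before, then_defs, else_defs):
    """Return (definitely_defined, maybe_defined) after an if/else.

    For an if without an else, pass an empty set as else_defs.
    """
    then_scope = defined_before | then_defs
    else_scope = defined_before | else_defs
    definitely = then_scope & else_scope             # defined on every path
    maybe = (then_scope | else_scope) - definitely   # defined on some paths only
    return definitely, maybe


def check_use(name, definitely, maybe):
    """Return an error message for a use of `name`, or None if the use is fine."""
    if name in definitely:
        return None
    if name in maybe:
        return (f"'{name}' is declared in only one branch; "
                "declare it in all possible code branches")
    return f"'{name}' is not defined in the current scope"


# 'b' is assigned in both branches, 'c' only in the then-branch.
definitely, maybe = merge_branches({"a"}, {"b", "c"}, {"b"})
print(check_use("b", definitely, maybe))
print(check_use("c", definitely, maybe))
```
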
      

Code Generation

  • ✅ Code for function calls

  • ✅ Code for arithmetic expressions

  • ✅ Code for conditional instructions

  • ✅ Code for loops

  • ✅ Code to deal with arrays

    • ✅ Code to handle the following situation:

      a = [10];
      a = 5; //Put the number 5 in all positions of the array
      
  • ✅ Branching variables declaration

Intermediate Representations

To aid development, the compiler goes through 2 stages of intermediate representation.

High-Level Intermediate Representation

This representation has the highest abstraction level, and is still dependent on the yal language instructions. The representation contains an inner symbol table which is built while the parser is traversing the HIR tree, looking for semantic errors.

Low-Level Intermediate Representation

This representation has the lowest abstraction level and is composed mostly of the simplest Java bytecode instructions: loads, stores and operations. The leaves of the tree are made up of these 3 types of instructions; however, the intermediate nodes still know whether the instruction is an assignment, an expression test, or another type of yal instruction. It is designed this way to allow for easier code optimizations.

Code Generation

Using the above-mentioned LLIR, code generation becomes a lot easier, as all instructions from the HIR have already been translated into simple load and store instructions. No third-party libraries were used.
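
To make the idea concrete, here is a minimal sketch of how an intermediate node holding load, store and operation leaves could be turned into Jasmin text. The class names (Load, LoadConst, Store, Op, Assignment), the local variable slots and the emit function are all assumptions for illustration; the project's real LLIR classes are not shown in this README.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Load:
    slot: int          # local variable index (hypothetical layout)


@dataclass
class LoadConst:
    value: int         # integer constant pushed onto the stack


@dataclass
class Store:
    slot: int


@dataclass
class Op:
    mnemonic: str      # e.g. "iadd", "imul"


Leaf = Union[Load, LoadConst, Store, Op]


@dataclass
class Assignment:
    """Intermediate node: knows it is an assignment, holds flat leaves."""
    leaves: List[Leaf] = field(default_factory=list)


def emit(node):
    """Translate the leaves of a node into Jasmin instruction strings."""
    lines = []
    for leaf in node.leaves:
        if isinstance(leaf, Load):
            lines.append(f"iload {leaf.slot}")
        elif isinstance(leaf, LoadConst):
            lines.append(f"ldc {leaf.value}")
        elif isinstance(leaf, Store):
            lines.append(f"istore {leaf.slot}")
        elif isinstance(leaf, Op):
            lines.append(leaf.mnemonic)
        else:
            raise TypeError(f"unknown leaf: {leaf!r}")
    return lines


# a = b + 1, assuming a lives in slot 0 and b in slot 1.
node = Assignment([Load(1), LoadConst(1), Op("iadd"), Store(0)])
print("\n".join(emit(node)))
```

Because the intermediate node still records that it is an assignment, an optimizer could inspect it as a whole before the leaves are printed.
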

Optimizations

By default, our compiler uses while templating, even without the -o flag.

The following optimizations are made when that flag is activated:

  • ✅ Constant propagation

    • Works for both local variables and module variables
  • ✅ Constant folding

  • ✅ While and If templating (saves 1 goto)

  • ✅ Lower cost instruction selection

    • Checks the size of the constants to use and uses the lowest available instruction from bipush, sipush and ldc.
    • Uses iload_#, istore_# when available
    • Uses iinc when possible, even for subtraction, in which case the compiler changes the sign of the constant
  • ✅ Algebraic Simplification

    • Sums/Subtractions by 0
    • Multiplications by 1 or 0
    • Divisions by self or 1
    • Bitwise-Shifts by 0
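
Two of the optimizations above can be sketched in a few lines of Python. This is not the project's code: the function names and the tuple-based expression format (op, left, right) are assumptions for illustration. The constant ranges are those of the JVM instructions (bipush takes a signed byte, sipush a signed short, and iinc a signed byte increment).

```python
def push_const(value):
    """Pick the shortest of bipush, sipush and ldc for an int constant."""
    if -128 <= value <= 127:          # fits in a signed byte
        return f"bipush {value}"
    if -32768 <= value <= 32767:      # fits in a signed short
        return f"sipush {value}"
    return f"ldc {value}"


def load_local(slot):
    """Use the one-byte iload_<n> form for slots 0 to 3."""
    return f"iload_{slot}" if 0 <= slot <= 3 else f"iload {slot}"


def add_to_local(slot, op, constant):
    """Emit `x = x op constant` (op is "+" or "-") as one iinc when it fits."""
    delta = constant if op == "+" else -constant   # subtraction flips the sign
    if -128 <= delta <= 127:
        return [f"iinc {slot} {delta}"]
    store = f"istore_{slot}" if 0 <= slot <= 3 else f"istore {slot}"
    arith = "iadd" if op == "+" else "isub"
    return [load_local(slot), push_const(constant), arith, store]


def simplify(expr):
    """Constant folding and algebraic simplification on (op, left, right) tuples.

    Leaves are ints (constants) or strs (variable names). The sketch ignores
    32-bit wrap-around and assumes x / x is only reached with x != 0.
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
    if op in ("+", "-", "<<", ">>") and right == 0:
        return left                    # x + 0, x - 0, x << 0, x >> 0
    if op == "+" and left == 0:
        return right                   # 0 + x
    if op == "*" and (left == 0 or right == 0):
        return 0                       # x * 0
    if op == "*" and right == 1:
        return left                    # x * 1
    if op == "*" and left == 1:
        return right                   # 1 * x
    if op == "/" and right == 1:
        return left                    # x / 1
    if op == "/" and isinstance(left, str) and left == right:
        return 1                       # x / x
    return (op, left, right)


print(push_const(1000))
print(add_to_local(2, "-", 1))
print(simplify(("*", ("+", 2, 3), "y")))
```
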

Test Suite

To continuously test our compiler and make sure no old bugs resurfaced, we created a script, test_script, that runs our compiler against a series of files in the folder files. This script merely checks that the compiler does not crash while compiling those files. After a successful compilation, the contents of the generated files were validated by hand.

To test the files do:

   $> ./test_script
   $> java -jar jasmin.jar <generated_file_name>
   $> java <class file name>

Overview

We decided to use Python 3 for our project since we were already much more familiar with the language, and it also allowed us to use ANTLR, which at the time seemed like a better and easier-to-use alternative to JavaCC. We did not regret this; however, it is worth mentioning that most of the compiler behaviour expected for Checkpoint 1, such as being LL(1) and doing the lexical analysis, was handled by the ANTLR tool.

Task Distribution

We aimed to keep the task distribution fairly uniform among the group members. At every iteration everyone would help as much as they were able to, however we believe these people stood out in the following areas:

  • Daniel Marques

    • Development of the to-be data flow analysis and graph coloring.
    • Refinement of the syntactic analysis results.
  • Gonçalo Moreno

    • Helped develop the code for the semantic analysis and the associated HIR.
    • Parsing the HIR into a LIR
  • João Carvalho

    • Building the HIR and integrating it with the semantic analysis.
    • Continuously testing the project for any errors the other members might have missed.
    • Porting the grammar from JavaCC to ANTLR, and establishing and maintaining good Python coding practices.
  • João Almeida

    • Responsible for the generation of the code and the LIR
    • Development of the test suite.

The Group

  • NAME1: Daniel Filipe Santos Marques, NR1: 201503822, GRADE1: 18.5, CONTRIBUTION1: 25%
  • NAME2: Gonçalo Vasconcelos Cunha Miranda Moreno, NR2: 201503871, GRADE2: 18.5, CONTRIBUTION2: 25%
  • NAME3: João Filipe Lopes de Carvalho, NR3: 201504875, GRADE3: 18.5, CONTRIBUTION3: 25%
  • NAME4: João Francisco Barreiros de Almeida, NR4: 201505866, GRADE4: 18.5, CONTRIBUTION4: 25%
