Arsenic-ATG / LameCC

A lame C compiler that implements a basic lexer, an LR(1) parser and a recursive descent parser.

LameCC

Download this project

git clone --recurse-submodules https://github.com/leo4048111/LameCC

Basic features

  • Lexer ☑
  • LR(1) Parser ☑
  • Recursive Descent Parser ☑
  • Semantic Analysis ☑
  • Intermediate Code Generator (in both quaternion/quadruple and LLVM IR forms) ☑
  • Code Optimization ☐ // TODO
  • Assembly Generator ☐ // TODO
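The intermediate code generator emits quadruples (the "quaternion" form above). As a rough illustration of how a recursive descent parser can produce them directly, the sketch below translates arithmetic expressions into `(op, arg1, arg2, result)` quadruples. It is not LameCC's implementation: the grammar, the `Quad`/`QuadGen` names and the `t1, t2, ...` temporaries are assumptions made for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One quadruple: (op, arg1, arg2, result).
struct Quad {
    std::string op, arg1, arg2, result;
    std::string str() const { return "(" + op + ", " + arg1 + ", " + arg2 + ", " + result + ")"; }
};

// Hypothetical recursive descent translator for:
//   expr   -> term   { ('+' | '-') term }
//   term   -> factor { ('*' | '/' | '%') factor }
//   factor -> NUMBER | IDENT | '(' expr ')'
class QuadGen {
public:
    explicit QuadGen(std::string src) : src_(std::move(src)) {}

    std::vector<Quad> run() {
        expr();
        skipSpaces();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return quads_;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;
    int temps_ = 0;
    std::vector<Quad> quads_;

    void skipSpaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    // Next non-space character, or '\0' at end of input.
    char peek() { skipSpaces(); return pos_ < src_.size() ? src_[pos_] : '\0'; }

    // Emits a quadruple into a fresh temporary and returns the temporary's name.
    std::string emit(char op, const std::string& a, const std::string& b) {
        std::string t = "t" + std::to_string(++temps_);
        quads_.push_back({std::string(1, op), a, b, t});
        return t;
    }

    std::string expr() {
        std::string lhs = term();
        while (peek() == '+' || peek() == '-') {
            char op = src_[pos_++];
            lhs = emit(op, lhs, term());
        }
        return lhs;
    }

    std::string term() {
        std::string lhs = factor();
        while (peek() == '*' || peek() == '/' || peek() == '%') {
            char op = src_[pos_++];
            lhs = emit(op, lhs, factor());
        }
        return lhs;
    }

    std::string factor() {
        if (peek() == '(') {
            ++pos_;
            std::string inner = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_')) ++pos_;
        if (start == pos_) throw std::runtime_error("expected operand");
        return src_.substr(start, pos_ - start);
    }
};
```

For instance, `QuadGen("(left + right) / 2").run()` yields `(+, left, right, t1)` followed by `(/, t1, 2, t2)`: each grammar rule is one function, and operator precedence falls out of which function calls which.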

Miscellaneous features

  • Prettified JSON dump
  • Log info/error
  • Visualized LR(1) canonical collection, ACTION GOTO table and LR(1) parsing process
  • Other features

Build prerequisites

  • OS: Windows or GNU/Linux
  • CMake version >= 3.8
  • LLVM libraries and C++ headers installed; make sure the CMAKE_PREFIX_PATH or LLVM_DIR environment variable is set to your LLVM directory
  • If you are running Windows and have installed MinGW64, simply run build.bat

Parsing capability

  • int/float var declaration/definition
  • if-else statement
  • int/float/void function declaration/definition
  • while statement
  • value statement (complex expression, function call, etc.)
  • return statement

Usage

Example input source file (see ./testcases/test.cpp):

// nonvoid return type function decl with params
int NonVoidFuncDeclWithParams(int parm1, int parm2);

// nonvoid return type function decl without params
char NonVoidFuncDeclWithoutParams();

// nonvoid return type function definition without params
float NonVoidFuncDefWithoutParamsWithEmptyBody()
{
    return 0xAF.D65P-5; // some float representations
}

// nonvoid return type function definition with params
int NonVoidFuncDefWithParamsWithEmptyBody(int param1, char param2)
{
    return 0;
}

// void return type function decl with params
void VoidFuncDeclWithParams(int parm1, int parm2);

// void return type function decl without params
void VoidFuncDeclWithoutParams();

// void return type function definition without params with empty body
void VoidFuncDefWithoutParamsWithEmptyBody()
{
}

// void return type function definition with params with empty body
void VoidFuncDefWithParamsWithEmptyBody(int param1, int param2)
{
}

// function definition
int main()
{
    int left = 0;                                                                         // DeclStmt
    int right = 100;                                                                      // DeclStmt
    int target = (NonVoidFuncDefWithParamsWithEmptyBody(99, 100) % 2 + 5) - right * left; // complex Expression

    while (left < right) // WhileStmt
    {
        int mid = (left + right) / 2;
        if (mid == target)     // IfStmt
            return mid;        // ReturnStmt
        else if (mid < target) // elseBody which is another IfStmt
            left = mid + 1;    // ValueStmt
        else                   // elseBody
            right = mid;       // ValueStmt
    }

    return left; // ReturnStmt
}
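The `int mid` declared inside the `while` body above is only visible in that block, which is the kind of rule a semantic analysis pass typically enforces with a stack of scopes. The following is a minimal sketch under that assumption; `ScopedSymbolTable` and its methods are hypothetical names, not LameCC's API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scoped symbol table: each block ({ ... }) pushes a scope,
// and lookups search from the innermost scope outward.
class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enterScope(); }  // global scope

    void enterScope() { scopes_.emplace_back(); }
    void exitScope() {
        assert(scopes_.size() > 1 && "cannot exit the global scope");
        scopes_.pop_back();
    }

    // Returns false if the name is already declared in the current scope
    // (a redefinition error); shadowing an outer declaration is allowed.
    bool declare(const std::string& name, const std::string& type) {
        return scopes_.back().emplace(name, type).second;
    }

    // Returns the type of the nearest visible declaration, if any.
    std::optional<std::string> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, std::string>> scopes_;
};
```

Declaring `left` twice in the body of `main` would make `declare` return false, while `lookup("mid")` fails again once the `while` body's scope has been exited.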

Command options:

PS D:\Projects\CPP\Homework\LameCC\build> .\LameCC.exe -?
Usage:
  LameCC.exe <input file> [options]
Available options:
  -?, --help         show all available options
  -o, --out          set output file path
  -T, --dump-tokens  dump tokens in json format
  -A, --dump-ast     dump AST Nodes in json format
      --LR1          specify grammar with a json file and use LR(1) parser
      --log          print LR(1) parsing process

Run command:

PS D:\Projects\CPP\Homework\LameCC\build> .\LameCC.exe ../testcases/test.cpp -A -T --LR1 ../src/grammar.gram --log
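To make the option list above concrete, here is a hedged sketch of how such a command line could be parsed into a settings struct. The `Options` struct and `parseArgs` function are hypothetical; LameCC's own argument handling may differ (for example in how it reports unknown flags or a missing input file).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical settings mirroring the documented flags.
struct Options {
    std::string input;                      // <input file>
    std::optional<std::string> out;         // -o, --out
    bool dumpTokens = false;                // -T, --dump-tokens
    bool dumpAst = false;                   // -A, --dump-ast
    std::optional<std::string> lr1Grammar;  // --LR1 <grammar file>
    bool log = false;                       // --log
    bool help = false;                      // -?, --help
};

// args[0] is the program name, as in argv.
Options parseArgs(const std::vector<std::string>& args) {
    Options opt;
    // Consumes and returns the value that follows a flag such as -o or --LR1.
    auto value = [&](std::size_t& i) -> std::string {
        if (i + 1 >= args.size()) throw std::invalid_argument(args[i] + " expects a value");
        return args[++i];
    };
    for (std::size_t i = 1; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-?" || a == "--help") opt.help = true;
        else if (a == "-o" || a == "--out") opt.out = value(i);
        else if (a == "-T" || a == "--dump-tokens") opt.dumpTokens = true;
        else if (a == "-A" || a == "--dump-ast") opt.dumpAst = true;
        else if (a == "--LR1") opt.lr1Grammar = value(i);
        else if (a == "--log") opt.log = true;
        else if (!a.empty() && a[0] == '-') throw std::invalid_argument("unknown option " + a);
        else if (opt.input.empty()) opt.input = a;
        else throw std::invalid_argument("more than one input file");
    }
    if (opt.input.empty() && !opt.help) throw std::invalid_argument("no input file");
    return opt;
}
```

Called with the arguments of the run command above, this `parseArgs` sets `input` to `../testcases/test.cpp`, turns on `dumpAst`, `dumpTokens` and `log`, and stores `../src/grammar.gram` as the LR(1) grammar file.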

Token dump:

[
  {
    "id": 1,
    "type": "TOKEN_KWINT",
    "content": "int",
    "position": [
      2,
      1
    ]
  },
  {
    "id": 2,
    "type": "TOKEN_IDENTIFIER",
    "content": "NonVoidFuncDeclWithParams",
    "position": [
      2,
      5
    ]
  },
  {
    "id": 3,
    "type": "TOKEN_LPAREN",
    "content": "(",
    "position": [
      2,
      30
    ]
  },
  {
    "id": 4,
    "type": "TOKEN_KWINT",
    "content": "int",
    "position": [
      2,
      31
    ]
  },
  ...
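The `position` field in the dump is `[line, column]`, both 1-based: `int` starts at line 2, column 1 because line 1 of test.cpp is a comment, and `(` sits at column 30 right after the 25-character identifier. A minimal sketch of a lexer that tracks positions this way is shown below. It only knows a handful of token kinds, and every token type name other than `TOKEN_KWINT`, `TOKEN_IDENTIFIER` and `TOKEN_LPAREN` (which appear in the dump) is a hypothetical placeholder.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    int id;
    std::string type;
    std::string content;
    int line, column;  // 1-based, like "position" in the dump
};

// Toy lexer: skips whitespace and // comments, recognizes the keyword int,
// identifiers, integer literals and single-character punctuation.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    int line = 1, column = 1;
    // Advances one character while keeping line/column up to date.
    auto advance = [&]() {
        if (src[i] == '\n') { ++line; column = 1; } else { ++column; }
        ++i;
    };
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { advance(); continue; }
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') advance();
            continue;
        }
        int startLine = line, startColumn = column;
        std::size_t start = i;
        std::string type;
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) advance();
            type = src.compare(start, i - start, "int") == 0 ? "TOKEN_KWINT" : "TOKEN_IDENTIFIER";
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) advance();
            type = "TOKEN_INT_LITERAL";  // hypothetical name
        } else {
            advance();
            type = c == '(' ? "TOKEN_LPAREN" : "TOKEN_PUNCT";  // TOKEN_PUNCT is hypothetical
        }
        tokens.push_back({static_cast<int>(tokens.size()) + 1, type,
                          src.substr(start, i - start), startLine, startColumn});
    }
    return tokens;
}
```

Fed the first two lines of test.cpp, this reports `int` at line 2, column 1, `NonVoidFuncDeclWithParams` at column 5, `(` at column 30 and the second `int` at column 31, matching the first four entries of the dump.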

AST dump:

{
  "type": "TranslationUnitDecl",
  "children": [
    {
      "type": "FunctionDecl",
      "functionType": "int(int, int)",
      "name": "NonVoidFuncDeclWithParams",
      "params": [
        {
          "type": "ParmVarDecl",
          "name": "parm1"
        },
        {
          "type": "ParmVarDecl",
          "name": "parm2"
        }
      ],
      "body": "empty"
    },
    {
      "type": "FunctionDecl",
      "functionType": "char()",
      "name": "NonVoidFuncDeclWithoutParams",
      "params": [],
      "body": "empty"
    },
    {
      "type": "FunctionDecl",
      "functionType": "float()",
      "name": "NonVoidFuncDefWithoutParamsWithEmptyBody",
      "params": [],
      "body": [
        {
          "type": "CompoundStmt",
          "children": [
            {
              "type": "ReturnStmt",
              "value": [
                {
                  "type": "FloatingLiteral",
                  "value": "5.494911"
                }
              ]
            }
          ]
        }
      ]
    },
    ...
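The `FloatingLiteral` value `5.494911` comes from the hexadecimal literal `0xAF.D65P-5` in test.cpp. In a C/C++ hex float, the digits around the point form a base-16 significand and the `P` exponent is a power of two: 0xAF.D65 = 175 + 3429/4096 = 175.837158203125, and multiplying by 2^-5 gives 5.494911193847656, which prints as 5.494911 with six decimals. The sketch below does this conversion by hand. `parseHexFloat` is a hypothetical helper that assumes well-formed input, and `std::strtod` is used only as a cross-check.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Value of one hex digit (assumes c is a valid hex digit).
int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Converts a hex float literal such as "0xAF.D65P-5" to double.
// Assumes well-formed input with at most 13 hex digits, so the significand
// is exactly representable as a double (no overflow or rounding handling).
double parseHexFloat(const std::string& lit) {
    std::size_t i = 2;             // skip "0x" / "0X"
    std::uint64_t significand = 0;
    int fracDigits = 0;            // hex digits after the '.'
    bool afterPoint = false;
    for (; i < lit.size() && lit[i] != 'p' && lit[i] != 'P'; ++i) {
        if (lit[i] == '.') { afterPoint = true; continue; }
        significand = significand * 16 + hexDigit(lit[i]);
        if (afterPoint) ++fracDigits;
    }
    int exponent = 0;
    if (i < lit.size()) exponent = std::stoi(lit.substr(i + 1));  // binary exponent, e.g. "-5"
    // Each fractional hex digit is worth 4 bits:
    // value = significand * 2^(exponent - 4 * fracDigits).
    return std::ldexp(static_cast<double>(significand), exponent - 4 * fracDigits);
}
```

For `0xAF.D65P-5` the integer significand is 0xAFD65 = 720229 with three fractional hex digits, so the result is 720229 × 2^(-5-12) = 720229 / 131072.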

[Figure: LR(1) canonical collections]
[Figure: ACTION GOTO table]
[Figure: LR(1) parsing process]
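The parsing process visualized above is the standard shift/reduce loop driven by the ACTION and GOTO tables. A minimal sketch with a hand-built canonical LR(1) table for the toy grammar `S' -> E`, `E -> E + n`, `E -> n` follows; the grammar, state numbering and `Action` encoding are assumptions for illustration and unrelated to LameCC's grammar.gram.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy grammar:
//   (0) S' -> E      (1) E -> E + n      (2) E -> n
// Canonical LR(1) states (lookaheads in braces):
//   I0: S'->.E {$}, E->.E+n {$,+}, E->.n {$,+}
//   I1: S'->E. {$}, E->E.+n {$,+}
//   I2: E->n. {$,+}
//   I3: E->E+.n {$,+}
//   I4: E->E+n. {$,+}
enum class Kind { Shift, Reduce, Accept };
struct Action { Kind kind; int target; };  // next state for Shift, production for Reduce

struct Production { char lhs; int rhsLength; };
const std::vector<Production> kProductions = {{'S', 1}, {'E', 3}, {'E', 1}};

const std::map<std::pair<int, char>, Action> kAction = {
    {{0, 'n'}, {Kind::Shift, 2}},
    {{1, '+'}, {Kind::Shift, 3}},  {{1, '$'}, {Kind::Accept, 0}},
    {{2, '+'}, {Kind::Reduce, 2}}, {{2, '$'}, {Kind::Reduce, 2}},
    {{3, 'n'}, {Kind::Shift, 4}},
    {{4, '+'}, {Kind::Reduce, 1}}, {{4, '$'}, {Kind::Reduce, 1}},
};
const std::map<std::pair<int, char>, int> kGoto = {{{0, 'E'}, 1}};

// Returns true if input (a string of terminals 'n' and '+') is accepted.
// If trace is given, each step is appended to it, a simplified version of
// what a parsing-process log might record.
bool lrParse(const std::string& input, std::vector<std::string>* trace = nullptr) {
    std::vector<int> states = {0};
    std::string tokens = input + '$';  // end marker
    std::size_t pos = 0;
    while (true) {
        auto it = kAction.find({states.back(), tokens[pos]});
        if (it == kAction.end()) return false;  // empty ACTION entry: syntax error
        const Action& act = it->second;
        if (act.kind == Kind::Accept) {
            if (trace) trace->push_back("accept");
            return true;
        }
        if (act.kind == Kind::Shift) {
            if (trace) trace->push_back(std::string("shift ") + tokens[pos]);
            states.push_back(act.target);
            ++pos;
        } else {
            const Production& p = kProductions[act.target];
            states.resize(states.size() - p.rhsLength);  // pop one state per RHS symbol
            states.push_back(kGoto.at({states.back(), p.lhs}));
            if (trace) trace->push_back("reduce " + std::to_string(act.target));
        }
    }
}
```

On the input `n+n` the trace is shift n, reduce 2, shift +, shift n, reduce 1, accept, while `n+` is rejected because state 3 has no ACTION entry for `$`.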

Credit

About

License: Do What The F*ck You Want To Public License


Languages

C++ 99.5%, CMake 0.4%, Batchfile 0.1%