DosWorld / o0c

Oberon-0 Compiler in C for MS-DOS

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Oberon0-C

Oberon-0 Compiler in Watcom C

This project is a translation of Niklaus Wirth's Oberon-0 compiler as presented in [1] into Watcom C. Oberon-0 is a simplified variant of Oberon with just the basic types INTEGER and BOOLEAN and without function procedures. It only compiles a single module. However, Oberon-0 has procedures, value and reference parameters, arrays, and records.

The source code is compiled for a hypothetical RISC processor as presented in [1]. The RISC processor is reduced to a bare minimum and mainly offers integer arithmetic, comparison, and branch instructions. Some details are given below.

[1] Oberon0 Compiler, Niklaus Wirth, Compiler Construction, 2005

Building

cd src
make

Compiling and Running an Example

cd examples
..\o0c QSORT.MOD

Structure

The port of the Oberon-0 compiler is structured as follows: scan.c contains the scanner. It performs lexical analysis, i.e., it transforms Oberon-0 source text into a stream of symbols (tokens). Syntax analysis is done by parser.c. The structure of the recursive descent parser directly mirrors the Oberon-0 EBNF grammar, given below. The parser calls functions in gen.c to emit code for the RISC processor. The parser also checks types, ranges of constants, etc. The compiler is a non-opt imizing single-pass compiler. A simulator of the RISC processor is located in risc.c.

The main function is located in o0c.c. If a source file could be compiled successfully, o0c prints the generated instructions and directly executes the code using the simulator. The structure and the source code of the compiler are a direct port of the Oberon-0 compiler given in [1].

Oberon-0 EBNF Grammar

ident = letter {letter | digit}.
integer = digit {digit}.

selector = {"." ident | "[" expression "]"}.

factor = ident selector | integer | "(" expression ")" | "~" factor.
term = factor {("*" | "DIV" | "MOD" | "&") factor}.
SimpleExpression = ["+"|"-"] term {("+"|"-" | "OR") term}.
expression = SimpleExpression [("=" | "#" | "<" | "<=" | ">" | ">=") SimpleExpression].

assignment = ident selector ":=" expression.

ActualParameters = "(" [expression {"," expression}] ")" .
ProcedureCall = ident [ActualParameters].

IfStatement = "IF" expression "THEN" StatementSequence
              {"ELSIF" expression "THEN" StatementSequence}
              ["ELSE" StatementSequence] "END".

WhileStatement = "WHILE" expression "DO" StatementSequence "END".

statement = [assignment | ProcedureCall | IfStatement | WhileStatement].
StatementSequence = statement {";" statement}.

IdentList = ident {"," ident}.
ArrayType = "ARRAY" expression "OF" type.
FieldList = [IdentList ":" type].
RecordType = "RECORD" FieldList {";" FieldList} "END".
type = ident | ArrayType | RecordType.

FPSection = ["VAR"] IdentList ":" type.
FormalParameters = "(" [FPSection {";" FPSection}] ")".

ProcedureHeading = "PROCEDURE" ident [FormalParameters].
ProcedureBody = declarations ["BEGIN" StatementSequence] "END".
ProcedureDeclaration = ProcedureHeading ";" ProcedureBody ident.
declarations = ["CONST" {ident "=" expression ";"}]
               ["TYPE" {ident "=" type ";"}]
               ["VAR" {IdentList ":" type ";"}]
               {ProcedureDeclaration ";"}.

module = "MODULE" ident ";" declarations ["BEGIN" StatementSequence] "END" ident "." .

RISC Processor

The hypothetical RISC target is a 32-bit processor, i.e., the word size is 4 bytes. All instructions are one word long. There are 16 general-purpose registers (R0 to R15). R15 is used as the program counter (PC) and R14 (link register) is used by the branch-to-subroutine (BSR) instruction to store the return address. The comparison instructions (CMP, CMPI) implicitly compute the difference of their operands and set the N (negative) and Z (zero) flags according to the result. There are four instruction formats:

  • F0: 00 op[4] a[4] b[4] unused[14] c[4]
  • F1: 01 op[4] a[4] b[4] im[18]
  • F2: 10 op[4] a[4] b[4] disp[18]
  • F3: 11 op[4] disp[26]

where a, b, and c refer to registers (R0-R15) and im and disp are 18-bit immediate/displacement values. They are given in two's complement format. For most instructions, a is the destination register and b and c are the operand registers. The opcodes occupy the upper 6 bits of the instruction word. The opcodes are:

  • F0: 00 op[4] a[4] b[4] unused[14] c[4]
    • MOV = 0, MVN = 1 (move, move negated)
    • ADD = 2, SUB = 3, MUL = 4, Div = 5, Mod = 6
    • CMP = 7 (compare registers)
  • F1: 01 op[4] a[4] b[4] im[18]
    • MOVI = 16, MVNI = 17 (move immediate, move immediate negated)
    • ADDI = 18, SUBI = 19, MULI = 20, DIVI = 21, MODI = 22
    • CMPI = 23 (compare register with immediate operand)
    • CHKI = 24 (check index)
  • F2: 10 op[4] a[4] b[4] disp[18]
    • LDW = 32, LDB = 33, POP = 34, STW = 36, STB = 37, PSH = 38 (load from, store to memory)
    • RD = 40, WRD = 41, WRH = 42, WRL = 43, RB = 44, WRB = 45 (input/output)
  • F3: 11 op[4] disp[26]
    • BEQ = 48, BNE = 49, BLT = 50, BGE = 51, BLE = 52, BGT = 53 (conditional branch)
    • BR = 56 (unconditional branch)
    • BSR = 57, RET = 58 (branch to, return from subroutine)

The input/output instructions are included to allow the simulated RISC to read from standard input and write to standard output. RB/WRB reads and writes a single byte, RD/WRD reads and writes an integer number in decimal form, WRH writes an integer number in hexadecimal form, and WRL writes a line break.

Example Compilation

This example counts the number of bytes in the standard input stream.

MODULE CountBytes;
    VAR c, n: INTEGER;
BEGIN
    n := 0;
    ReadByte(c);
    WHILE c # -1 DO
        INC(n);
        ReadByte(c)
    END;
    Write(n); WriteLn
END CountBytes.

Compile and run the program to compute the size of its source text.

..\src\o0c CountBytes.Mod < CountBytes.Mod

The output (with comments) is:

entry = 0
20 instructions generated
  0 MOVI 13  0 4096    set stack pointer to top of memory
  1 PSH  14 13  4      push link register (initially contains 0)
  2 MOVI  0  0  0      n := 0;
  3 STW   0 15 -20
  4 RB    0  0  0      ReadByte(c);
  5 STW   0 15 -24
  6 LDW   0 15 -28     c
  7 CMPI  0  0 -1      c # -1
  8 BEQ   8            WHILE c # -1 DO
  9 LDW   0 15 -44     n
 10 ADDI  0  0  1      INC(n);
 11 STW   0 15 -52
 12 RB    0  0  0      ReadByte(c)
 13 STW   0 15 -56
 14 BR   -8            END;
 15 LDW   0 15 -68     n
 16 WRD   0  0  0      Write(n);
 17 WRL   0  0  0      WriteLn
 18 POP  14 13  4      pop link register (return address) from stack
 19 RET  14            return (using return address in link register)
RISC OUTPUT BEGIN
 181                   size of CountBytes.Mod is 181 bytes
RISC OUTPUT END

About

Oberon-0 Compiler in C for MS-DOS

License:MIT License


Languages

Language:C 73.8%Language:Modula-2 26.2%