README for Decaf Compiler

PACKAGE INFO
==================
DCC uses the GNU Build System, which automates many of the tasks involved in
ensuring source portability and detecting missing dependencies. Autoconf
generates the configure script, which checks features of the system such as
libraries, standard library functions, types, headers, and the presence of
compilers. Automake implements a syntax that is clearer and more powerful
than that of plain Makefiles; it is used with Autoconf and the configure
script to adapt the generated Makefiles to the system's configuration.
Libtool is used to create libraries and ensures that the type of library
built (static or shared, whether to use position-independent code, etc.) is
compatible with the system. Furthermore, Automake has a useful testsuite
feature. Finally, the ./configure, make, make install sequence of invocations
is supported by many package management systems.

Since I am developing and building DCC on two very different platforms
(64-bit Mac OS X and Linux), the GNU Build System's support for source
portability is especially useful.

EXECUTABLES
===================
Two executables are built:

    dcc       - the Decaf Compiler
    dcc_i386  - the Decaf Compiler using x86 code generation

Executables may be found in the src directory.

SIMPLE BUILDING
===================
To configure and build automatically, run the "configure" script and then
make:

    ./configure
    make

To install the executables (by default into /usr/local/bin, /usr/local/lib,
etc.), run:

    make install

To change the directory dcc is installed to, pass the --prefix option to
configure:

    ./configure --prefix=/path/to/installdir

If you have trouble running gdb on what appear to be the executables before
running make install, it may be because Libtool built wrapper scripts around
the actual executables (which are located in .deps). To use the actual
executables, run make install.
As of pp3, this should no longer apply, since Libtool is no longer used.

If you have more than one compiler on your system, you may explicitly set the
CC and CXX environment variables before running configure and make. For
example (assuming sh/bash/ksh):

    CC=gcc ./configure
    make

Similarly, using csh:

    setenv CC gcc
    ./configure
    make

To clean:

    make clean

For additional configuration options, see:

    ./configure --help

TESTING
========
To run all the tests:

    make check

ADDITIONAL SCRIPTS
=======================
There are two scripts used to initialize the Autotools scripts and to clean
the files the Autotools generate. These are especially useful because keeping
the generated files around makes the source tree harder to manage,
particularly since I am maintaining a local git repository.

To initialize the GNU Build System for DCC, run:

    bootstrap

To clean all Autotools-generated files, run:

    clean

SEMANTIC ANALYZER
====================
I chose a spaghetti stack to implement the symbol table for the semantic
analyzer. The spaghetti stack is first populated with declaration information
during the first pass. During the second pass, the parent pointers on the
SymTable nodes are used to set up the class hierarchy. During this pass, a
"vtable" of sorts for each ClassDecl is also populated with all the functions
from the interfaces the class implements. Most declaration conflicts are
caught during these first two passes.

During the third pass, the rest of semantic analysis is performed. During the
recursive unwind on the AST, "return types" are set by child nodes and then
used by parent nodes for type checking. Error recovery uses the error type,
which is set if errors are found while checking the current node. However,
propagation of errors is suppressed by making the error type compatible with
all other types.
The error type is made compatible with all other types because I assume that
once the code fragment containing the source of the error is fixed, the
nodes higher up will be error-free as well. If not, the new error will be
caught when the compiler is run again.

EXTENSIONS
====================
Progress
------------------
    Bit operators       [DONE       ]
    Octal ints          [IN PROGRESS]
    Closures            [IN PROGRESS]
    x86 codegen         [IN PROGRESS]
    Linker/loader       [PLANNED    ]
    Character literals  [PLANNED    ]
    Class protection    [PLANNED    ]

OTHER SOURCE MODIFICATIONS
=============================
I modified ParseCommandLine() to use getopt, which facilitates testing
different components of the compiler through the -t flag. For example,
running dcc with the -tparser option runs the compiler only up to the parser
and dumps the parse tree a la pp2. This makes it easy to run the test suites
incrementally while I go back and improve code from earlier assignments and
work on the extensions.

IR GENERATION
================
I augmented the semantic checking constructs, in particular the symbol
tables, to aid with IR generation in pp4. During the recursive Emit calls,
Location objects are added to the Symbols for variables so they can be
looked up later. I added a simple class, FrameAllocator, that keeps track of
the offsets of variables relative to the fp on the stack, to the gp
globally, or to the base of a class.

IR generation for classes is done in two passes. First, as in semantic
checking, the class hierarchy and inheritance tree are set up using the
EmitSetup method. During this pass, the offsets of all methods, inherited
methods, and class variables are computed. During the second pass, each
method is emitted. To account for parent classes declared after a child
class, the child calls its parent's EmitSetup method recursively upward when
necessary; a simple check ensures that the main body of EmitSetup runs only
once. Class labels are prefixed with C_, and functions with F_.
This way, class names and function names will not conflict with builtins or
instructions. One area for improvement would be to ensure that variable
names also cannot conflict with anything.