Due: February 12th, 2013 at 11:59:59 pm

Download: project2.zip

To Submit: Go Here

This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project. Be sure to fill in the team.txt file.

Overview

Build a compiler for simple arithmetic expressions. Your compiler will accept source files of the form
  1 + 2 + X + 4 + 32
  
and will produce an object file (by default named a.out) that, when linked against runtime.c and then executed, takes a single command-line numeric argument for the value of the variable X and produces the resulting output:
  ./a.out 3
  42
  
The code your compiler generates should follow the cdecl calling conventions. After compilation, the source file from above should be functionally equivalent to this C program:
  int program(int X) {
    return (1 + 2 + X + 4 + 32);
  }
  

Building the Project

OCaml Libraries

On all platforms you need to add both unix and str to the list of OCaml libraries used for this project. You can set this by specifying them on the command line like: ocamlbuild -lib unix -lib str, or in Eclipse under the project's Properties pane:
  1. Open the Project » Properties dialog
  2. Select the Project pane (choose it from the left column)
  3. In the Libraries (-libs) box add unix,str

Windows

This project depends on platform-specific configuration data to assemble and link the X86 code your compiler generates. On Windows we assume that you are using Cygwin and the gcc4-g++ package for the gcc-4 compiler.
  1. Run Cygwin setup.exe and select one more package.
  2. After search 'gcc', click on '+' to extend the categories tree: the root node is "All", and the child node is "Devel". Then on the package ``gcc4-g++:G++ subpackage'', click 'skip' to change it to be 4.3.4-3. This will also install all dependent packages.
If these are installed properly, the compiler should be fine, but if you run into problems on Windows, ask the TA for help.

To turn off the following warning, under system variables, click on New, set variable name to CYGWIN and set value as nodosfilewarning and you're done.

cygwin warning:
MS-DOS sytel path detected: ...
Perfered POSIX equivalent is: ...
CYGWIN enviroment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths;
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames

Directories and Paths

Your compiler will produce X86 .s and .o files as well as native executables and .out files used during testing. By default these files will be generated in directories c_obj and c_bin respectively, and the default executable is called a.out. The main compiler accepts new command-line flags to change these directories from the default. Likewise, the -o option tells the compiler to use the given name for the result executable.

Note: Before running the compiler, you should create the c_bin and c_obj directories as sub-directories of your project. In OcaIDE you need to create the two directories under _build. If running the compiler issues an error like "can't open output file for writing", it probably means you are missing these directories.

Running the Project

This project needs slightly different build configurations compared to the earlier projects. The main executable also supports more command-line flags than our previous project's did. By default main looks for runtime.c (in the zip file) from the current directory.

Running from OcaIDE

In OcaIDE the main is in _build, so you need to specify the path of runtime.c by the -runtime flag before the --test in the ``Run Configurations...'' dialog.

Running on Linux

If you are using some flavor of linux (for example on Eniac) but not Mac OS X, you should specify the -linux flag when running the compiler. This turns off name mangling, which has different conventions on linux vs. Mac or Windows. In Eclipse you should add -linux before the --test flag.

Running from the command line

On all platforms, you should be able to run the main.native program from the command line. Do main --help from the command line for a synopsis of the allowed arguments.

Here is a sample interaction with a (fully implemented) version of the project:

~/Desktop/project2_temp> ocamlbuild -lib str -lib unix main.native
Finished, 47 targets (0 cached) in 00:00:00.

~/Desktop/project2_temp> ./main.native --help
CIS341 main test harness 

  --test run the test suite, ignoring other inputs
  -q quiet mode -- turn off verbose output
  -bin-path set the output path for generated executable files, default c_bin
  -obj-path set the output path for generated .s  and .o files, default c_obj
  -test-path set the path to the test directory
  -lib add a library to the linked files
  -runtime set the .c file to be used as the language runtime implementation
  -o set the output executable's name
  -S compile only, creating .s files from the source
  -linux use linux-style name mangling
  --stdin parse and interpret inputs from the command line, where X = 
  --clean clean the output directories, removing all files
  --show_ast print the abstract syntax tree after parsing
  -help  Display this list of options
  --help  Display this list of options

~/Desktop/project2_temp> echo "X * X * X" > foo.e

~/Desktop/project2_temp> ./main.native -o foo -runtime runtime.c foo.e
Processing: foo.e
root name: foo
* gcc -mstackrealign -c -m32 -o c_obj/foo.o c_obj/foo.s
* gcc -mstackrealign -m32 -o foo c_obj/foo.o runtime.c

~/Desktop/project2_temp> ./foo 3
27

The Expressions Language

The language accepted by your compiler is a simple collection of 32-bit arithmetic operations. See the file ast.mli for the OCaml representation of the abstract syntax -- the type exp of expressions is defined there.

There is only one "input variable", written X, available for use in your expressions -- the value of X is determined by the command-line argument passed in to the generated executable when your program is run. Otherwise, the meaning of the arithmetic operators is mostly standard (see below).

Implement an ocamllex lexer/ocamlyacc parser for integer arithmetic expressions. The terminal-symbol tokens processed by your lexer are described next.

Lexing

The file lexer.mll has some skeleton lexing code you can use as a starting point. It depends on an implementation of "file ranges" (defined in range.ml(i)) for generating error messages. You will need to add lexing rules to handle the cases below. This will require modifying parser.mly to declare the types of any new tokens you add -- the tokens should each take a Range.t parameter, which is used to print error messages. See the example of how the X token is declared in parser.mly.

Parsing

Your parser should accept strings according to the following ambiguous grammar, suitably disambiguated to implement the precedence of operations as defined in ast.ml (see the prec_of_* functions). Recall that higher precedence means tighter binding, that is, X + 3 * 4 should yield the same abstract syntax tree as X + (3 * 4). Explicitly parenthesized expressions should be treated as though they had maximal precedence.

The output of your parser should be an abstract syntax tree built using the constructors in ast.mli. To get you started, we have implemented the trivial language E ::= X.

The full (ambiguous) grammar for expressions has a single non-terminal (and starting nonterminal) E and rules given by:

E ::= X
   |    numeric literal
   |    (E)
   |    E + E
   |    E - E
   |    E * E
   |    E == E
   |    E << E
   |    E >> E
   |    E >>> E
   |    E != E
   |    E < E
   |    E <= E
   |    E > E
   |    E >= E
   |    ! E
   |    ~ E
   |    - E
   |    E & E
   |    E | E

Implement a parser for the disambiguated grammar for arithmetic expressions. All binary operations should be made to be left associative. That is, the input string 1 + 2 + 3 should produce the same parse tree as (1 + 2) + 3.

Hints

Compiling Expressions

You should complete the implementation of the compile_exp function in compiler.ml. This function takes as input a single Ast.exp expression and produces a single, globally visible code block whose entry point is given by the label created by calling Platform.decorate_cdecl "program". This code should follow the cdecl calling conventions, and behave as though implementing the C function defined as below (where <exp>) is the expression defined in our language:
  int program(int X) {
    return (<exp>);
  }
  

It is easiest to generate code when you maintain certain regularity invariants. In this case, you might want to use the invariant "the code resulting from compiling an expression places its answer in Eax", since Eax is the "return register" for cdecl calling conventions.

This strategy works fine for everything except binary operations -- for those you need to store a temporary value. Since we don't have very much by way of compiler infrastructure at this point, there aren't too many options for storing those temporary values. For this project, you should store the results on the stack (and return results in Eax). You should only have to push new values to the stack when compiling an expression containing a binary operator. In that case, first generate code for the left-hand branch, then push the result, then generate code for the right-hand branch. Remember to pop the intermediate value when you're done with it.

You can access the value at the top of the stack with the indirect operand (stack_offset 0l). To pop an item off the stack without being forced to store it somewhere, just manipulate Esp directly. Remember that the shift instructions put special requirements on the location of the shift amount; also note that you will have to generate some extra code for the comparison and logical negation operators, since they return exactly 0 or 1.

If you use registers other than the callee save Eax, Ecx, and Edx for the computation, you must save and restore those values (our solution code is able to manage without using any callee save registers). You might also want to consider storing X (the program argument) in a dedicated register for easy access during the computation.

Hints

Grading

Your project grade will be based on the test cases in gradedtests.ml. These are divided up into categories with point values as follows:
Last modified: Fri Feb 1 15:04:47 EST 2013