CIS 341 HW4: Oat v.1 Compiler

Due: 26 March 2015 at 11:59:59pm

Download: hw4.zip
Oat v.1 Language Specification

To Submit: Go Here

This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project.

Warning: This project will be time consuming. Start early and contact the course staff with any questions. If you find yourself stuck or writing tons and tons of code, please ask for help on Piazza or in office hours.

Getting Started

To get started, download the project source files and create an Eclipse project whose executable is main.native or main.byte. The files included in hw4.zip are briefly described below. Those marked with * are the only ones you should need to modify while completing this assignment.

Note: As with HW 3, you'll need Menhir for the parser. You'll also need to add the nums and unix libraries to compile this project. If you use OCamlBuild, you can compile the project from the command line by running ocamlbuild -Is x86,util -libs nums,unix -use-menhir main.native (or use the provided Makefile).

util/assert.ml(i) - the assertion framework
util/platform.ml  - platform-specific compilation support
util/range.ml(i)  - range datatype for error messages

ll/ll.ml          - the abstract syntax for LLVMlite
ll/lllib.ml       - name generation and pretty-printing for LLVMlite
ll/lllexer.mll    - lexer for LLVMlite syntax
ll/llparser.mly   - parser generator for LLVMlite syntax
ll/llinterp.ml    - reference interpreter for the LLVMlite semantics

/x86/x86.ml       - the X86lite language used as a target

README            - help about the main test harness
Makefile          - basic make support for invoking ocamlbuild

atprograms/*.ll   - example .oat v.1 programs used in testing

main.ml           - main test harness
driver.ml         - utilities for invoking the compiler
cinterop.c        - c code for testing interoperability
backend.ml        - sample solution to HW3 

ast.ml            - oat abstract syntax
astlib.ml         - pretty printing 
ctxt.ml           - typechecking context
*lexer.mll        - oat lexer
*parser.mly       - oat parser
*frontend.ml      - oat frontend

runtime.c         - oat runtime library

providedtests.ml  - where your own test cases should go
gradedtests.ml    - graded test cases that we provide  
progasts.ml       - helper ast representations for parser tests

Note: You should need to submit only lexer.mll, parser.mly, and frontend.ml (as well as posting a test case to Piazza) when completing this project.

Overview

Implement a compiler frontend for a simple C-like language that has boolean, int, string, and array types as well as top-level, mutually-recursive functions and global variables. Your compiler will accept source files of the form

int fact(int x) {
  int acc = 1;
  while (x > 0) {
    acc = acc * x;
    x = x - 1;
  }
  return acc;
}

int program(int argc, string[] argv) {
  print_string(string_of_int(fact(5)));
  return 0;
}

and will produce an executable (by default named a.out) that, when linked against runtime.c and then executed produces the resulting output:

  ./a.out
  120

Hint: For examples of Oat code, see the files in /atprograms, especially those with sensible names.

The Oat Language

The language accepted by your compiler is a simple variant of C-style programs. Oat supports multiple base-types of data: int, bool, and string, as well as arrays of such data. The Oat language is large enough that it is simpler to give the specification of its type system using inference rules than to use English prose. The Oat language specification contains a definition of the language syntax and a collection of inference rules that define Oat type checking.

See the file ast.ml for the OCaml representation of the abstract syntax -- the type typ of types is defined there, along with representations of expressions, statements, blocks, function declarations, etc. You should familiarize yourself with the correspondence between the OCaml representation and the notation used in the specification. The astlib module defines some helper functions for working with the abstract syntax.

Features

Functions:
Oat supports mutually-recursive, top-level functions. Each function body consisting of a series of imperative statements that have C-like semantics.

A complete Oat program contains a function called program with type (int, string[]) -> int, which takes command-line arguments like the C main function and is the entry-point of the executable.

Global values:
Oat supports top-level global constants of any type. Initialization of global constants happens statically (at compile time). Such global constants may not be mutually recursive.

Expression forms:
Oat supports several forms of expressions, including the usual binary and unary operations. Boolean values are true and false. Oat distinguishes boolean logical operations b1 & b2 (and) and b1 | b2 (or) from bit-wise int operations i1 [&] i2 (bitwise or) and i1 [|] i2 (bitwise or). (This difference from C is necessitated by the the lack of casts.)

Arrays:
Oat supports multi-dimensional arrays. The expression new typ [len]{ i => init } creates a new array of length len containing objects of type typ and initializes it by iterating the function over the array indices. For example, the following program creates a 3x4 array of integers and initializes them with the numbers 0 through 11:

  int[][] arr = new int[][3]{ i => new int[4]{ j => i*4 + j }};

Constant arrays are written using braces:

  int[] arr = {1, 2, 3, 4};

Oat v. 1 restricts constant arrays, like the one above, to mention only constant data values. The following program is not syntactically well formed because the array initializer contains non-constant expresions 2+0 and -4:

  int[] arr = {1, 2+0, 3, -4};

There is a distinction between constant arrays declared at the global scope and those declared locally to a function. Global constant arrays can be allocated at compile time, while those local to a function must be allocated dynamically. See the comments associated with cmp_const and cmp_init in frontend.ml.

Arrays are mutable, and they can be updated using assignment notation: vec[0] = 17. Array indices start at 0 and go through len - 1, where len is the length of the array. Oat arrays (unlike C arrays) also store a length field to support array-bounds checks, which we will add in a future project. However, for this project, you do not have to implement bounds checking.

Arrays in Oat are represented at the LL IR level by objects of the LL type {i64, [0 x t]}*, that is, an array is a pointer to a struct whose first element is an i64 value that is the array's length, and whose second component is a sequence of elements of type t. See the translation of Oat types into LL types via the cmp_typ function.

The length information is located before the 0th element of the array. For example, the constant array arr would be represented as a pointer to memory as shown below:

  int[] arr = {1,2,3,4};
  
  arr --+
        |
        v
       [4][1][2][3][4]

Path Expressions: As usual, local and global identifiers can be mentioned by name in expressions, and, since they denote mutable locations, they can also appear to the left of an assignment operation. In the example below, the identifier x appears in both ways:

  x = x + 1;

Function call expressions are written f(x,y,z) as in C, and they may nest, as in f(g(), 3). Array indexing operations, identifier references, and function invocations can be chained to form paths, so long as the intermediate results are well-typed. For example the code below shows that it is legal to index off of a function call expression, as long as the function returns an array.

int[] f(int[] x, int[] y, bool b) {
  if ( b ) {
    return x;
  } else {
    return y;
  }
}

int[] x = {1, 2, 3};
int[] y = {4, 5, 6};


int program (int argc, string[] argv) {
  f(x, y, true)[0] = 17;     /* non-trivial lhs path */
  int z = f(x, y, true)[0] + f(y, x, false)[0];  /* non-trivial expression paths */
  return z;  /* returns the value 34 */
}

Paths denote either references (a.k.a. "left-hand sides") to which new values can be stored, as in the non-trivial path above, or values that can appear nested in arbitrary expression computations, as in the definition of z above.

In anticipation of future projects, the abstract syntax tree uses the word "field" to mean global and local identifiers like x.

Strings:
Oat supports C-style immutable strings, written "in quotes". For the moment, the string operations are very limited and mostly provided by built-in functions provided by the runtime system.

String constants must be hoisted to the global scope.

Built-in Functions:
We now have enough infrastructure to support interesting built-in operations, including:

string_of_array : (int[]) -> string

int

array_of_string : (string) -> int[]
print_string : (string) -> unit
print_int : (int) -> unit
print_bool : (bool) -> unit

These built-in operations, along with some internal C-functions used by the Oat runtime are implemented in runtime.c.

Task I: Lexing and Parsing

The first step in implementing the frontend of the compiler is to get the lexer and parser working. We have provided you with a partial implementation in the lexer.mll and parser.mly files, but the provided grammar is both ambiguous and missing syntax for several key language constructs. Complete this implementation so that your frontend can parse all of the example atrprograms/*.oat files.

The full grammar is given in the Oat v.1 Language Specification.

You need to:

Add the appropriate token definitions to parser.mly and adjust the lexer.ml file appropriately.
Complete the parser according to the full grammar.
Disambiguate any parse conflicts (shift/reduce or reduce/reduce) according to the precedence and associativity rules.

Missing constructs include:

all of the binary operations except +, -, *, and ==
the boolean type and values
the array type and new array expression
for loops

Besides the parsing tests provided in gradedtests.ml, you can also test just the parsing behavior of your frontend by stopping compilation and pretty-printing the Oat code to the terminal:

    ./main.native -S --print-oat file.oat

Because the entire rest of the project hinges on getting a correct parser up and running, please try to do this early and seek help if you need it.

Task II: Frontend Compilation

The bulk of this project is implemeting the compiler in frontend.ml. The comments in that file will help you, but here is how we suggest you proceed:

Skim through the whole frontend.ml file to get a sense of its structure. It is arranged so that it mirrors the type system described in the Oat v.1 Language Specification.
There is one compilation function for each type of inference rule. The inputs to these functions are the context and the piece of syntax (and its type) to be compiled. The output of such a function depends on which part of the program you are compiling: expressions evaluate to values, so their compilation function returns the code computing an operand; statements do not evaluate to values, but they do introduce new local variables, so their compilation function returns a new context and an instruction stream. Once you get the hang of it, transliterating the inference rules from "math" into OCaml should be fairly straightforward. In particular, think about the invariants that are maintained by each function.
Take a close look at ctxt.ml to see how it represents the compilation contexts.
Begin by working on cmp_global_ctxt and cmp_init, though initially you can leave out arrays.
Next work on cmp_fdecl and cmp_fdecls. The latter just collects together a sequence of outputs resulting from calling cmp_fdecl on each function declaration of the program.
Next handle the Ret case of cmp_stmt. At this point, you should be able to generate plausible top-level functions of void type.
Next implement the Const, Bop, and Uop cases of cmp_exp. Again, saving arrays for later. At this point, you should be able to compile a program like atprograms/easyrun1.oat.
Add support for the Decl statement.
Now begin working on the path compilation, handling first Field (in cmp_path_lhs) and then Call (in cmp_path_exp). At this point, many more tests should start succeeding, because you can handle variables and function calls.
Add more statements. The If and While statements are very similar to what we've seen in the class demonstrations of Oat v.0. You can do For in several ways, but one easy way is to translate it at the abstract syntax level to a block of code that uses a while loop. The SCall statement isn't that different from the expression form; you might want to share code for compiling the function arguments.
Revisit the whole compiler adding support for arrays, following the same order as above. Because iterating through the array indices involves a loop, generating all of the LL code for the new expression directly is somewhat involved. One possible strategy is create LL code that directly implements the loop structure. This code will be very similar to that for a while statement.
Another possible strategy is to invoke the cmp_stmt function on a carefully crafted piece of abstract syntax that correctly implements the loop. The statement should refer to a variable for the loop index and the array itself, but that data must be initialized by LL code. This is a slightly tricky instance of metaprogramming.

Note: Although we have given you only the skeleton of the frontend.ml file, much of the code is similar (if not identical to) that demonstrated in lecture. See the sample code there for additional guidance.

Testing and Debugging Strategies

The test harness provided by main.ml gives several ways to assess your code. See the README file for a full description of the flags.

For this project, you will find it particularly helpful to run the LLVMlite code by compiling it via clang (with the --clang flag). That is because our backend implementation from HW3 (which we have provided for your reference) does not typecheck the .ll code that it compiles. Using clang will help you catch errors in the generated ll output.

Graded Test Cases

As part of this project, you must post an interesting test case for the compiler to the course Piazza site. This test case must take the form of a .oat file along with expected input arguments and outputs (as found in the hard tests of gradedtests.ml).

The test case you submit to Piazza will not count if it is too similar to previously-posted tests! Your test should be distinct from prior test cases. (Note that this policy encourages you to submit test cases early!)

You test should be an Oat program about the size of those in the hard test cases categories. Tests that stress parts of the language that aren't well exercised by the provided tests are particularly encouraged.

We will validate these tests against our own implementation of the compiler (and clang). A second component of your grade will be determined by how your compiler fairs against the test cases submitted by the other groups in the class.

Grading

Projects that do not compile will receive no credit!

Your team's grade for this project will be based on:

90 Points: the various automated tests that we provide.
5 Points for posting an interesting test case to Piazza. (Graded manually.)
5 Points divided among the test cases created by other groups. (Graded manually.)

Last modified: Sat Mar 28 19:18:33 EDT 2015

CIS 341 - Compilers Spring 2013 HW4: Oat v.1 Compiler