Due: 6 April 2015 at 11:59:59pm

Download: hw5.zip

Oat v.1 Language Specification

To Submit: Go Here

This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project.

Warning: This project may will be time consuming. Start early and contact the course staff with any questions. If you find yourself stuck or writing tons and tons of code, please ask for help on Piazza or in office hours.

Getting Started

To get started, download the project source files and create an Eclipse project whose executable is main.native or main.byte. The files included in hw5.zip are briefly described below. Those marked with * are the only ones you should need to modify while completing this assignment.
Note:The compilation setup is the same as in HW4. You'll need Menhir for the parser. You'll also need to add the nums and unix libraries to compile this project. If you use OCamlBuild, you can compile the project from the command line by running ocamlbuild -Is x86,util -libs nums,unix -use-menhir main.native (or use the provided Makefile).
util/assert.ml(i) - the assertion framework
util/platform.ml  - platform-specific compilation support
util/range.ml(i)  - range datatype for error messages

ll/ll.ml          - the abstract syntax for LLVMlite
ll/lllib.ml       - name generation and pretty-printing for LLVMlite
ll/lllexer.mll    - lexer for LLVMlite syntax
ll/llparser.mly   - parser generator for LLVMlite syntax
ll/llinterp.ml    - reference interpreter for the LLVMlite semantics

/x86/x86.ml       - the X86lite language used as a target

README            - help about the main test harness
Makefile          - basic make support for invoking ocamlbuild

oatprograms/*.ll  - example .oat programs used in testing

main.ml           - main test harness
driver.ml         - utilities for invoking the compiler
backend.ml        - sample solution to HW3

ast.ml            - oat abstract syntax
astlib.ml         - pretty printing 
ctxt.ml           - compilation context
lexer.mll         - oat lexer
parser.mly        - oat parser
tc.ml             - typechecker
*tctxt.ml         - typechecking context and subtyping implementation
*frontend.ml      - oat frontend

runtime.c         - oat runtime library

providedtests.ml  - where your own test cases should go
gradedtests.ml    - graded test cases that we provide  
Note: You should need to submit only frontend.ml and tctxt.ml (as well as posting a test case to Piazza) when completing this project.

Overview

Implement the interesting parts of an OO-language with single inheritance and dynamic dispatch. As discussed in class, the Oat type system distinguishes possibly-null references from definitely not-null references. Your compiler will accept source files of the form below (see
class A <: Object {
  new (int y)() { 
    int x = y; 
  }
  
  void print() {
    print_string(string_cat(string_cat("CIS", string_of_int(x)), " ROCKS"));
    return;
  }
}

class B <: A {
  new ()(120) {}

  void print() {
    super.print();
    print_string(" too!");
    return;
  }
}

int program (int argc, string[] argv) {
  A? a = null;
  a = new A(341);
  A b = new B();
  if? (A x = a) { x.print(); }
  print_string("; ");  b.print();
  return 0;
}
  
and will produce an executable (by default named a.out) that, when linked against runtime.c and then executed produces the resulting output:
  ./a.out
  CIS341 ROCKS; CIS120 ROCKS too!
  

The main challenge of this project is implementing the dynamic dispatch, class-tables, and dynamically-checked type cast operations. As a warm up, you will first complete some parts of the typechecker.

The Oat Language

The language accepted by your compiler is a simple Java-like language that supports all of the types previously supported, along with class types and possibly-null types. Oat has a null value, but the typechecker prevents null-pointer dereferences by forcing programmers to explicitly check whether a possibly-null pointer is actually null using the if? statement.

At this point, the Oat language is large enough that it is simpler to give the specification using inference rules than to use English prose. The Oat language specification contains a definition of the the full language syntax and a collection of inference rules that define Oat type checking. (Though it omits some of the definitions for the sake of brevity; the code in tc.ml implements the specification completely.)

See the file ast.mli for the OCaml representation of the abstract syntax -- the type typ of types is defined there, along with representations of expressions, statements, blocks, function declarations, etc. You should familiarize yourself with the correspondence between the OCaml representation and the notation used in the specification. The Astlib module defines some helper functions for working with the abstract syntax.

Note: Unlike HW4, the abstract syntax representation for this project is parameterized by a type, which defines some "extra" metadata about the underlying syntax tree. The parser produces values of type unit exp, etc., and the typechecker converts those values to t exp, where t is an OCaml representation of the type.

New Features

Classes and Object:
Oat now supports class declarations with single inheritance. Classes can contain fields and methods, and they are required to have one constructor. For examples of the syntax, see the .oat files in the /oatprogram project directory.

The fields of a class are define inside the constructor block. In the example code from above, the class A defines the field x in its constructor (initializing it to y).

Oat supports standard dynamic dispatch -- see the code in tests/dynamic.oat for an example.

New expression forms:
Oat supports several new forms of expressions to allow field projection and method invocation from a "path" of function/method calls, array indices, and object fields. For example, you can write o.m(3).foo[7].x. Oat also supports this and super with semantics analogous to that in Java.

Possibly-Null References:
Reference types in Oat are forced to be non-null. To allow for null pointers, Oat has types of the form C?, which is a reference to either an object of type C or to null. Null checks are written using the if? statement, for example:

  
  C? nc = ...;       // nc is a possibly null pointer to a C object
  //  nc.m();        // this method invocation fails to typecheck
                     // since nc might be null
  if? (C c = nc) {
     c.m();          // this method invocation succeeds
  }
  
The example program illustrates how the null check can be used in practice.

Dynamically-checked Casts:
Oat also supports dynamically-checked downcasts, which allows the programmer to refine the type of an object in a safe way. Such downcasts use the same notation as the null checks, for example:

  Object o = new C();
  if? (C c = o) {     // check whether o has type C, if so name it c
    c.m();            // here c has type C
  }
  

Task I: Type checking

We have provided you with a nearly complete front-end (lexer/parser/type checker) for full Oat. For this part of the assignment, you will implement a small part of the type system in tctxt.ml as a way to familiarize yourself with the parts of the language specification related to objects.

The missing part of the typechecker is the join_typ operation on types, which is derivable from the subtyping rules in the Oat specification.

Hints

Task II: Frontend Compilation: Constructors and NewObj

The provided frontend.ml code can already compile the subset of Oat not including class declarations and method calls. In this part of the project, you will complete the implementation, focusing on the new operations.

At the IR level, objects and vtables are both represented by structs. Objects are heap-allocated using the oat_malloc function which, like the dynamic array allocation from HW4, requests an uninitialized piece of memory from the runtime.An object constructor initializes the fields of this struct and reserves the first field for the vtable pointer. Vtables are simply global structs of function pointers, where the first field is a pointer to the vtable of the super class. The code that generates vtables is in the cmp_hierarchy function in frontend.ml.

The amount of code that you have to write for this project isn't very large, but the challenge is understanding all of the types and invariants being manipulated by the compiler. In particular, understanding how and when to use the LLVM bitcast instruction is crucial.

Constructors

A constructor is just a global function that takes a this pointer to a freshly allocated and uninitialized object in memory. The constructor initializes the object by first calling the super class constructor, then initializing fields of the object, and setting the vtable pointer.

Complete the implementation of the functions cmp_ctr in frontend.ml.

Hints

New

The counterpart to the constructor is the new keyword, which invokes the constructor to create a new object. Next, complete the NewObj case of cmp_exp.

Path Expressions: Method Invocation

In Oat, a path expression can be used as a left hand side to update the value of an object's field, as an expression to access the field of an object, or to invoke a method on an object. A path consists of either the keyword 'super', a left-hand-side, or a call that returns an object reference followed by the identifier of a field or method. Here, we first complete the case of dynamic dispatch.

Complete the dynamic dispatch case of the cmp_call function in frontend.ml. See the comments there for more help.

Hints

Super Methods

In order to call shadowed methods, Oat needs a mechanism to explicitly refer to methods of super class from inside a method. Complete the super case of cmp_call. At this point, many of the test cases for the full compiler should be passing, and your compiler should work with any valid Oat program that doesn't use the if? statement.

Dynamic Cast

Implement the Cast statement. This is the dynamic counterpart to the subtype function in the typechecker. As such, the code you generate must dynamically search up the class heirarchy to test whether the given object's type is a subtype of the one mentioned statically in the Cast statement. This search is essentially a while loop that does pointer dereferences to load the type information from the object representations. Recall that the run-time representation of an object's class is its vtable (dispatch label) pointer.

The hardest part about implementing cast is satisfying the llvm type system. Since we only care about comparing the addresses of the vtables and only need to access the first pointer, one approach is to treat vtables as just their addresses: in the loop that traverses the chain of vtable pointers, bitcast the vtable pointer to (Ptr (Ptr I8)) prior to loading from the address, then compare the resulting (Ptr I8) against the address of the subclass' vtable pointer, and against the operand (Ptr I8, Null) for the loop guard.

Note that Oat also folds in null-checks as part of the dynamic type tests. (That is because null is a type.) The comments in the Cast case of cmp_stmt suggest the case analysis you need to do.

Hints

Testing and Debugging Strategies

The test harness provided by main.ml gives several ways to assess your code. See the README file for a full description of the flags.

As in HW4, for this project, you will find it particularly helpful to run the LLVMlite code by compiling it via clang (with the --clang flag). That is because our backend implementation from HW3 (which we have provided for your reference) does not typecheck the .ll code that it compiles. Using clang will help you catch errors in the generated ll output.

Graded Test Cases

As part of this project, you must post an interesting test case for the compiler to the course Piazza site. This test case must take the form of a .oat file along with expected input arguments and outputs (as found in the hard tests of gradedtests.ml).

The test case you submit to Piazza will not count if it is too similar to previously-posted tests! Your test should be distinct from prior test cases. (Note that this policy encourages you to submit test cases early!)

Your test should be an significant Oat program of at least 50 lines on code. The test you submit should exercise as many of the following oat features as your compiler can handle:
  • Class definitions.
  • Constructors.
  • Dynamic dispatch through inherited methods.
  • Dynamic dispatch through overriden methods.
  • Dynamically-checked downcasts.

We will validate these tests against our own implementation of the compiler (and clang). A second component of your grade will be determined by how your compiler fairs against the test cases submitted by the other groups in the class.

Grading

Projects that do not compile will receive no credit!

Your team's grade for this project will be based on:

Last modified: Sat Mar 28 20:56:08 EDT 2015