Due: 8 April 2013 at 11:59:59pm

Download: project5.zip

Oat language specification (Project 5 version)

To Submit: Go Here

This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project. Be sure to fill in the team.txt file.

Warning: This project may be time consuming (though hopefully not so much as Project 3). Start early and contact the course staff with any questions. If you find yourself stuck or writing tons and tons of code, please ask for help on Piazza.
Note: You should need to modify only tc.ml and phase1.ml when completing this project.

Overview

Implement the interesting parts of an OO-language with single inheritance and dynamic dispatch. As discussed in class, the OAT type system distinguishes possibly-null references from definitely not-null references. Your compiler will accept source files of the form
  class A <: Object {
  int x;

    new (int y)() 
      this.x = y;	 {
    }

    unit print() {
      print_string(string_cat("A: x=", string_of_int(this.x)));
      return;
    }

  };

  A a = new A(1 + 2);

  int program (int argc, string[] argv) {
    a.print();
    return 0;
  }
  
and will produce an executable (by default named a.out) that, when linked against runtime.c and then executed produces the resulting output:
  ./a.out
  A: x=30                 // the extra '0' is the int returned by 'program'
  

The main challenge of this project is implementing the dynamic dispatch, class-tables, and dynamically-checked type cast operations. As a warm up, you will first complete some parts of the typechecker.

The Oat Language

The language accepted by your compiler is a simple Java-like language that supports all of the types previously supported, along with class types and possibly-null types. OAT has a null value, but the typechecker prevents null-pointer dereferences by forcing programmers to explicitly check whether a possibly-null pointer is actually null using the if? statement. As in the previous project, the only kind of error that can possibly be generated by a well-typed OAT program is an array-index out-of-bounds error.

At this point, the Oat language is large enough that it is simpler to give the specification using inference rules than to use English prose. The Oat language specification contains a definition of the full language syntax and a collection of inference rules that define Oat type checking.

See the file ast.mli for the OCaml representation of the abstract syntax -- the type typ of types is defined there, along with representations of expressions, statements, blocks, function declarations, etc. You should familiarize yourself with the correspondence between the OCaml representation and the notation used in the specification. The astlib module defines some helper functions for working with the abstract syntax. (The ast.ml(i) files are automatically generated by the same tool that we use to create the specification inference rules, which is why they aren't well formatted.)

New Features

Classes and Object:
OAT now supports class declarations with single inheritance. Classes can contain fields and methods, and they are required to have one constructor. For examples of the syntax, see the .oat files in the /tests and /lib project directories.

OAT supports standard dynamic dispatch -- see the code in tests/dynamic.oat for an example.

New expression forms:
OAT supports several new forms of expressions to allow field projection and method invocation from a "path" of function/method calls, array indices, and object fields. For example, you can write o.m(3).foo[7].x. OAT also supports this and super with semantics analogous to that in Java.

Possibly-Null References:
Reference types in OAT are forced to be non-null. To allow for null pointers, OAT has types of the form C?, which is a reference to either an object of type C or to null. Null checks are written using the if? statement, for example:

  
  C? nc = ...;       // nc is a possibly null pointer to a C object
  //  nc.m();        // this method invocation fails to typecheck
                     // since nc might be null
  if? (C c = nc) {
     c.m();          // this method invocation succeeds
  }
  
The file lib/list.oat uses possibly-null references extensively.

Dynamically-checked Casts:
OAT supports dynamically-checked downcasts, which allows the programmer to refine the type of an object in a safe way. The structure is similar to if?, for example:

  Object o = new C();
  cast (C c = o) {     // check whether o has type C, if so name it c
    c.m();             // here c has type C
  }
  

Externs, Libraries, Etc.:
OAT programs are now prepocessed using cpp to allow one .oat file to include other .oat code using the notation #include foo.oat. OAT programs may now also include top-level extern declarations, which should give OAT types to C functions found in library code. (Currently OAT arrays are not compatible with C arrays, so some care must be taken with such inerfaces.) See the example of math.oat found in the /lib directory -- its extern functions are provided by runtime.c.

Running the Project

This project uses the build configuration of Project 4. The main executable also supports some addition command-line flags.

The test suite of this project expects to be able to load test files (named test1.oat, test2.oat, etc.) from a tests directory. By default, the path to the tests directory is set to tests/, under OcaIDE and on Windows, you will need to configure the tests directory by giving main the -test-path path argument. For example:

Unlike previous projects, we now include a lib directory, which is where some useful .oat code resides. You should use the -I flag to point main to lib. In OcaIDE the main is in _build, so you need to specify the path of runtime.c by the -runtime flag before the --test in the ``Run Configurations...'' dialog.

Some newly supported command line arguments are: -I dir/ to add dir/ to the search path of the compiler, and -lib lib to add a library to the linked files. As usual, do main.native --help from the command line for a synopsis.
If you want to link against C libraries (e.g. the ncurses library needed to compile console.oat), you need to add -l ncurses (or similar) to the compiler arguments.

Working with the compiler from the command line.

By default, the main program accepts the name of an Oat program, for example file.oat, compiles it, links it with the specified runtime.c program, and produces an executable called a.out. The default names for runtime.c and a.out can be modified using the command-line switches shown above.

As a side effect, the compiler also generates c_obj/file.ll and c_obj/file.s outputs, which contain the program generated after completing Phases 1 (to LL IR) and 2 (to x86 assembly), respectively. The c_bin directory is used to hold the executables and outputs generated during automatic testing using the --test flag. The -clean option deletes all the files from the c_bin and c_obj directories -- this is useful because you may generate many hundreds of temporary files during testing.

Testing Tip: use the --llvm-backend option to test Phase1 independently of Phase2. This option assumes that you have a working installation of the llc compiler in your PATH. llc is installed on Eniac, and is available for other platforms from http://llvm.org

OCaml Libraries

On all platforms you need to add both unix and str to the list of OCaml libraries used for this project. You can set this by specifying them on the command line like: ocamlbuild -lib unix -lib str, or in Eclipse under the project's Properties pane:
  1. Open the Project » Properties dialog
  2. Select the Project pane (choose it from the left column)
  3. In the Libraries (-libs) box add unix,str

Windows

This project depends on platform-specific configuration data to assemble and link the X86 code your compiler generates. On Windows we assume that you are using Cygwin and the gcc4-g++ package for the gcc-4 compiler.
  1. Run Cygwin setup.exe and select one more package.
  2. After search 'gcc', click on '+' to extend the categories tree: the root node is "All", and the child node is "Devel". Then on the package ``gcc4-g++:G++ subpackage'', click 'skip' to change it to be 4.3.4-3. This will also install all dependent packages.
If these are installed properly, the compiler should be fine, but if you run into problems on Windows, ask the TA for help.

To turn off the following warning, under system variables, click on New, set variable name to CYGWIN and set value as nodosfilewarning and you're done.

    cygwin warning:
MS-DOS style path detected: ...
Perfered POSIX equivalent is: ...
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths;
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames

Directories and Paths

Your compiler will produce X86 .s and .o files as well as native executables and .out files used during testing. By default these files will be generated in directories c_obj and c_bin respectively, and the default executable is called a.out. The main compiler accepts new command-line flags to change these directories from the default. Likewise, the -o option tells the compiler to use the given name for the result executable.

Note: Before running the compiler, you should create the c_bin and c_obj directories as sub-directories of your project. In OcaIDE you need to create the two directories under _build. If running the compiler issues an error like "can't open output file for writing", it probably means you are missing these directories.

Running on Linux

If you are using some flavor of linux (for example on Eniac) but not Mac OS X, you should specify the -linux flag when running the compiler. This turns off name mangling, which has different conventions on linux vs. Mac or Windows. In Eclipse you should add -linux before the --test flag.

Task I: Type checking

We have provided you with a nearly complete front-end (lexer/parser/type checker) for full OAT. For this part of the assignment, you will implement the remaining rules in tc.ml as a way to familiarize yourself with the parts of the language specification related to objects. Like in Project 4, the context data-structure is provided, but now includes the Σ and cidopt components of the context, represented by the sigs and this fields, respectively. The first missing part of the typechecker is the subtype, sub-reference, and sub-class judgements from page 6 of the language spec. These should be implemented as three functions: subclass, subref, and subtype. Their signatures are provided near the top of tc.ml Your typechecker should now work for many of the provided test cases.

To complete the typechecker, you'll need to fill in the missing cast case of tc_stmt.

We have also omitted the implementation of typechecking for the Cast statement. Complete the definition of this case.

Hints

Task II: Compilation - Phase 1

The provided phase1.ml code can already compile the subset of Oat not including class declarations and method calls. In this part of the project, you will complete the implementation, focusing on the new operations.

At the IR level, objects and vtables are both represented by structs. Objects are heap-allocated using the oat_alloc_object function which, like the dynamic array allocation from Project 3, requests an uninitialized piece of memory from the runtime and bitcasts the resulting pointer to the appropriate Oat type. An object constructor initializes the fields of this struct and reserves the first field for the vtable pointer. Vtables are simply global structs of function pointers, where the first field is a pointer to the vtable of the super class. The code that generates vtables is in the cmp_cctxt function in phase1.ml.

The target language is an enhanced version of the IR used in Project 4. It is a subset of the IR used by the LLVM compiler toolchain, which is documented extensively at the LLVM Language Reference web site.

The files ll.mli and ll.ml provide the datatypes for working with the Ll IR abstract syntax. The lllib.mli library provides some useful functions for creating unique identifiers and printing IR objects. Compared with the version of Ll used in Project 4, the main new features of the Ll IR are:

The provided code includes a translation context for Phase 1 in ctxt.ml. This is the translation context from Project 4 extended with the fields necessary to compile classes. The most important of these are this, which contains the current class identifier, and csigs, a table containing the following translation info for each class:

  type csig = { 
    ext: string option;              (* Super class id *)
    vtable: Ll.operand;              (* Vtable pointer operand *)
    ctor: Ll.fn;                     (* Constructor function signature *)
    fields: (string * Ll.ty) list;   (* Source field id -> type map *)
    methods: (string * Ll.fn) list;  (* Source method id -> sig map *)
  }
  
This includes a Ll pointer to the global vtable, the Ll function signature of the constructor, and method and field tables for the class. Check ctxt.ml for a description of the additional operations provided on the ctxt datatype.

The code that gathers this context information is provided for you in the cmp_cctxt function. It is executed before any function. method bodies, or constructor bodies are compiled, so you can depend on the global context being available and correct. You should study cmp_cctxt to make sure you understand how the field and method tables are created and what the resulting Ll object and vtable structs look like.

In this part of the assignment, you will need to complete Phase 1 by implementing functions to compile:

Path Expressions

In Oat, a path expression can be used as a left hand side to update the value of an object's field, as an expression to access the field of an object, or to invoke a method on an object. A path consists of either the keyword 'this', a left-hand-side, or a call that returns an object reference followed by the identifier of a field or method. To compile a path expression, you need to:
  1. Compile the base of the path
  2. Look up information about the base object's static type in the context context.
  3. If the path identifier corresponds to a field of the object:
    1. Look up the index and type of the field in the object struct in the context.
    2. Generate code to calculate the pointer to the appropriate field of the object struct.
  4. If the path identifier corresponds to a method of the base object:
    1. Find the index and signature of the method in the object's vtable.
    2. Look up the Ll type of the base object's vtable pointer in the context, then generate code to:.
    3. Calculate the pointer to the base object's vtable.
    4. Dereference the vtable pointer of the base object.
    5. Calculate a pointer to the index of the desired method
    6. Dereference the resulting pointer to get the function pointer to the desired method.

Complete the cmp_path in phase1.ml. See the comments above the function signature for more details and the last section of the project description for tips on how to get started.

Hints

Constructors

A constructor is just a method that takes a this pointer to a freshly allocated and uninitialized object in memory. The constructor initializes the object by first calling the super class constructor, then initializing fields of the object, sets the vtable pointer, and finally evaluates the constructor body with 'this' in scope. More specifically, your generated code should:
  1. Compile the argument list, adding each argument to the local variable context.
  2. Compile each expression in 'es', using the context extended by the constructor arguments.
  3. Generate code to bitcast each operand representing the value of 'es' to the types of the arguments required by the super class of cid.
  4. Generate code to call the superclass constructor using a pointer to the object being initialzid and 'es' as arguments.
  5. Extend the 'is', the initializer list to set the field _name to the (string) cid of the current class.
  6. Compile the initializer list by calling cmp_cinits. For each (field_id, init) pair:
    1. Compile the path 'This.field_id' and init.
    2. Genearte code to store the value of init in the address computed by the code for 'This.field_id'.
  7. Generate code to update the vtable pointer to this class' vtable pointer.
  8. Return the 'this' pointer passed as the first argument to the compiled function.

Complete the implementation of the functions cmp_ctor and cmp_cinits in phase1.ml.

Hints

Path and Super Methods

In order to call shadowed methods, Oat needs a mechanism to explicitly refer to methods of super class from inside a method. Complete the SuperMethod case of cmp_call.

The last step needed to implement dynamic dispatch is to actually call the function pointer returned by cmp_path. Complete the PathMethod case of cmp_call

At this point, many of the test cases for the full compiler should be passing, and your compiler should work with any valid Oat program that doesn't use the cast statement.

Hints

Cast

Implement the Cast statement. This is the dynamic counterpart to the subclass function your implemented in the typechecker. As such, the code you generate must dynamically search up the class heirarchy to test whether the given object's type is a subtype of the one mentioned statically in the Cast statement. This search is essentially a while loop that does pointer dereferences to load the type information from the object representations. Recall that the run-time representation of an object's class is its vtable (dispatch label) pointer.

The hardest part about implementing cast is satisfying the llvm type system. Since we only care about comparing the addresses of the vtables and only need to access the first pointer, one approach is to treat vtables as just their addresses: in the loop that traverses the chain of vtable pointers, bitcast the vtable pointer to (Ptr (Ptr I8)) prior to loading from the address, then compare the resulting (Ptr I8) against the address of the subclass' vtable pointer, and against the operand (Ptr I8, Null) for the loop guard.

Hints

Getting Started

Although the compiler is complete for the subset of Oat not including any class declarations, the global context always includes Object, so initially, compiling will always fail.

One way to get started writing cmp_path and cmp_ctor is to replace the stub code (failwith ... ) with code that doesn't produce any useful Ll, but lets compilation complete.

For cmp_ctor, returning build_fdecl c (lookup_csig c cid).ctor [] [] should do the trick. For cmp_path, you can simply return ((Ptr I8, Null), (Ptr I8, Null), []).

This should allow you to compile min.oat in the tests directory, and inspect the code produced for the top-level environment. In c_obj/min.ll, you should see:

    ;; the named types generated for the Object class and its vtable:
    ;;
    %Object = type { %_Object_vtable*, i8* } 
    %_Object_vtable = type { {  }*, i8* (%Object*)* }
    
    ...

    ;; the Object.get_name() method (with garbage produced by cmp_path):
    ;;
    define i8* @_Object_get_name (%Object* %_this1){
    __fresh0:
      %lhs_or_call2 = load i8* null
      %cast_op3 = bitcast i8 %lhs_or_call2 to i8* 
      ret i8* %cast_op3
    }

    ...

    ;; the empty Object constructor
    ;; 
    define %Object* @_Object_ctor (){}

  

Now you can examine the bitcode produced by cmp_path before completing cmp_ctor. You may find it easier to debug your code by editing the generated ll files and compilng them with llc, then going back and making the corresponding changes in your compiler.

Grading

Your project grade will be based on the test cases in gradedtests.ml. These are divided up into categories with point values as follows:
Last modified: Mon Mar 25 19:17:26 EDT 2013