Due: 8 April 2013 at 11:59:59pm
Download: project5.zip
Oat language specification (Project 5 version)
To Submit: Go Here
This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project. Be sure to fill in the team.txt file.
Overview
Implement the interesting parts of an OO-language with single inheritance and dynamic dispatch. As discussed in class, the OAT type system distinguishes possibly-null references from definitely not-null references. Your compiler will accept source files of the formclass A <: Object { int x; new (int y)() this.x = y; { } unit print() { print_string(string_cat("A: x=", string_of_int(this.x))); return; } }; A a = new A(1 + 2); int program (int argc, string[] argv) { a.print(); return 0; }and will produce an executable (by default named a.out) that, when linked against runtime.c and then executed produces the resulting output:
./a.out A: x=30 // the extra '0' is the int returned by 'program'
The main challenge of this project is implementing the dynamic dispatch, class-tables, and dynamically-checked type cast operations. As a warm up, you will first complete some parts of the typechecker.
The Oat Language
The language accepted by your compiler is a simple Java-like language that supports all of the types previously supported, along with class types and possibly-null types. OAT has a null value, but the typechecker prevents null-pointer dereferences by forcing programmers to explicitly check whether a possibly-null pointer is actually null using the if? statement. As in the previous project, the only kind of error that can possibly be generated by a well-typed OAT program is an array-index out-of-bounds error.At this point, the Oat language is large enough that it is simpler to give the specification using inference rules than to use English prose. The Oat language specification contains a definition of the full language syntax and a collection of inference rules that define Oat type checking.
See the file ast.mli for the OCaml representation of the abstract syntax -- the type typ of types is defined there, along with representations of expressions, statements, blocks, function declarations, etc. You should familiarize yourself with the correspondence between the OCaml representation and the notation used in the specification. The astlib module defines some helper functions for working with the abstract syntax. (The ast.ml(i) files are automatically generated by the same tool that we use to create the specification inference rules, which is why they aren't well formatted.)
New Features
Classes and Object:OAT now supports class declarations with single inheritance. Classes can contain fields and methods, and they are required to have one constructor. For examples of the syntax, see the .oat files in the /tests and /lib project directories.
OAT supports standard dynamic dispatch -- see the code in tests/dynamic.oat for an example.
New expression forms:
OAT supports several new forms of expressions to allow field projection and
method invocation from a "path" of function/method calls, array indices, and
object fields. For example, you can write o.m(3).foo[7].x. OAT also
supports this and super with semantics analogous to that in
Java.
Possibly-Null References:
Reference types in OAT are forced to be non-null. To allow for null pointers,
OAT has types of the form C?, which is a reference to either an
object of type C or to null. Null checks are written using
the if? statement, for example:
C? nc = ...; // nc is a possibly null pointer to a C object // nc.m(); // this method invocation fails to typecheck // since nc might be null if? (C c = nc) { c.m(); // this method invocation succeeds }The file lib/list.oat uses possibly-null references extensively.
Dynamically-checked Casts:
OAT supports dynamically-checked downcasts, which allows the programmer to
refine the type of an object in a safe way. The structure is similar
to if?, for example:
Object o = new C(); cast (C c = o) { // check whether o has type C, if so name it c c.m(); // here c has type C }
Externs, Libraries, Etc.:
OAT programs are now prepocessed using cpp to allow one .oat file to
include other .oat code using the notation #include foo.oat.
OAT programs may now also include top-level extern declarations,
which should give OAT types to C functions found in library
code. (Currently OAT arrays are not compatible with C arrays, so
some care must be taken with such inerfaces.) See the
example of math.oat found in the /lib directory -- its
extern functions are provided by runtime.c.
Running the Project
This project uses the build configuration of Project 4. The main executable also supports some addition command-line flags.The test suite of this project expects to be able to load test files (named test1.oat, test2.oat, etc.) from a tests directory. By default, the path to the tests directory is set to tests/, under OcaIDE and on Windows, you will need to configure the tests directory by giving main the -test-path path argument. For example:
- In Linux main.native -runtime ../runtime.c -test-path ../tests/ --test
- In Windows main.native -runtime ..\runtime.c -test-path ..\tests\ --test
Unlike previous projects, we now include a lib directory, which is where some useful .oat code resides. You should use the -I flag to point main to lib. In OcaIDE the main is in _build, so you need to specify the path of runtime.c by the -runtime flag before the --test in the ``Run Configurations...'' dialog.
- In Linux main.native -linux -runtime ../runtime.c --test
- In Windows main.native -runtime ..\runtime.c --test
Working with the compiler from the command line.
By default, the main program accepts the name of an Oat program, for example file.oat, compiles it, links it with the specified runtime.c program, and produces an executable called a.out. The default names for runtime.c and a.out can be modified using the command-line switches shown above.As a side effect, the compiler also generates c_obj/file.ll and c_obj/file.s outputs, which contain the program generated after completing Phases 1 (to LL IR) and 2 (to x86 assembly), respectively. The c_bin directory is used to hold the executables and outputs generated during automatic testing using the --test flag. The -clean option deletes all the files from the c_bin and c_obj directories -- this is useful because you may generate many hundreds of temporary files during testing.
OCaml Libraries
On all platforms you need to add both unix and str to the list of OCaml libraries used for this project. You can set this by specifying them on the command line like: ocamlbuild -lib unix -lib str, or in Eclipse under the project's Properties pane:- Open the Project » Properties dialog
- Select the Project pane (choose it from the left column)
- In the Libraries (-libs) box add unix,str
Windows
This project depends on platform-specific configuration data to assemble and link the X86 code your compiler generates. On Windows we assume that you are using Cygwin and the gcc4-g++ package for the gcc-4 compiler.- Run Cygwin setup.exe and select one more package.
- After search 'gcc', click on '+' to extend the categories tree: the root node is "All", and the child node is "Devel". Then on the package ``gcc4-g++:G++ subpackage'', click 'skip' to change it to be 4.3.4-3. This will also install all dependent packages.
To turn off the following warning, under system variables, click on New, set variable name to CYGWIN and set value as nodosfilewarning and you're done.
cygwin warning:
MS-DOS style path detected: ...
Perfered POSIX equivalent is: ...
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths;
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Directories and Paths
Your compiler will produce X86 .s and .o files as well as native executables and .out files used during testing. By default these files will be generated in directories c_obj and c_bin respectively, and the default executable is called a.out. The main compiler accepts new command-line flags to change these directories from the default. Likewise, the -o option tells the compiler to use the given name for the result executable.
Running on Linux
If you are using some flavor of linux (for example on Eniac) but not Mac OS X, you should specify the -linux flag when running the compiler. This turns off name mangling, which has different conventions on linux vs. Mac or Windows. In Eclipse you should add -linux before the --test flag.
Task I: Type checking
We have provided you with a nearly complete front-end (lexer/parser/type checker) for full OAT. For this part of the assignment, you will implement the remaining rules in tc.ml as a way to familiarize yourself with the parts of the language specification related to objects. Like in Project 4, the context data-structure is provided, but now includes the Σ and cidopt components of the context, represented by the sigs and this fields, respectively. The first missing part of the typechecker is the subtype, sub-reference, and sub-class judgements from page 6 of the language spec. These should be implemented as three functions: subclass, subref, and subtype. Their signatures are provided near the top of tc.ml Your typechecker should now work for many of the provided test cases.
To complete the typechecker, you'll need to fill in the missing cast case of tc_stmt.
We have also omitted the implementation of typechecking for the Cast statement. Complete the definition of this case.
Hints
- Phase 1 depends on the invariants set up by the typechecker, so although this part of the assignment isn't much code, you'll need to spend some time understanding the spec.
- Like in Project 4, your implementation should signal a type error in an ill-formed Oat program by raising the TypeError exception. Use Failure exceptions (via failwith) to indicate broken invariants or impossible cases in the compiler implementation.
- tc.ml includes a few helper functions for reporting type errors. You may find these useful.
- Before you begin, make sure you understand how the class heirarchy is represented in ctxt by looking over tc_fctxt.
- The Cast statement is similar to the IfNull statement -- you should look at that implementation and the typing rules of the specification to determine how to complete this case.
Task II: Compilation - Phase 1
The provided phase1.ml code can already compile the subset of Oat not including class declarations and method calls. In this part of the project, you will complete the implementation, focusing on the new operations.
At the IR level, objects and vtables are both represented by structs. Objects are heap-allocated using the oat_alloc_object function which, like the dynamic array allocation from Project 3, requests an uninitialized piece of memory from the runtime and bitcasts the resulting pointer to the appropriate Oat type. An object constructor initializes the fields of this struct and reserves the first field for the vtable pointer. Vtables are simply global structs of function pointers, where the first field is a pointer to the vtable of the super class. The code that generates vtables is in the cmp_cctxt function in phase1.ml.
The target language is an enhanced version of the IR used in Project 4. It is a subset of the IR used by the LLVM compiler toolchain, which is documented extensively at the LLVM Language Reference web site.
The files ll.mli and ll.ml provide the datatypes for working with the Ll IR abstract syntax. The lllib.mli library provides some useful functions for creating unique identifiers and printing IR objects. Compared with the version of Ll used in Project 4, the main new features of the Ll IR are:
- Ll now includes the Null operand, which can have any pointer type. This is analogous to the null pointer in C, and compiles to the immediate $0 in phase 2.
- In order to support indirect calls, Ll operands can now be pointers to functions. The Fptr type takes a list of arguments and an optional return value, just like function signatures in Project 3.
- Function names are now global identifiers. Ie the name fields of Ll function signatures and definitions is a gid, which can be used as an Ll operand: (Fptr (ty_args, rty), Gid gid)
- The Call instruction takes an Ll operand (which must have Fptr type) instead of a function signature
- Global data can now include structs. These can be indexed using Gep exactly like the heap-allocated structs used to represent arrays in Project 3. The fields of a global struct must be initialized with either global identifiers or Null.
- In order to facilitate debugging and simplify compiling classes, Ll now supports type aliases. An Ll.prog contains an additional field, namedts containing a mapping of string identifiers to Ll types. In any Ll code you generate, the type (Namedt id) will be considered equivalent to the Ll.ty in the in the namedts entry corresponding to id.
The provided code includes a translation context for Phase 1 in ctxt.ml. This is the translation context from Project 4 extended with the fields necessary to compile classes. The most important of these are this, which contains the current class identifier, and csigs, a table containing the following translation info for each class:
type csig = { ext: string option; (* Super class id *) vtable: Ll.operand; (* Vtable pointer operand *) ctor: Ll.fn; (* Constructor function signature *) fields: (string * Ll.ty) list; (* Source field id -> type map *) methods: (string * Ll.fn) list; (* Source method id -> sig map *) }This includes a Ll pointer to the global vtable, the Ll function signature of the constructor, and method and field tables for the class. Check ctxt.ml for a description of the additional operations provided on the ctxt datatype.
The code that gathers this context information is provided for you in the cmp_cctxt function. It is executed before any function. method bodies, or constructor bodies are compiled, so you can depend on the global context being available and correct. You should study cmp_cctxt to make sure you understand how the field and method tables are created and what the resulting Ll object and vtable structs look like.
In this part of the assignment, you will need to complete Phase 1 by implementing functions to compile:- path expressions
- constructor function bodies
- method and super method calls
- the 'cast' statement
Path Expressions
In Oat, a path expression can be used as a left hand side to update the value of an object's field, as an expression to access the field of an object, or to invoke a method on an object. A path consists of either the keyword 'this', a left-hand-side, or a call that returns an object reference followed by the identifier of a field or method. To compile a path expression, you need to:- Compile the base of the path
- Look up information about the base object's static type in the context context.
- If the path identifier corresponds to a field of the object:
- Look up the index and type of the field in the object struct in the context.
- Generate code to calculate the pointer to the appropriate field of the object struct.
- If the path identifier corresponds to a method of the base object:
- Find the index and signature of the method in the object's vtable.
- Look up the Ll type of the base object's vtable pointer in the context, then generate code to:.
- Calculate the pointer to the base object's vtable.
- Dereference the vtable pointer of the base object.
- Calculate a pointer to the index of the desired method
- Dereference the resulting pointer to get the function pointer to the desired method.
Complete the cmp_path in phase1.ml. See the comments above the function signature for more details and the last section of the project description for tips on how to get started.
Hints
- You may find it useful for other parts of the project to write helper methods for finding the index and type information of methods and fields from the context
- Remember that the first field of an object struct contains the object's vtable pointer, and not the first field of the object.
- Remember that the first field of a vtable struct contains a pointer to the super class' vtable, and not the first method.
- You may find the this_op helper function useful. Check the comments in phase1.ml
- Make sure your implementation fails with a detailed error message if it detects that any invariants guaranteed by type checking are violated.
- All of the Ll. typing information you need for intermediate Ll operands in your generated code is either found from the result of compiling the base pointer, or in the context.
- Our implementation was only ~30 loc. You might might not be on the right track if your code is significantly longer.
- The IR code you need to generate for this function is quite short. The case for fields in particular can be done with a single Gep instruction.
- The return value of the function is described in the comments.
Constructors
A constructor is just a method that takes a this pointer to a freshly allocated and uninitialized object in memory. The constructor initializes the object by first calling the super class constructor, then initializing fields of the object, sets the vtable pointer, and finally evaluates the constructor body with 'this' in scope. More specifically, your generated code should:- Compile the argument list, adding each argument to the local variable context.
- Compile each expression in 'es', using the context extended by the constructor arguments.
- Generate code to bitcast each operand representing the value of 'es' to the types of the arguments required by the super class of cid.
- Generate code to call the superclass constructor using a pointer to the object being initialzid and 'es' as arguments.
- Extend the 'is', the initializer list to set the field _name to the (string) cid of the current class.
- Compile the initializer list by calling cmp_cinits. For each
(field_id, init) pair:
- Compile the path 'This.field_id' and init.
- Genearte code to store the value of init in the address computed by the code for 'This.field_id'.
- Generate code to update the vtable pointer to this class' vtable pointer.
- Return the 'this' pointer passed as the first argument to the compiled function.
Complete the implementation of the functions cmp_ctor and cmp_cinits in phase1.ml.
Hints
- Use the cmp_args function to compile argument lists and extend the local context.
- Use the cast_ops helper function to easily cast a list of Ll operands to a particular list of types (for example, the argument types of the signature of the super class' constructor).
- Use the build_fdecl function to add a Ll fdecl to the translation context given a function signature and a code stream.
- The signature for the constructor function has already been computed by cmp_cctxt. Make sure the function you generate matches this signature exactly.
- Our implementation was only ~40 loc. You might might not be on the right track if your code is significantly longer.
Path and Super Methods
In order to call shadowed methods, Oat needs a mechanism to explicitly refer to methods of super class from inside a method. Complete the SuperMethod case of cmp_call.The last step needed to implement dynamic dispatch is to actually call the function pointer returned by cmp_path. Complete the PathMethod case of cmp_call
At this point, many of the test cases for the full compiler should be passing, and your compiler should work with any valid Oat program that doesn't use the cast statement.Hints
- First read through the Func case. You can reuse the code used to cast arguments and handle functions that return unit.
- The solution is quite short, most of the hard work is done by cmp_path
Cast
Implement the Cast statement. This is the dynamic counterpart to the subclass function your implemented in the typechecker. As such, the code you generate must dynamically search up the class heirarchy to test whether the given object's type is a subtype of the one mentioned statically in the Cast statement. This search is essentially a while loop that does pointer dereferences to load the type information from the object representations. Recall that the run-time representation of an object's class is its vtable (dispatch label) pointer.The hardest part about implementing cast is satisfying the llvm type system. Since we only care about comparing the addresses of the vtables and only need to access the first pointer, one approach is to treat vtables as just their addresses: in the loop that traverses the chain of vtable pointers, bitcast the vtable pointer to (Ptr (Ptr I8)) prior to loading from the address, then compare the resulting (Ptr I8) against the address of the subclass' vtable pointer, and against the operand (Ptr I8, Null) for the loop guard.
Hints
- You may find the cast_op helper function useful to avoid manually bitcasting all of the inputs to your hand-generated Ll code.
- Our solution was ~50 loc.
Getting Started
Although the compiler is complete for the subset of Oat not including any class declarations, the global context always includes Object, so initially, compiling will always fail.One way to get started writing cmp_path and cmp_ctor is to replace the stub code (failwith ... ) with code that doesn't produce any useful Ll, but lets compilation complete.
For cmp_ctor, returning build_fdecl c (lookup_csig c cid).ctor [] [] should do the trick. For cmp_path, you can simply return ((Ptr I8, Null), (Ptr I8, Null), []).
This should allow you to compile min.oat in the tests directory, and inspect the code produced for the top-level environment. In c_obj/min.ll, you should see:
;; the named types generated for the Object class and its vtable: ;; %Object = type { %_Object_vtable*, i8* } %_Object_vtable = type { { }*, i8* (%Object*)* } ... ;; the Object.get_name() method (with garbage produced by cmp_path): ;; define i8* @_Object_get_name (%Object* %_this1){ __fresh0: %lhs_or_call2 = load i8* null %cast_op3 = bitcast i8 %lhs_or_call2 to i8* ret i8* %cast_op3 } ... ;; the empty Object constructor ;; define %Object* @_Object_ctor (){}
Now you can examine the bitcode produced by cmp_path before completing cmp_ctor. You may find it easier to debug your code by editing the generated ll files and compilng them with llc, then going back and making the corresponding changes in your compiler.
Grading
Your project grade will be based on the test cases in gradedtests.ml. These are divided up into categories with point values as follows:- Type Checking Error Tests: 20 pts.
- Easy Compiling Tests: 20 pts.
- Medium Compiling Tests: 20 pts.
- Hard Compiling Tests: 10 pts.
- Runtime Error Tests: 10 pts.
- Stress Tests: 10 pts. (not all are shown in gradedtest.ml)
- Coding Style: 10 pts.