Due: Thursday, January 31st at 11:59pm
Source:
Download: project1.zip
x86Lite Specification
To Submit: Go Here
Overview
In this project you will implement an interpreter for the small subset of X86 that is the target language for the compilers we build in later projects. This project will continue to help get you up to speed with OCaml programming -- we'll need a few more constructs not used on Project0. You will also implement a non-trivial assembly-language program by hand to familiarize yourself with the workings of the X86 architecture.Getting Started
To get started, download the project source files and create an Eclipse project whose executable is main.native or main.byte. The files included in project1.zip are briefly described below. Those marked with * are the only ones you should need to modify while completing this assignment.assert.ml - the assertion framework gradedtests.ml - graded test cases that we provide main.ml - the main test harness x86.mli - the X86lite interface x86.ml - the X86lite instruction set implementation interpreter.mli - the interpreter's interface *interpreter.ml - where your interpreter code (Part I) should go *providedtests.ml - where your submitted test cases (Part II) should go *team.txt - your group information
Part I: The Interpreter
X86lite assembly code is organized into labeled blocks of instructions, which might be written in concrete syntax as shown below.fact: pushl %ebp movl %esp, %ebp movl 8(%esp), %eax cmpl $0, %eax jG fact_recurse movl $1, %eax popl %ebp ret fact_recurse: subl $1, %eax pushl %eax call fact addl $4, %esp movl 8(%esp), %ebx imull %ebx, %eax popl %ebp ret main: pushl $4 call fact addl $4, %esp retThis code has three blocks, labeled fact, fact_recurse, and main. The code at labels fact and fact_recurse implements a recursive version of the familiar factorial function. The code at main calls factorial with the immediate value 4.
In this project you will implement an interpreter for X86lite programs, but rather than using the concrete syntax shown above, you will interpet programs written in an abstract syntax. For example, the factorial program above is represented using X86lite abstract syntax (and meta-level functions) as shown below. The main structure is a list of insn_blocks, each of which is just a record that has a lbl and a list of insn values.
[ (mk_insn_block (mk_lbl_named "fact") [ Push (ebp); Mov (ebp, esp); Mov (eax, (stack_offset 8l)); Cmp (eax, Imm 0l); J (Sgt, (mk_lbl_named "fact_recurse")); Mov (eax, Imm 1l); Pop (ebp); Ret ]); (mk_insn_block (mk_lbl_named "fact_recurse") [ Sub (eax, Imm 1l); Push (eax); Call (Lbl (mk_lbl_named "fact")); Add (esp, (Imm 4l)); Mov (ebx, (stack_offset 8l)); Imul (Eax, ebx); Pop (ebp); Ret ]); (mk_insn_block (mk_lbl_named "main") [ Push (Imm 4l); Call (Lbl (mk_lbl_named "fact")); Add (esp, (Imm 4l)); Ret ]); ]As you can see, the correspondence between the abstract syntax and the concrete syntax is quite close. The file x86.ml and its corresponding interface x86.mli together provide the basic definitions for the creating and manipulating X86lite abstract syntax -- the main types you should be aware of are lbl, reg, opnd, cnd, insn, and insn_block. Each of these corresponds pretty directly to a concept from the X86lite assembly language.
The x86Lite Specification gives the full details about the meaning of each instruction.
type x86_state = { s_mem : int32 array; (* 1024 32-bit words -- the heap *) s_reg : int32 array; (* 8 32-bit words -- the register file *) mutable s_OF : bool; (* overflow flag *) mutable s_SF : bool; (* sign bit flag *) mutable s_ZF : bool; (* zero flag *) }The memory and register files are simulated by OCaml-level (mutable) arrays of int32 values. The three condition flags are mutable boolean fields; all of the state is bundled together in a record (see IOC Chapter 8.1 for more about OCaml's record types). The main differences between the interpreter and real X86 hardware are:
-
Heap size: To facilitate debugging, and to use less memory for
the simulation, our interpreter will use only 4096 bytes (1024
32-bit words) for its heap size. The part of the heap simulated is
the highest addressable memory locations -- in
interpreter.ml these locations are represented as
mem_top and mem_bot. You will need to complete
the map_addr function found in interpeter.ml that
takes an object-level X86lite address (i.e. a 32-bit address) and
returns the meta-level int index into the s_mem
array. See the documentation of this function in
interpreter.mli.
-
Code labels: Unlike the real X86 machine, our interpreter does
not store program code in the main memory (i.e. the s_mem
array). Instead, it will treat a list of insn_block values
as the program to be executed, the code labels associated with those
instruction blocks are the only legal jump (and call) targets in the
interpreter. A real X86 machine can jump to an address calculated
using arithmetic operations; doing so will cause the interpreter to
throw an X86_segmentation_fault exception.
Treating labels abstractly presents a challenge for the interpreter -- we don't have a concrete machine address to use for EIP the "instruction pointer", which real machines use to keep track of the address of the next instruction to execute. Instead, your interpreter will have to simulate the instruction pointer somehow. Doing so is fairly straightforward in most cases, however the Call instruction is supposed to push a return address (derived from EIP) onto the program stack and Ret is supposed to pop a previously pushed address from the stack and jump to it. To simulate this behavior, your interpeter will need some auxiliarly mechanism to track the stack of return addresses that would have been pushed by Call. Both Call and Ret instructions should still also manipulate the X86 stack pointer Esp. Push x00000000 to the stack for all Call instructions in place of the actual EIP.
Because of this treatment of code labels and the EIP, your interpreter does not have to be 100% faithful to X86 semantics. In particular, programs that do not have well-balanced uses of Call and Ret are ill-defined, as are programs that make exotic use of the stack (e.g. ones that try to manipulate a return-address that was pushed by Call by manually popping it from the stack or reading it directly from memory). For similar reasons, in this interpreter, it is invalid to try to get a word stored at a label (or to attempt to perform indirect access using a label).
-
No Fallthrough: Another consequence of this abstract treatment of code labels is
that your interpreter cannot make any assumptions about how code might
be laid out in memory. Consider the following assembly program:
label1: movl $1, %eax label2: movl $2, %ebx
If executed on a real X86 machine starting from label1, the machine will first move 1 into EAX and then fall through to the next instruction, which moves 2 into EBX. In your interpreter, this code might be represented as:[(mk_insn_block (mk_lbl_named "label1") [ Mov(eax, Imm 1l); ]); (mk_insn_block (mk_lbl_named "label2") [ Mov(ebx, Imm 2l); ]); ]
Because the labels are abstract, the order of insn_blocks could have been switched -- the interpreter doesn't know about the order in which code blocks might be laid out in memory, so it can't simulate fall through.[(mk_insn_block (mk_lbl_named "label2") [ Mov(ebx, Imm 2l); ]); (mk_insn_block (mk_lbl_named "label1") [ Mov(eax, Imm 1l); ]); ]
Instead, your interpreter should raise a X86_segmentation_fault exception if the program would ever need to fall through the end of an insn_block -- that is, both representations of the program above, if started at label1 would result in such an exception. We call an insn_block valid if the last instruction of the block is either a Ret or some form of jump instruction. All of the functionality tests we provide will use programs whose blocks are all valid. For example, the code for factorial given above is valid because each of its blocks ends in a Ret instruction. In general it is always possible to write programs in this form (you can always jump to the label that would otherwise be reached by falling through the end of a block).
Tasks
Complete the implementation of the interpreter.mli interface in the interpreter.ml file, some parts of which are given to you. We recommend that you do things in this order:- First, as a warm-up, implement the map_addr function.
- Next, as an exercise in condition codes, implement the condition_matches functions.
- Then implement the interpretation of operands (including indirect addresses), since this functionality will be needed for simulating instructions.
- Then work on the instructions themselves, perhaps starting with developing a strategy for simulating Call and Ret. For this, you might want to impement an auxiliary function that takes extra state, and you might want to define extra functions that are mutually recursive with interpret. There is more than one way to implement the desired Call and Ret behavior; somehow you'll have to simulate their expected semantics.
- We have provided several helper functions for 32- and 64-bit arithmetic and for getting a bit out of a 32-bit word. You might find them useful.
- You'll probably want a function that sets the three condition flags after a result has been computed.
- Groups of instructions share common behavior -- for example, all of the arithmetic instructions are quite similar. You probably want to factor out the commonality as much as you can (to keep your code clean).
- You will probably want to develop small test cases to try out the functionality of your interpreter. See gradedtests.ml for some examples of how to set up tests that can look at the final state of the machine.
Tests
We will grade this part of the assignment based on a suite of tests. Some of them are available for you to look at in gradedtests.ml, the rest of them we reserve for our own cases. We will also stress-test your interpreter on a number of "big" programs (see Part II) that we have developed and that your classmates will develop as part of this project.To help other teams debug their interpreters, you are encouraged to share "microbenchmark" test cases on the class discussion list. These should be short (2-3 instruction) programs that test various functional aspects of the interpreter. We will not use these tests in our grading. You can add such test cases to the test suite defined in providedtests.ml
Part II: X86lite Assembly Programming
For this part of the assignment, you will create (by hand) a non-trivial X86lite assembly program to test your interpreter's behavior and get some experience programming in X86lite. The factorial program supplied with the test code is an example of what we mean by "non-trivial" -- test cases of roughly this level of difficulty can earn full credit. In particular, your program should include:- Non-trivial control flow: either nested loops, a recursive function, or something similarly complex
- At least one conditional branch.
- Some amount of arithmetic or logic.
- A test case (or test cases) that excercise your program and test the correcness of the interpreter's behavior. If your program computes a simple answer, it can return it in Eax and you can use the run_test harness as in the factorial example; for more complex programs you might want to use the st_test function, which lets you examine the resulting state of the interpreter.
Some good candidates for such programs include: simple sorting or searching algorithms (i.e. treat some chunk of memory as an array of values to be sorted), simple arithmetic algorithms such as gcd, recursive functions over integers or very simple data structures such as linked lists. If you are unsure whether the test case you'd like to implement is sufficient, contact one of the course staff.
Your test should be a function of type unit -> unit that works in the assertion framework (as defined in assert.ml). This test should be supplied in providedtests.ml as the "Student-Provided Big Test for Part II". We will hand grade this submission (and test it against our own interpreter). We will also use all "correct" submitted tests to validate all of the other projects in the course -- the trickier your test is, the harder it will be for other teams to pass it!
Grading
Projects that do not compile will receive no credit!Your team's grade for this project will be based on:
- 70 Points for implementing the X86lite interpreter, graded via our test cases (including some withheld from the source we distributed and some contributed by other project teams).
- 20 Points for submitting a non-trivial X86 test program as described in Part II above.
- 10 Points for "programming style" -- see these guidelines