HW3: LLVMlite¶
Overview¶
In this project you will implement a non-optimizing compiler for subset of the LLVM IR language. The source language consists of a 64-bit, simplified subset of the LLVM IR that we call LLVMlite. The target language is x86lite.
Getting Started¶
To get started, accept the assignment on Github Classroom and clone your team’s repository.
The files included in the repository are briefly described
below. Those marked with *
are the only ones you should need to
modify while completing this assignment.
README |
help about the main test harness |
Makefile |
builds |
lib/util/assert.ml(i) |
the assertion framework |
lib/util/platform.ml |
OS platform-specific compilation support |
lib/x86/x86.ml |
the X86lite instruction representation |
lib/ll/ll.ml |
the abstract syntax for LLVMlite |
lib/ll/lllexer.mll |
lexer for LLVMlite syntax |
lib/ll/llparser.mly |
parser generator for LLVMlite syntax |
lib/ll/llinterp.ml |
reference interpreter for the LLVMlite semantics |
llprograms/*.ll |
example .ll programs used in testing |
test/gradedtests.ml |
graded test cases that we provide |
bin/main.ml |
command-line interface |
bin/driver.ml |
invoking the compiler pipeline |
bin/cinterop.c |
c code for testing interoperability |
bin/backend.ml |
|
test/studenttests.ml |
|
Note
You’ll need to have menhir and clang installed on your system for this assignment. If you have not already done so, follow the provided instructions to install them.
(The provided development environment should have these configured for you.)
Preliminary Steps¶
Skim through the rest of this web page to get a sense of what this project entails.
Familiarize yourself with the information in the
README
, which explains the ways that you can run your compiler for testing purposes.Then take a look at
driver.ml
, particularly the code related toprocess_ll_file
to see how the backend code fits into the overall compilation pipeline.Then start working through
backend.ml
, following the instructions below.
Warning
This project is potentially very difficult to debug and may take you a while to understand. GET STARTED EARLY!
LLVM Lite Specification¶
The source language for this ‘backend’ part of the full compiler is a subset of the LLVM IR called LLVM Lite. You may find the Language Reference to be a useful resource for this project, though we are only concerned with a small portion of the full LLVM feature set.
The LLVMlite Documentation describes the behavior of LLVM programs in terms of an abstract semantics that is not target specific. This semantics is intended to be faithful to the LLVM Language Reference.
Implementing the Compiler¶
The code we provide in backend.ml
is a minimal skeleton of the
basic structure of the compiler. To a first approximation, for each
part foo
of the abstract syntax (such as prog
or fdecl
),
there is a corresponding compile_foo
function
(i.e. compile_prog
or compile_fdecl
). Most of these
definitions have been left unimplemented (and a few have been left
out). Your job is to complete this translation. Our reference
solution is well under 350 lines of documented code, so if your
implementation is significantly longer than this, you may wish to
rethink your approach or seek help.
The file backend.ml
contains additional hints and explanations
about the compilation strategy that we suggest you use.
We suggest that you stage the development of your compiler like this:
First get a minimal implementation of
compile_fdecl
working so that you can compile functions with empty bodies but varying numbers of input parameters. To do this, you’ll need to understand the System V AMD64 ABI calling conventions (see the lecture slides and Wikipedia for an explanation), then understand the notion of alayout
and complete thearg_loc
function. At this point, the X86 code you generate won’t be able to run because the code for the compiled function does not exit properly. (But you can still look at the generated assembly code to see whether it looks reasonable.)Next implement enough of the
compile_terminator
function to handle (void) functions that return no results. Similarly, implement enough ofcompile_block
to handle blocks with no instructions. At this point, your compiler should be able to generate working code for an LLVM function like that found inreturnvoid.ll
:define void @main(i64 %argc, i8** %argv) { ret void }
(Note, this isn’t part of the test suite, since the value “returned” to the shell when this program runs isn’t well defined.)
Understand the notion of the
ctxt
type and develop a strategy for storinguid
locals. See the comments in thebackend.ml
file. Implement thecompile_operand
function.Implement the
Binop
case forcompile_insn
(which, if you follow the suggested method of compiling locals, will usecompile_operand
).At this point, you probably want to revisit
compile_fdecl
andcompile_block
to adjust them to deal properly with contexts and non-empty control-flow graphs / blocks.Next go back and implement the rest of the cases for
compile_terminator
. At this point, your compiler should be able to handle functions that returni64
values and that contain simple arithmetic and direct jumps.Implement the translation of
Icmp
incompile_insn
, followed byAlloca
,Load
, andStore
.Next tackle the
Call
instruction. The code you generate must properly handle the System V AMD64 ABI calling conventions (but note that we care only about 64-bit values). After successfully completing this step, your compiler should be able to handle the recursive factorial function definition.Breathe a sigh of relief at how easy it is to implement
Bitcast
, because the target language is untyped.Finally, gather your courage, and implement the
Gep
(getelementptr
) instruction.
Testing and Debugging Strategies¶
Testing and debugging a compiler is quite difficult. There are many correct potential translations of a given source program, and there are many incidental changes (such as the choice of label names) that do not affect the semantics of the generated code. It is also difficult to test parts of the translation independently, since simple inputs may depend on almost all of the compilation pipeline.
The test harness provided by main.ml
gives several ways to assess
your code. See the README
file for a full description of the
flags.
We have provided a (minimally-featured) parser for LLVMlite code. It
is sufficiently complete to parse the examples in the llprograms
directory, and we expect you to create additional test cases
yourself. For examples of how to use the test driver infrastructure,
see the gradedtests.ml
file.
You may find it helpful to run the LLVMlite code using our reference
interpreter (with the --interpret-ll
flag).
You may also find it helpful to run the LLVMlite code by compiling it
via clang (with the --clang
flag).
Note that it is not very useful to directly compare the .s
files
produced by your compiler to those produced by clang, but the
behavior of the two versions for the same inputs should be the same.
Graded Test Cases¶
As part of this project, you submit an interesting test case for
the compiler. This test case might take the form of a
.ll
file along with expected outputs (as in our automated tests),
or it might start from hand-generated LLVMlite abstract syntax.
The test case you submit will not count if it is too similar to previously-posted tests! Your test should be distinct from prior test cases. (Note that this policy encourages you to submit test cases early!) Tests that stress parts of the language that aren’t well exercised by the provided tests are particularly encouraged.
Your test case should satisfy the following criteria:
It should use a structured data type such as an array or struct (and so also use getelementptr)
It should involve some non-trivial control flow within a single control flow graph (i.e. loops and conditionals)
It should have at least one helper function (so it should also use the call instruction
Otherwise, it’s free to use any of the LLVMlite arithmetic/bit manipulation functions.
Note that you can compile your .ll file with clang to check whether it is legal LLVM IR code (which is useful because the backend assumes that the code is legal).
Note
As an experiment, we will try to use a shared github repo for managing these test cases, which should make it easier for the class to collectively construct the tests and to use them for their own testing.
You will receive access to a shared github repo (see Ed for details) that has a structure similar to parts of the hw3 codebase. In particular, that repo will contain:
the folder:
llprograms/sp24_hw3/
a file called
tests/sp24_hw3.ml
(These are already both present in the released project code too.)
To submit your test case, push an update to the shared repo to
add a
<file>.ll
with a unique name to thellprograms/sp24_hw3
directoryadd a corresponding test case invocation to the
tests/sp24_hw3.ml
file
See the documentation in the
sp24_hw3.ml
file for more details.To use the other student’s test cases, you should clone the common repo and
copy all the files from the shared
llprograms/sp24_hw3/
directory into your own development version, andcopy the tests/sp24_hw3.ml file into the corresponding location in your development code
We will validate these tests against our own implementation of the compiler (and clang). A second component of your grade will be determined by how your compiler fares against the test cases submitted by the other groups in the class.
Note
Please don’t corrupt, delete, or otherwise tamper with other student’s test cases that are pushed to the repo. (We don’t want to have to moderate the git push operations that affect the test cases.) We can look at the git commit logs to see how the test suite evolved over time.
Note
This structure means that the final 10 points will not be reflected accurately by the autograder (the last 5 will always succeed), so your score may decrease once we have collected all the tests, validated them, and re-run the autograder.
Grading¶
Projects that do not compile will receive no credit!
Your team’s grade for this project will be based on:
90 Points: the various automated tests that we provide. (Some reserved for online grading.)
5 Points for pushing an interesting test case to the shared test suite repo (Graded manually.)
5 Points divided among the test cases created by other groups. (Graded manually.)