Homework 1

Due Date : September 8
Total Points : 100 pts

You must work on this assignment with one or two other students. You should work together on all parts of the assignment, but submit only one set of solutions. If each of you works on only part of the assignment, then each of you learns only part of the material. Please be sure to write all names on the submitted solutions.

Note: Copying material from Wikipedia, other online sources, or any other source will not be tolerated. This form of plagiarism has occurred in the past, and penalties for violating the Duke Community Standard will be severe.

Part I (60 points)

Please refer to the textbook Computer Architecture: A Quantitative Approach by Hennessy & Patterson, 4th edition, for these problems.
  1. (10 pts) H&P 1.2
  2. (10 pts) H&P 1.7
  3. (10 pts) H&P 1.12
  4. (10 pts) H&P 1.13
  5. (10 pts) H&P 1.14
  6. (10 pts) From H&P 3rd edition
    Several researchers have suggested that adding a register-memory addressing mode to a load-store computer might be useful. The idea is to replace sequences of

    LOAD R1,0(Rb)
    ADD R2,R2,R1

    by

    ADD R2,0(Rb)

    Assume the new instruction will cause the clock cycle to increase by 10%. Use the instruction frequencies for the gcc benchmark on the load-store computer from Figure B.27. The new instruction affects only the clock cycle and not the CPI.
    a. What percentage of the loads must be eliminated for the computer with the new instruction to have at least the same performance?
    b. Show a situation in a multiple instruction sequence where a load of R1 followed immediately by a use of R1 (with some type of opcode) could not be replaced by a single instruction of the form proposed, assuming that the same opcode exists.
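A possible way to set up part a, sketched under the problem's stated assumption that CPI is unchanged. Here IC, CPI, and T are the original instruction count, CPI, and clock cycle time; f_L is the fraction of gcc instructions that are loads (read it from Figure B.27, which is not reproduced here); and x is the fraction of loads eliminated. Each eliminated load removes one instruction, because a LOAD/ADD pair becomes a single ADD.

```latex
\begin{aligned}
t_{\text{old}} &= IC \cdot CPI \cdot T \\
t_{\text{new}} &= (1 - x f_L)\, IC \cdot CPI \cdot 1.1\,T \\
t_{\text{new}} \le t_{\text{old}} &\iff 1.1\,(1 - x f_L) \le 1
  \iff x \ge \frac{1}{11 f_L} \approx \frac{0.0909}{f_L}
\end{aligned}
```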

Part II (40 points)

The main goal of this part is to familiarize you with using SimpleScalar.

SimpleScalar is a set of simulation tools that we will use throughout the semester to study computer architectures. The SimpleScalar toolset (see www.simplescalar.com for more information), which is written in C, is widely used in research to evaluate microarchitectural ideas, and it includes several types of simulators. These simulators trade off simulation speed against modeling detail. They can all simulate several ISAs, but in this class we will use them only to simulate the Alpha ISA (used by DEC and then Compaq).

On an x86/Linux machine supported by either Electrical Engineering (the lab in Hudson 115a, datc1-datc11, via your EE account) or Computer Science (linux.cs.duke.edu or your desktop machine), or on your own Linux machine, create a working directory. Copy the files instruct-progs.tar.gz and simplesim-3v0d.tgz to your working directory and untar each of them (tar -zxvf filename). Change to the newly created simplesim-3.0 directory and build the purely functional simulator, sim-safe, by typing make sim-safe. Don't worry about the couple of warnings; they're normal.

Now you are ready to run the benchmarks in the newly created benchmarks directory. Follow the instructions in benchmarks/README for running 3 of the 4 benchmarks (all but compress); note that you may have to remove the file OUT if it already exists. The target is “alpha”, since we are simulating the Alpha ISA. To check that everything is running correctly, here are the instruction counts for go (545812301), gcc (337331752), and anagram (25593483). Your instruction counts may differ slightly (by less than 1%); this is OK. Also, do not worry if your output does not match the reference outputs; this is also OK.

Experiment: For each of these three benchmarks, evaluate the following design idea. Assume that you have a 2 GHz processor whose clock rate is bottlenecked by the time to access the L1 data cache for loads (but not stores), and that all instructions currently take 1 cycle (there is no pipelining and no parallelism of any kind). We could increase the clock rate to 2.25 GHz if we let loads take 2 cycles each (but all other instructions are still 1 cycle). Is this a good idea? Show your math!

To evaluate this idea, you must modify sim-safe to count loads and stores of all types. (You do NOT need to modify the simulator to change latencies.) Write your names and email addresses as comments at the beginning of your code. Do NOT change the name of sim-safe.c. Add comments to the code where you make modifications, and begin each such comment with the string ECE252.
Note: sim-safe reports “sim_elapsed_time” when it is done. This is a measure of how long the simulation took to run, NOT how long it would take the simulated machine to run the benchmark. This is a very important distinction.

Submit: Submit your modified sim-safe.c on Blackboard.