Homework 2
Due Date: Thursday, September 27
Total Points: 100 pts
You must work on this assignment with one or two other students. You should
work together on all parts of the assignment, but submit only one set
of solutions. If you each work on only one part, then you each learn only part
of the material. Please be sure to write all names on the submitted
solutions.
Submission Instructions: Use a word processor or LaTeX and submit a PDF of your
solutions on Sakai (sakai.duke.edu).
Hand-written or scanned homeworks will not be accepted.
Grades will be returned on Sakai.
Note: Copying material from Wikipedia, other online sources, or
any other source will not be tolerated. This form of plagiarism has
occurred in the past, and penalties for violating the Duke Community
Standard will be severe.
Part I (Appendix C, Chapter 3) (60 points)
Please refer to the textbook, Computer Architecture: A Quantitative Approach
by Hennessy & Patterson, 5th edition, for these problems.
Do not use the 4th edition problems as most of them have been changed.
- (5 pts) H&P C.7
- (5 pts) H&P 3.2
- (5 pts) H&P 3.3
- (5 pts) H&P 3.5
- (5 pts) H&P 3.6
- (10 pts) H&P 3.11
- (15 pts) H&P 3.12
- (10 pts) H&P 3.17
Part II (40 points)
Start with the sim-safe simulator.
The main loop of the simulator, sim_main(), executes each instruction in order and increments the cycle counter by one.
Note that sim-safe does NOT model the timing of the execution - it only models the functional effects of each instruction.
To model timing, you'll have to modify sim-safe.c to count how many cycles have elapsed during each iteration of sim_main().
Run all experiments with the three benchmarks (anagram, gcc and go).
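One possible shape for this per-iteration cycle accounting is sketched below. It is a standalone illustration, not code from sim-safe.c: the names issue_state, account_insn, and run_ideal are hypothetical, and the issue width is passed in as a parameter. The idea is to call an accounting routine once per simulated instruction and to start a new cycle only when the current cycle has no issue slots left.

```c
/* Hypothetical bookkeeping for cycle counting; not part of sim-safe.c. */
struct issue_state {
    unsigned long cycles; /* cycles elapsed so far */
    int issued;           /* instructions issued in the current cycle */
    int width;            /* maximum instructions per cycle */
};

/* Call once per simulated instruction (one loop iteration). */
void account_insn(struct issue_state *s)
{
    /* Open a new cycle for the very first instruction, or when the
     * current cycle has already issued `width` instructions. */
    if (s->cycles == 0 || s->issued == s->width) {
        s->cycles++;
        s->issued = 0;
    }
    s->issued++;
}

/* Stand-in for the simulator loop: n_insn hazard-free instructions. */
unsigned long run_ideal(unsigned long n_insn, int width)
{
    struct issue_state s = {0, 0, width};
    for (unsigned long i = 0; i < n_insn; i++)
        account_insn(&s);
    return s.cycles;
}
```

With no hazards, this count equals the instruction count divided by the width, rounded up. Each hazard then becomes an extra condition for starting a new cycle early.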
- Performance:
Assume your processor is a 4-wide, in-order superscalar (i.e., it can execute a maximum of 4 instructions per cycle).
Ignoring data dependencies and assuming no hazards of any kind, what is its performance (i.e., how many cycles does it take to run each benchmark)?
- Data hazards:
Now assume that the processor cannot execute data-dependent instructions in the same cycle.
For example, if an instruction writes to register 2, then no subsequent instruction (in program order) that
reads register 2 can execute in the same cycle (it must wait until the next cycle). How does this affect performance?
Note that this question is independent of the pipeline length.
- Structural hazards:
Now assume that the L1 data cache has only one port and thus the processor can only execute at most one memory operation (load or store) per cycle.
How does this affect its performance?
- Control hazards:
Now further assume that the processor has a 9-stage pipeline.
The result of a conditional branch (i.e., taken or not-taken) is computed in stage 7.
The processor statically predicts all conditional branches as not-taken and continues fetching
from the instruction after the branch (the fall-through instruction). If the branch is indeed not-taken,
then there is no penalty. If the branch is taken, then all instructions after the branch are squashed and
fetching resumes from the instruction at the branch destination. How does this affect performance?
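The hazard rules above can be sketched on a simplified instruction trace. This is an illustration under stated assumptions, not sim-safe code: trace_insn, count_cycles, reads_pending, and NO_REG are hypothetical names, and register numbers are assumed to fit in a 64-bit mask. The taken-branch penalty is left as a parameter (taken_penalty) rather than derived from the pipeline depth, and the sketch assumes that a taken branch ends the current issue group.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_REG (-1) /* hypothetical marker: operand slot unused */

/* Hypothetical, simplified view of one executed instruction. */
struct trace_insn {
    int dest;         /* register written, or NO_REG */
    int src1, src2;   /* registers read, or NO_REG */
    int is_mem;       /* nonzero for a load or store */
    int taken_branch; /* nonzero for a conditional branch that was taken */
};

/* True if `reg` was written earlier in the current cycle. */
static int reads_pending(int reg, uint64_t written)
{
    return reg != NO_REG && ((written >> reg) & 1ULL);
}

unsigned long count_cycles(const struct trace_insn *trace, size_t n,
                           int width, unsigned long taken_penalty)
{
    unsigned long cycles = 0;
    int issued = width;   /* "full" so the first instruction opens cycle 1 */
    int mem_used = 0;     /* the single data-cache port is taken this cycle */
    uint64_t written = 0; /* registers written so far in this cycle */

    for (size_t i = 0; i < n; i++) {
        const struct trace_insn *in = &trace[i];
        int raw = reads_pending(in->src1, written) ||
                  reads_pending(in->src2, written);
        int port_busy = in->is_mem && mem_used;

        /* Start a new cycle if the group is full, the instruction reads
         * a register written in this cycle, or the memory port is busy. */
        if (issued == width || raw || port_busy) {
            cycles++;
            issued = 0;
            mem_used = 0;
            written = 0;
        }

        issued++;
        if (in->is_mem)
            mem_used = 1;
        if (in->dest != NO_REG)
            written |= 1ULL << in->dest;

        /* Assumption: a taken branch costs taken_penalty extra cycles
         * and forces the next instruction into a fresh cycle. */
        if (in->taken_branch) {
            cycles += taken_penalty;
            issued = width;
        }
    }
    return cycles;
}
```

With taken_penalty set to 0 and a trace that has no memory operations, register operands, or taken branches, this reduces to the ideal count of the instruction count divided by the width, rounded up. Special cases such as a hardwired zero register are not handled here.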
Submit: You will submit the version of sim-safe.c that incorporates all three issues raised in parts (b), (c), and (d), i.e., data, structural, and control hazards.
Make sure your code changes are properly commented.