Homework 2

Due Date : Thursday September 29
Total Points : 100 pts

You must work on this assignment with one or two other students. You should work together on all parts of the assignment, but submit only one set of solutions. If you each work on part, then you each only learn part of the material. Please be sure to write all names on the submitted solutions.

Note: Copying material from Wikipedia, other online sources, or any source will not be tolerated. This form of plagiarism has occurred in the past, and penalties for violating the Duke Community Standard will be severe.

Keep all your answers short!

Part I (Appendix A, Chapters 2) (60 points)

  1. (5 pts) H&P 2.2
  2. (5 pts) H&P 2.3
  3. (5 pts) H&P 2.4
  4. (5 pts) H&P 2.5
  5. (5 pts) H&P 2.7
  6. (5 pts) H&P 2.11
  7. (10 pts) H&P 2.12
  8. (10 pts) How does static scheduling work? What are the advantages of using static scheduling? What are its problems and possible solutions that address these? Describe both hardware and software aspects.
    Cite all sources you use to answer these questions.
  9. (10 pts) Do some research on the pipelines of Intel Xeon processors and Intel Atom processors. For each describe the key differences between their pipelines. How deep are they? How wide? Why do you think they have different (or similar) pipelines? What are the advantages and disadvantages of them respectively?

Part II (40 points)

Start with the sim-safe simulator. The main loop of the simulator, sim_main(), executes each instruction in-order and increments the cycle counter by one. Note that sim-safe does NOT model the timing of the execution - it only models the functional effects of each instruction. To model timing, you'll have to modify sim-safe.c to count how many cycles have elapsed during each iteration of sim_main(). Run all experiments with the three benchmarks (anagram, gcc and go).

  1. Performance: Assume your processor is a 4-wide, in-order superscalar (i.e., can execute a maximum of 4 instructions per cycle). Ignoring data dependencies and assuming no hazards of any kind, what is its performance (i.e., how many cycles does it take to run)?
  2. Data hazards: Now assume that the processor cannot execute data dependent instructions in the same cycle. For example, if an instruction writes to register 2, then no subsequent instruction (in program order) that reads register 2 can execute in the same cycle (it must wait until the next cycle). How does this affect performance? Note that this question is independent of the pipeline length.
  3. Structural hazards: Now assume that the L1 data cache has only one port and thus the processor can only execute at most one memory operation (load or store) per cycle. How does this affect its performance?
  4. Control hazards: Now further assume that the processor has a 9-stage pipeline. The result of a conditional branch (i.e., taken or not-taken) is computed in stage 7. The processor statically predicts all conditional branches as not-taken and continues fetching from the instruction after the branch (the fall-through instruction). If the branch is indeed not-taken, then there is no penalty. If the branch is taken, then all instructions after the branch are squashed and fetching resumes from the instruction immediately from the branch destination. How does this affect performance?


Submit: You will submit the version of sim-safe.c that incorporates all three issues raised in parts (b), (c) , and (d). Make sure your code changes are properly commented.

You should submit your solutions to the book problems and the modified sim-safe.c on blackboard via the digital dropbox. Name the file with your solutions "hw2_lastname1_lastname2.pdf", with the last names of your group. Rename sim-safe.c to "hw1_lastname1_lastname2_sim-safe.c". If you submit multiple versions, append a version number and we will grade the highest version number, "hw2_lastname1_lastname2_v2.pdf".