Homework 3

Due Date: Friday, October 19
Total Points: 100 pts

You must work on this assignment with one or two other students. Work together on all parts of the assignment, but submit only one set of solutions. If each of you works on only one part, each of you learns only part of the material. Please be sure to write the names of all group members on the submitted solutions.

Submission Instructions: Use a word processor or LaTeX and submit a PDF of your solutions on Sakai (sakai.duke.edu). Hand-written or scanned homework will not be accepted. Grades will be returned on Sakai.

Note: Copying material from Wikipedia, other online sources, or any other source will not be tolerated. This form of plagiarism has occurred in the past, and penalties for violating the Duke Community Standard will be severe.

Cache Memory (50 points)

Please refer to the textbook, Computer Architecture: A Quantitative Approach by Hennessy & Patterson, 5th edition, for these problems. Do not use the 4th edition, as most of its problems have been changed.
  1. (10 pts) H&P 2.1 (a and b only)
  2. (20 pts) H&P 2.4 a, b, c, d, e
  3. (20 pts) H&P 2.8 a, b, c. Note that you should use the online version of CACTI.

Cache configurations in SimpleScalar (50 points)

Experiments:

  1. Use the sim-cache executable with 3 benchmarks (anagram, gcc, and go) to evaluate the performance of the following L1 data cache (D$) configurations:
    Evaluate each of these for a data cache size of 1KB (not including tags).
    Since sim-cache does not report timing, calculate it yourself from the instruction counts, the miss rate, and the following cycle counts:
    Note: Remember that each time you double the associativity, the number of sets halves. You can keep everything else at the simulator's default value.
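
    The two calculations in this step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the required method. The 32-byte block size and the "l" (LRU) replacement code are assumed to match the simulator's dl1 defaults, so check them against your installation. The timing model (every data access costs the average memory access time, every other instruction costs base_cpi) is one simple possibility. The function names and the numbers in the usage lines are hypothetical, so substitute the cycle counts given in this assignment.

```python
def num_sets(cache_bytes, block_bytes, assoc):
    """Number of sets for a fixed data capacity (tags not included)."""
    sets, remainder = divmod(cache_bytes, block_bytes * assoc)
    if remainder or sets == 0:
        raise ValueError("capacity must be a positive multiple of block_bytes * assoc")
    return sets


def dl1_config(cache_bytes, block_bytes, assoc, repl="l"):
    """Build a <name>:<nsets>:<bsize>:<assoc>:<repl> string for -cache:dl1.

    Assumption: "l" selects LRU replacement, as in the simulator's default.
    """
    return f"dl1:{num_sets(cache_bytes, block_bytes, assoc)}:{block_bytes}:{assoc}:{repl}"


def estimated_cycles(instructions, accesses, miss_rate,
                     hit_cycles, miss_penalty, base_cpi=1.0):
    """Estimate total cycles from sim-cache counts.

    Assumed model: each data access costs the average memory access time
    (hit_cycles + miss_rate * miss_penalty); all other instructions cost
    base_cpi cycles each.
    """
    amat = hit_cycles + miss_rate * miss_penalty
    return (instructions - accesses) * base_cpi + accesses * amat


if __name__ == "__main__":
    # Doubling the associativity halves the number of sets at a fixed 1KB capacity.
    for assoc in (1, 2, 4):
        print(assoc, dl1_config(1024, 32, assoc))
    # Hypothetical counts and latencies, for illustration only.
    print(estimated_cycles(1000, 200, 0.1, hit_cycles=1, miss_penalty=10))
```

    The resulting string would be passed as, for example, sim-cache -cache:dl1 dl1:16:32:2:l, followed by the benchmark and its arguments.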

  2. Now use sim-outorder to evaluate the relationship between out-of-order execution and L1 data cache organization. Using a 1KB direct-mapped cache with hit latencies of 1, 2, and 4 cycles, simulate the following configurations on the 2 benchmarks gcc and go (18 configurations in total):
    Note: For the in-order part, use sim-outorder with the in-order flag enabled.

    Analysis: Explain the relative impact of data cache access latency with respect to issue width, in-order vs. out-of-order execution, and RUU size (run more experiments if you need to). Also comment on the relative power consumption of each design. Be sure to compare performance using the correct cycle count, not the "simulation time".
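
    As a hedged illustration of the point about cycle counts, the sketch below pulls named statistics out of simulator output and compares two runs by cycles. It assumes each statistic is printed as "name value # description" and that the cycle count is reported as sim_cycle while wall-clock time is reported as sim_elapsed_time. Verify both against your own output. The sample text and its numbers are invented, and the helper names are hypothetical.

```python
import re

# Assumed line format: "<name> <numeric value> # <description>".
_STAT_LINE = re.compile(r"^[ \t]*(\w[\w.:]*)[ \t]+(-?\d+(?:\.\d+)?)[ \t]+#", re.MULTILINE)

# Invented sample output, for illustration only.
SAMPLE_OUTPUT = """\
sim_num_insn                1000000 # total number of instructions committed
sim_elapsed_time                  3 # total simulation time in seconds
sim_cycle                    800000 # total simulation time in cycles
sim_IPC                      1.2500 # instructions per cycle
"""


def parse_stats(text):
    """Return a dict mapping statistic names to numeric values."""
    return {name: float(value) for name, value in _STAT_LINE.findall(text)}


def relative_slowdown(cycles, baseline_cycles):
    """Ratio of cycle counts; values above 1 mean slower than the baseline."""
    return cycles / baseline_cycles


if __name__ == "__main__":
    stats = parse_stats(SAMPLE_OUTPUT)
    # Compare runs by sim_cycle, not by sim_elapsed_time (host wall-clock seconds).
    print(stats["sim_cycle"], stats["sim_elapsed_time"])
    print(relative_slowdown(1_000_000, stats["sim_cycle"]))
```

    Comparing sim_cycle across hit latencies of 1, 2, and 4 cycles for each machine configuration gives the relative slowdowns that the analysis asks you to explain.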

Submission instructions

Submit any modified files with the code changes properly commented.