Homework 3
Due Date: Tuesday, October 25
Total Points: 100 pts
You must work on this assignment with one or two other students. You should
work together on all parts of the assignment, but submit only one set
of solutions. If you each work on only one part, then you each learn only part
of the material. Please be sure to write all group members' names on the
submitted solutions.
Note: Copying material from Wikipedia, other online sources, or
any other source will not be tolerated. This form of plagiarism has
occurred in the past, and penalties for violating the Duke Community
Standard will be severe.
Cache Memory (50 points)
- (20 pts) H&P 5.1 a, b, c (not d). Note that you should use the online
version of Cacti.
- (10 pts) H&P 5.6 (a and b only)
- (20 pts) H&P 5.18
Cache configurations in SimpleScalar (50 points)
Experiments:
- Use the sim-cache executable on 3 benchmarks (anagram, gcc and go) to evaluate
the performance of the following L1 data cache (D$) configurations:
- 1-cycle direct-mapped cache
- 2-cycle 2-way set associative cache
- 3-cycle 4-way set associative cache
Evaluate each of these for a data cache size of 1KB (not including tags).
Since sim-cache does not report timing, calculate the execution time yourself
from the instruction counts, the miss rate, and the following timing
parameters:
- Clock cycle time: 0.5ns
- Miss penalty: 300 cycles
- Cycles for instructions other than loads/stores: 1 cycle
Note: Remember that each time you double the associativity, the number of sets
halves. You can keep everything else at the simulator's default values.
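As a sketch of the bookkeeping described above, the following Python computes the number of sets for each configuration and one possible execution-time estimate. The 32-byte block size, the function names, and the timing model itself (every load/store pays the hit latency, and every miss additionally pays the miss penalty) are assumptions made for illustration, not something the assignment prescribes. Check the block size against your simulator's defaults and justify whatever model you actually use.

```python
CLOCK_NS = 0.5        # clock cycle time, from the assignment
MISS_PENALTY = 300    # miss penalty in cycles, from the assignment
CACHE_BYTES = 1024    # 1KB of data, tags excluded
BLOCK_BYTES = 32      # ASSUMPTION: verify against your simulator's default


def num_sets(assoc, cache_bytes=CACHE_BYTES, block_bytes=BLOCK_BYTES):
    """Number of sets = capacity / (block size * associativity)."""
    return cache_bytes // (block_bytes * assoc)


def exec_time_ns(total_insns, mem_refs, miss_rate, hit_cycles):
    """Estimate run time in ns under one simple model (an assumption):
    non-memory instructions take 1 cycle, each load/store takes the
    hit latency, and each miss adds the miss penalty on top of that."""
    other_insns = total_insns - mem_refs
    cycles = (other_insns * 1
              + mem_refs * hit_cycles
              + mem_refs * miss_rate * MISS_PENALTY)
    return cycles * CLOCK_NS


# (associativity, hit latency in cycles) for the three configurations
for assoc, hit_cycles in [(1, 1), (2, 2), (4, 3)]:
    print(f"{assoc}-way, {hit_cycles}-cycle hit: {num_sets(assoc)} sets")
```

Feed exec_time_ns the instruction count, load/store count, and miss rate that sim-cache reports for each benchmark. Doubling the associativity halves the value returned by num_sets, which matches the note above.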
- Now use sim-outorder to evaluate the relationship between
out-of-order execution and L1 data cache organization. Using a 1KB direct-mapped
cache with hit latencies of 1, 2, and 4 cycles, simulate the following six
configurations on the 2 benchmarks gcc and go (6 configurations x 3 hit
latencies = 18 configurations in total, each run on both benchmarks):
- 2-wide in-order
- 4-wide in-order
- 2-wide-issue with 4-entry RUU
- 2-wide-issue with 32-entry RUU
- 4-wide-issue with 4-entry RUU
- 4-wide-issue with 32-entry RUU
Note: For the in-order configurations, use sim-outorder with the in-order issue flag enabled.
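One way to keep the 18 configurations organized is to generate them rather than type them by hand. The sketch below enumerates every pipeline/latency pair and builds a partial sim-outorder argument list for each. The flag spellings (-issue:width, -issue:inorder, -ruu:size, -cache:dl1lat) are the ones used by SimpleScalar 3.0 as best recalled here, so confirm them against your build's help output. The labels and the build_args helper are hypothetical names. Whether "N-wide" should also change the decode and commit widths is left for you to decide, and the in-order configurations simply keep the default RUU size.

```python
from itertools import product

HIT_LATENCIES = [1, 2, 4]  # L1 data cache hit latency in cycles

# (label, issue width, in-order issue?, RUU size or None to keep the default)
PIPELINES = [
    ("2-wide in-order", 2, True, None),
    ("4-wide in-order", 4, True, None),
    ("2-wide, 4-entry RUU", 2, False, 4),
    ("2-wide, 32-entry RUU", 2, False, 32),
    ("4-wide, 4-entry RUU", 4, False, 4),
    ("4-wide, 32-entry RUU", 4, False, 32),
]


def build_args(width, inorder, ruu_size, latency):
    """Build the sim-outorder options for one configuration.
    ASSUMPTION: flag names as in SimpleScalar 3.0; verify with the tool."""
    args = [
        "-issue:width", str(width),
        "-issue:inorder", "true" if inorder else "false",
        "-cache:dl1lat", str(latency),
    ]
    if ruu_size is not None:
        args += ["-ruu:size", str(ruu_size)]
    return args


CONFIGS = [
    (label, latency, build_args(width, inorder, ruu_size, latency))
    for (label, width, inorder, ruu_size), latency
    in product(PIPELINES, HIT_LATENCIES)
]

for label, latency, args in CONFIGS:
    print(f"{label}, {latency}-cycle hit: {' '.join(args)}")
```

Each printed option string still needs the 1KB direct-mapped data cache geometry and the benchmark command line appended before it can be run.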
Analysis: Explain the relative impact of data cache access latency with respect
to issue width, in-order vs. out-of-order, and with respect to RUU size (run
more experiments if you need to). Also comment on the relative power
consumption of each design. Be sure to use the simulated cycle
count, not the "simulation time", when comparing performance.
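To illustrate comparing designs by cycle count, the sketch below computes IPC and speedup relative to a baseline. In sim-outorder's statistics the simulated cycle count is typically reported as sim_cycle and the committed instruction count as sim_num_insn, while sim_elapsed_time is the wall-clock time the simulator itself took on the host. Verify these names in your own output. The function names are hypothetical, and the numbers are placeholders, not measured results.

```python
def ipc(num_insn, cycles):
    """Instructions per cycle for one run."""
    return num_insn / cycles


def speedup(baseline_cycles, cycles):
    """How many times faster a design is than the baseline, assuming the
    same benchmark, the same instruction count, and the same clock period."""
    return baseline_cycles / cycles


# Placeholder cycle counts for illustration only (NOT real results).
cycles_by_config = {"baseline": 2_000_000, "candidate": 1_600_000}

base = cycles_by_config["baseline"]
for name, cycles in cycles_by_config.items():
    print(f"{name}: speedup over baseline = {speedup(base, cycles):.2f}")
```

Because every configuration here uses the same clock period, the ratio of cycle counts equals the ratio of execution times.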
Submission instructions
You should submit your solutions to the book problems and the modified
sim-outorder.c on Blackboard via the digital dropbox. Name the
file with your solutions "hw3_lastname1_lastname2.pdf", using the last names of
your group members. Rename sim-outorder.c to
"hw3_lastname1_lastname2_sim-outorder.c". If you submit multiple versions,
append a version number (for example, "hw3_lastname1_lastname2_v2.pdf") and we
will grade the highest-numbered version.