Homework 4

Due Date : Tuesday November 15
Total Points : 100 pts

You must work on this assignment with one or two other students. You should work together on all parts of the assignment, but submit only one set of solutions. If you each work on part, then you each only learn part of the material. Please be sure to write both names on the submitted solutions.

Note: Copying material from Wikipedia, other online sources, or any source will not be tolerated. This form of plagiarism has occurred in the past, and penalties for violating the Duke Community Standard will be severe.

Please submit a single PDF file with your solutions
Please type your answers
Explain all your answers to get full credit!

Keep all your answers short and precise!

The Questions

(20 pts) A program is running on a computer with a four-entry fully asssociative(micro) TLB:

VP#	PP#	Entry valid
5	30	1
7	1	0
10	10	1
15	25	1

Virtual page index	Physical page#	Present
0	3	Y
1	7	N
2	6	N
3	5	Y
4	14	Y
5	30	Y
6	26	Y
7	11	Y
8	13	N
9	18	N
10	10	Y
11	56	Y
12	110	Y
13	33	Y
14	12	N
15	25	Y

Virtual page accessed	TLB(hit/miss)	Page table(hit or fault)
1
5
9
14
10
6
15
12
7
2

(20 pts) Some memory system handle TLB misses in software (as an exception), while others use hardware for TLB misses.
a. what are the trade-offs between these two methods for handling TLB misses;
b. Will TLB miss handling in software always be slower than TLB miss handling in hardware? Explain.
c. Are these page table structures that would be difficult to handle in hardware but possible in software? Are there any such structures that would be difficult for software to handle but easy for hardware to manage?
d. Why are TLB miss rates for floating-point programs generaglly higher than those for integer programs?
(10 pts) H&P 4.3
(10 pts) H&P 4.16

(20 pts) Consider a single-issue in-order multithreading processor. The processor can fetch and issue (dispatch) one instruction per cycle. If an instruction cannot be issued due to a data dependency, the processor stalls. Integer instructions take one cycle to execute and the result can be used in the next cycle. We also assume that the processor has a perfect branch predictor with no penalty for both taken and not-taken branches.
Each cycle, the processor can fetch and issue one instruction that performs any of the following operations:

load/store, 12-cycle latency (fully pipelined)
integer add, 1-cycle latency
floating-point add, 5-cycle latency (fully pipelined)
branch, no delay slots, 1-cycle latency

The processor does not have a cache. Each memory operation directly accesses main memory. If an instruction cannot be isseud due to a data dependency, the processor stalls. We also assume that the processor has a perfect branch predictor with no penalty for both taken and not-taken branches.
Your job is to analyze the processor utilizations for the following two thread-switching implementations:
Fixed Switching: the processor switches to a different thread every cycle using fixed round robin scheduling. Each of the N threads executes an instruction every N cycles.
Data-dependent Switching: the processor only switches to a different thread when an instruction cannot execute due to a data dependency.
Each thread executes the following MIPS code:

`loop:`	`L.D F2, O(R1)`	; load data into F2
	`ADDI R1,R1,4`	; bump source pointer
	`FADD F3, F3, F2`	; F3 = F3 + F2
	`BNE F2, F4, loop`	; continue if F2 != F4

a) What is the minimum number of threads that we need to fully utilize the processor for each implementation?
b) What is the minimum number of threads that we need to fully utilize the processor for each implementation if we change the load/store latency to 1-cycle (but keep the 5-cycle floating-point add)>
c) Consider a Simultaneous Multithreading (SMT) machine with limited hardware resources. Which of the following hardware constraints can limit the total number of threads that the machine can support? For the ones you choose, briefly describe the minimum requirement to support N threads.

Number of functional units
Number of physical registers
Data cache size
Data cache associativity

(20 pts) Write a couple of paragraphs as a progress report on your class project, and you must include a Figure showing some data that you have collected for your project (for any simulator you are using, run some workloads and generate a graph).
Src: Krste Asanovic, Computer Architecture and Enginnering course at UC Berkeley, 2011

Homework 4

Due Date : Tuesday November 15
Total Points : 100 pts

The Questions

Submit on Blackboard

No Homework 5 :)

Homework 4

Due Date : Tuesday November 15 Total Points : 100 pts

The Questions

Submit on Blackboard

No Homework 5 :)

Due Date : Tuesday November 15
Total Points : 100 pts