## University of Pennsylvania, Department of Electrical and Systems Engineering: Circuit-Level Modeling, Design, and Optimization for Digital Systems

| ESE3700, Spring 2024 | Final | Tuesday, May 7 |
|----------------------|-------|----------------|

- Problem weightings shown.
- Calculators allowed.
- Closed book = No text or notes allowed.
- Additional workspace in exam book. Note where to find work in exam book if relevant.

## Name: Answers

| Q1    |                         |
|-------|-------------------------|
| Q2    |                         |
| Q3    |                         |
| Q4    |                         |
| Q5    |                         |
| Total | Mean: 71.4, Stdev: 13.1 |

- 22nm Low Standby Power Process (LSTP)
- $\gamma = 1$
- $V_{dd}$ =800mV
- nominal  $V_{thn} = -V_{thp} = 300 \text{mV}$
- $C_0 = 2 \times 10^{-17} \text{F} \text{ (for } W = 1 \text{ device)}$
- $I_{d,sat_0} = 10\,\mu A$ (for $W = 1$ device)
- $I_{sd,leak_0} = 0.3\,\text{pA}$ (for $W = 1$ device)
- Velocity-saturated operation
- $R_{wire} = 700\,\text{k}\Omega/\text{cm}$
- $C_{wire} = 1.7\,\text{pF/cm}$

| Device | $V_{gs}$           | $I_{ds}$                                                              |
|--------|--------------------|-----------------------------------------------------------------------|
| NMOS   | $V_{gs} < V_{thn}$ | $(1 \times 10^{-6}) W e^{\frac{V_{gs} - V_{thn}}{40mV}}$              |
|        | $V_{gs} > V_{thn}$ | $2 \times 10^{-5} W \left( V_{gs} - V_{thn} \right)$                  |
| PMOS   | $V_{gs} > V_{thp}$ | $(-1 \times 10^{-6}) W e^{-\left(\frac{V_{gs}-V_{thp}}{40mV}\right)}$ |
|        | $V_{gs} < V_{thp}$ | $2 \times 10^{-5} W \left( V_{gs} - V_{thp} \right)$                  |
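
The piecewise current model in the table can be written as two small functions. This is a sketch: the function names are hypothetical, and the boundary case $V_{gs} = V_{th}$ (which the table leaves undefined) is assigned to the above-threshold branch here.

```python
import math

V_THN = 0.300      # nominal NMOS threshold (V)
V_THP = -0.300     # nominal PMOS threshold (V), since V_thn = -V_thp
SLOPE = 0.040      # 40 mV exponential slope used in the table (V)

def nmos_ids(v_gs: float, w: float = 1.0) -> float:
    """NMOS drain current (A) for width w, per the piecewise table model."""
    if v_gs < V_THN:
        # Subthreshold: exponential in (V_gs - V_thn)
        return 1e-6 * w * math.exp((v_gs - V_THN) / SLOPE)
    # Above threshold: linear in overdrive (velocity-saturated model)
    return 2e-5 * w * (v_gs - V_THN)

def pmos_ids(v_gs: float, w: float = 1.0) -> float:
    """PMOS drain current (A) for width w; negative, following the table's signs."""
    if v_gs > V_THP:
        # Subthreshold: exponential in -(V_gs - V_thp)
        return -1e-6 * w * math.exp(-(v_gs - V_THP) / SLOPE)
    # Above threshold: V_gs - V_thp < 0, so the current is negative
    return 2e-5 * w * (v_gs - V_THP)

if __name__ == "__main__":
    print(nmos_ids(0.8), pmos_ids(-0.8))
```

As a sanity check, a $W = 1$ NMOS at $V_{gs} = V_{dd} = 800\,\text{mV}$ gives $2 \times 10^{-5} \times 0.5 = 10\,\mu A$, which matches $I_{d,sat_0}$.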

Timing constraints:

$$T \ge t_{clk \to q} + t_{plogic} + t_{setup} \tag{1}$$

$$t_{cdlatch} + t_{cdlogic} \ge t_{hold} \tag{2}$$
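
Constraints (1) and (2) reduce to two inequality checks. This is a minimal sketch. The function names and the delay values in the usage lines are hypothetical and were chosen only to show one passing setup check and one failing hold check. A 4000 ps period corresponds to a 250 MHz clock.

```python
def meets_setup(t_period: float, t_clk_to_q: float, t_plogic: float, t_setup: float) -> bool:
    """Constraint (1): the period must cover clk-to-q, logic, and setup delays."""
    return t_period >= t_clk_to_q + t_plogic + t_setup

def meets_hold(t_cd_latch: float, t_cd_logic: float, t_hold: float) -> bool:
    """Constraint (2): the contamination delays must be at least the hold time."""
    return t_cd_latch + t_cd_logic >= t_hold

# Hypothetical delays in ps, for illustration only
print(meets_setup(4000, 300, 3000, 200))  # True: 4000 >= 3500
print(meets_hold(100, 50, 200))           # False: 150 < 200
```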

Optimal buffering:

$$L_{seg} = 2\sqrt{\frac{R_0(\gamma+1)C_0}{R_{wire}C_{wire}}} \tag{3}$$

$$W_{buf} = \sqrt{\frac{R_0 C_{wire}}{2R_{wire}C_0}} \tag{4}$$
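
Equations (3) and (4) can be evaluated directly for the default technology listed above. The sketch below assumes $R_0 = V_{dd}/I_{d,sat_0}$, which is the same definition used in Q1(a). It keeps the wire parameters per cm, so $L_{seg}$ comes out in cm. The helper names are hypothetical.

```python
import math

V_DD = 0.8            # V
I_DSAT0 = 10e-6       # A, W = 1 device
R0 = V_DD / I_DSAT0   # ohm, assumed R0 = Vdd / I_dsat0
C0 = 2e-17            # F, W = 1 device
GAMMA = 1.0
R_WIRE = 700e3        # ohm/cm
C_WIRE = 1.7e-12      # F/cm

def optimal_segment_length() -> float:
    """Equation (3) as written: buffer spacing in cm."""
    return 2 * math.sqrt(R0 * (GAMMA + 1) * C0 / (R_WIRE * C_WIRE))

def optimal_buffer_width() -> float:
    """Equation (4) as written: buffer width in multiples of a W = 1 device."""
    return math.sqrt(R0 * C_WIRE / (2 * R_WIRE * C0))

l_seg = optimal_segment_length()
w_buf = optimal_buffer_width()
print(f"L_seg = {l_seg * 1e4:.1f} um, W_buf = {w_buf:.1f}")  # L_seg = 32.8 um, W_buf = 69.7
```

Taking equations (3) and (4) exactly as written, these values give $L_{seg} \approx 32.8\,\mu m$ and $W_{buf} \approx 69.7$.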

1. (20pts) Speed and Power. Consider minimum-sized CMOS nand2 gates in the default technology. Specify units in all answers.

(a) Assume the default technology and calculate $\tau = R_0 C_0$.

$$R_{0} = \frac{V_{DD}}{I_{d,sat_{0}}} = \frac{800mV}{10\mu A} = 80k\Omega$$
$$C_{0} = 2 \times 10^{-17}F$$
$$\tau = R_{0}C_{0} = 1.6ps$$
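
The arithmetic for part (a) can be checked quickly. This is a sketch, and the variable names are illustrative.

```python
V_DD = 0.8         # supply voltage (V)
I_DSAT0 = 10e-6    # saturation current of a W = 1 device (A)
C0 = 2e-17         # capacitance of a W = 1 device (F)

R0 = V_DD / I_DSAT0   # equivalent drive resistance (ohm)
tau = R0 * C0         # technology time constant (s)
print(f"R0 = {R0 / 1e3:.0f} kOhm, tau = {tau * 1e12:.2f} ps")  # R0 = 80 kOhm, tau = 1.60 ps
```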



(b) Assume the critical path in the design (including flip-flop setup time and clock-to-q delay) can be modeled as a series chain of 10 of these gates, each loaded by 4 equivalent gates. What is the maximum frequency of operation possible?

$$t_{nand2} = R_0(5\gamma + 4 \times 2)C_0 + R_0(3\gamma + 4 \times 2)C_0 = 24\tau$$
$$t_{cycle} = 10 \cdot 24\tau = 240\tau = 384ps$$
$$F_{max} = \frac{1}{t_{cycle}} = 2.6GHz$$
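
The same calculation for part (b) is sketched below. The two Elmore terms are copied from the expression above in units of $\tau$, and the variable names are illustrative.

```python
TAU = 80e3 * 2e-17   # R0 * C0 from part (a), in seconds
GAMMA = 1.0
FANOUT = 4           # each gate is loaded by 4 equivalent gates
C_IN = 2             # input capacitance per nand2 input, in units of C0 (the "x 2" above)
N_STAGES = 10

term1 = 5 * GAMMA + FANOUT * C_IN   # first Elmore term above: 13 tau
term2 = 3 * GAMMA + FANOUT * C_IN   # second Elmore term above: 11 tau
t_nand2 = (term1 + term2) * TAU     # 24 tau per gate
t_cycle = N_STAGES * t_nand2
f_max = 1 / t_cycle
print(f"t_cycle = {t_cycle * 1e12:.0f} ps, F_max = {f_max / 1e9:.2f} GHz")  # t_cycle = 384 ps, F_max = 2.60 GHz
```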

(c) Assuming chip cooling allows a maximum dynamic power dissipation of 1 W (leakage is negligible), what is the maximum number of gates that can switch during a clock cycle, on average, when operating at the frequency from part (b)? In the worst case, each nand2 gate that switches charges or discharges $C_{load}$:

$$C_{load} = 4 \times 2C_0 + 5\gamma C_0 = 13C_0$$

$$P_{total,dyn} = \frac{1}{2}N_{gate}C_{load}V_{DD}^2 \times F_{max} \le 1W$$

$$\frac{1}{2} \cdot N_{gate} \cdot 13 \cdot 2 \times 10^{-17} \cdot (0.8)^2 \cdot 2.6 \times 10^9 \le 1W$$

Max gate evaluations per clock: $N_{gate} \le 4.6 \times 10^6$
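
The power inequality can be solved for $N_{gate}$ directly. This sketch uses the unrounded $F_{max} = 1/384\,\text{ps}$, and the variable names are illustrative.

```python
C0 = 2e-17               # F, W = 1 device
C_LOAD = 13 * C0         # 4 x 2C0 fanout + 5*gamma*C0 self-load, gamma = 1
V_DD = 0.8               # V
F_MAX = 1 / 384e-12      # Hz, from part (b)
P_MAX = 1.0              # W, cooling limit

# P = 1/2 * N * C_load * V_dd^2 * F  <=  P_max, solved for N
n_max = P_MAX / (0.5 * C_LOAD * V_DD**2 * F_MAX)
print(f"N_gate <= {n_max / 1e6:.2f} million")  # N_gate <= 4.62 million
```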

(d) Assuming the output of one of these gates drives a single gate input through an unbuffered wire with $R_{wire} = 700\,\text{k}\Omega/\text{cm}$, $C_{wire} = 1.7\,\text{pF/cm}$, what is the maximum distance the signal can travel in one clock cycle at the maximum clock frequency identified in part (b)?

$$t_{cycle} = R_0(5\gamma C_0 + C_{wire}L_{wire} + 2C_0) + R_0(3\gamma C_0 + C_{wire}L_{wire} + 2C_0) + 0.5R_{wire}C_{wire}L_{wire}^2 + R_{wire}L_{wire} \cdot 2C_0$$

$$= R_0(12C_0 + 2C_{wire}L_{wire}) + 0.5R_{wire}C_{wire}L_{wire}^2 + R_{wire}L_{wire} \cdot 2C_0$$

$$384ps = 19.2ps + 80 \times 10^3 \cdot 2 \cdot 1.7 \times 10^{-12}L_{wire} + 0.5 \cdot 700 \times 10^3 \cdot 1.7 \times 10^{-12}L_{wire}^2 + 700 \times 10^3 \cdot 2 \cdot 2 \times 10^{-17}L_{wire}$$

$$364.8ps = 2.72 \times 10^{-7}L_{wire} + 5.95 \times 10^{-7}L_{wire}^2 + 2.8 \times 10^{-11}L_{wire}$$

We can drop the final term because $2.8 \times 10^{-11} \ll 2.72 \times 10^{-7}$. The remaining two coefficients are of similar magnitude, and $L_{wire}$ (in cm) must be much less than 1, so the $L_{wire}^2$ term will be much smaller than the linear $L_{wire}$ term. So, we start by solving:

$$364.8ps \approx 2.72 \times 10^{-7} L_{wire}$$
$$L_{wire} \approx 0.0013 cm = 13 \mu m$$

Max distance: $13\,\mu m$
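
To confirm that dropping the quadratic term and the $2.8 \times 10^{-11}$ term is safe, we can solve the full quadratic for its positive root and compare it to the linear approximation. This is a sketch with $L_{wire}$ in cm. It uses the numerically stable form $L = -2c/(b + \sqrt{b^2 - 4ac})$ to avoid cancellation between $-b$ and the square root.

```python
import math

R0, C0 = 80e3, 2e-17               # ohm, F
R_WIRE, C_WIRE = 700e3, 1.7e-12    # ohm/cm, F/cm
T_CYCLE = 384e-12                  # s, from part (b)

# t_cycle = R0(12 C0 + 2 C_wire L) + 0.5 R_wire C_wire L^2 + R_wire L 2 C0
# rearranged as a L^2 + b L + c = 0
a = 0.5 * R_WIRE * C_WIRE
b = 2 * R0 * C_WIRE + 2 * R_WIRE * C0
c = 12 * R0 * C0 - T_CYCLE

# Positive root in a numerically stable form (c < 0, so -2c > 0)
L_exact = -2 * c / (b + math.sqrt(b * b - 4 * a * c))
# Linear approximation used above: keep only the 2 R0 C_wire L term
L_linear = -c / (2 * R0 * C_WIRE)

print(f"exact: {L_exact * 1e4:.2f} um, linear: {L_linear * 1e4:.2f} um")
print(f"quadratic term: {a * L_exact**2 * 1e12:.2f} ps")
```

Both results come out at about $13.4\,\mu m$. The quadratic term contributes only about 1 ps of the 364.8 ps budget.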

2. (20pts) Consider the following dynamic logic circuit. What logic function does it evaluate?

Assume the output is loaded by $7C_0$. Assume $C_{diff} = 0.5C_{gate}$ and $\mu_n = 2\mu_p$. Assume the CLK signal is driven strongly such that the rise time on the clock is $R_0C_0$. Use Elmore delay calculations where appropriate. For full credit (and partial credit consideration), show your delay components (stages and the components of each Elmore delay calculation).



| Question | Answer |
|----------|--------|
| Out as a function of the inputs? | $\overline{S \cdot (A + B + C + D + K + L + M + N)}$ |
| Evaluate delay in units of $\tau$ (show delay components) | $\frac{\frac{10}{2} + 4 \times 4 + \frac{12}{2} + 12}{10} + \frac{2 \times 4 + \frac{12}{2} + 12}{4} + \frac{\frac{12}{2} + 8}{4}$<br>$+ \frac{\frac{12}{2} + 3 \times 8 + \frac{10}{2} + 7}{10} + \frac{\frac{12}{2} + 5 \times \frac{8}{2} + 7}{8} + \frac{\frac{12}{2} + 8 + 7}{8}$<br>$= 24.85\tau$ |
| Precharge time | $\frac{\frac{12}{2} + 4 \times 4 + \frac{10}{2} + 12}{\frac{12}{2}} + \frac{2 \times 4 + \frac{10}{2}}{4} + \frac{\frac{12}{2} + 8}{4} = 13.25\tau$ |
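
The delay components can be re-added exactly with rational arithmetic to confirm the totals. This sketch only transcribes the fractions from the answer table above. It does not re-derive the Elmore terms from the schematic.

```python
from fractions import Fraction as F

# Evaluate-delay components in units of tau, transcribed term by term
evaluate = [
    (F(10, 2) + 4 * 4 + F(12, 2) + 12) / 10,
    (2 * 4 + F(12, 2) + 12) / 4,
    (F(12, 2) + 8) / 4,
    (F(12, 2) + 3 * 8 + F(10, 2) + 7) / 10,
    (F(12, 2) + 5 * F(8, 2) + 7) / 8,
    (F(12, 2) + 8 + 7) / 8,
]
# Precharge-time components in units of tau
precharge = [
    (F(12, 2) + 4 * 4 + F(10, 2) + 12) / F(12, 2),
    (2 * 4 + F(10, 2)) / 4,
    (F(12, 2) + 8) / 4,
]
print(float(sum(evaluate)), float(sum(precharge)))  # 24.85 13.25
```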

3. (20pts) Below is a register built by cascading two dynamic latches. $C_1$ and $C_2$ are not explicit capacitors; they represent the parasitic capacitance at each node. Assume the clocks are ideal non-overlapping clocks with a frequency of 250 MHz. Each transmission gate has a delay of 100 ps and each inverter has a delay of 200 ps.



- (a) Is this a positive or negative edge-triggered device? Positive.
- (b) Determine the register timing parameters. Include units.



4. SRAM (20 points). Four different 6T SRAM cells are designed and tested for correct operation. There are three input test signals: clk, Rd/Wr, and data. A single read or write operation occurs in each clock period. When Rd/Wr is high a write operation should occur, and when it is low a read operation should occur. The data signal gives the value that should be written into the cell during a write operation. Below are the test signals and the BL and $\overline{BL}$ waveforms for all 4 cells. For each cell, indicate whether the cell is operating correctly. If not, explain what is incorrect about its operation. The answer table follows.



| Cell | Correct operation? |
|-----------|----------------------------------------------------------------------------------------------------------------------------------------|
| Bitcell 1 | Correct, with a precharge of $V_{dd}/2$ |
| Bitcell 2 | Correct, with a precharge of $V_{dd}$ |
| Bitcell 3 | Not correct. There is a read upset.<br>In the 3rd clk cycle, a 1 is read instead of a 0. |
| Bitcell 4 | Not correct. During the second write, the BLs<br>are not driven with complementary data.<br>Also, in the last clk cycle, a 0 is read instead of a 1. |

5. (20pts) Short Answer Questions: Answer the questions briefly. Include diagrams and equations as needed. Be **clear** in your explanation and **handwriting**.
   - **A** Identify and describe two differences between SRAM and DRAM memory cells. SRAM stores data in a cross-coupled inverter pair that actively drives the stored value, whereas DRAM stores data dynamically as charge on a capacitive node and must be periodically refreshed. An SRAM cell is also larger than a DRAM cell.
   - **B** What is a sense amplifier and why might you need it? A sense amplifier senses a small voltage difference on the bitlines and amplifies it to a full rail-to-rail value. It is needed when the bitlines are slow to charge or discharge, for example because a large memory puts many cells (and thus a large capacitance) on each bitline. The read can then complete without waiting for a full bitline swing.
   - **C** What is a memory read upset and what is one way you can avoid it? A read upset occurs when the data stored in a memory cell flips during a read. One way to avoid it is to precharge the bitlines to $V_{dd}/2$.
   - **D** Draw a schematic of a tristate buffer and draw its truth table.
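
For the truth-table half of part D, a behavioral model is sketched below. It assumes an active-high enable; with an active-low enable, the EN column is simply inverted. It uses `None` to stand in for the high-impedance state Z. This model does not replace the transistor-level schematic the question asks for.

```python
from typing import Optional

def tristate(data: int, enable: int) -> Optional[int]:
    """Behavioral tristate buffer (assumed active-high enable).

    Returns the driven logic value, or None to represent high impedance (Z).
    """
    return data if enable else None

# Print the truth table: EN, IN, OUT
for en in (0, 1):
    for d in (0, 1):
        out = tristate(d, en)
        print(en, d, "Z" if out is None else out)
```

When EN = 0, the output is Z regardless of the input. When EN = 1, the output follows the input.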

