## Chapter 3 Problems

For all problems, use the device parameters provided in Chapter 3 (Tables 3.2 and 3.5) and the inside back book cover, unless otherwise mentioned. Also assume $\mathrm{T}=300 \mathrm{~K}$ by default.

1. [E,SPICE,3.2.2]
a. Consider the circuit of Figure 0.1. Using the simple model, with $\mathrm{V}_{\text {Don }}=0.7 \mathrm{~V}$, solve for $I_{D}$.
b. Find $I_{D}$ and $V_{D}$ using the ideal diode equation. Use $I_{s}=10^{-14} \mathrm{~A}$ and $T=300 \mathrm{~K}$.
c. Solve for $V_{D 1}, V_{D 2}$, and $I_{D}$ using SPICE.
d. Repeat parts $b$ and $c$ using $I_{S}=10^{-16} \mathrm{~A}, T=300 \mathrm{~K}$, and $I_{S}=10^{-14} \mathrm{~A}, T=350 \mathrm{~K}$.


Figure 0.1 Resistor diode circuit.
2. [M, None, 3.2.3] For the circuit in Figure 0.2, $V_{s}=3.3 \mathrm{~V}$. Assume $A_{D}=12 \mu \mathrm{~m}^{2}, \phi_{0}=0.65 \mathrm{~V}$, and $m=0.5$. $N_{A}=2.5 \times 10^{16}$ and $N_{D}=5 \times 10^{15}$.
a. Find $I_{D}$ and $V_{D}$.
b. Is the diode forward- or reverse-biased?
c. Find the depletion region width, $W_{j}$, of the diode.
d. Use the parallel-plate model to find the junction capacitance, $C_{j}$.
e. Set $V_{s}=1.5 \mathrm{~V}$. Again using the parallel-plate model, explain qualitatively why $C_{j}$ increases.


Figure 0.2 Series diode circuit
3. [E, None, 3.3.2] Figure 0.3 shows NMOS and PMOS devices with drain, source, and gate ports annotated. Determine the mode of operation (saturation, linear, or cutoff) and drain current $I_{D}$ for each of the biasing configurations given below. Verify with SPICE. Use the following transistor data: NMOS: $k_{n}^{\prime}=115 \mu \mathrm{~A} / \mathrm{V}^{2}, V_{T 0}=0.43 \mathrm{~V}, \lambda=0.06 \mathrm{~V}^{-1}$; PMOS: $k_{p}^{\prime}=30 \mu \mathrm{~A} / \mathrm{V}^{2}, V_{T 0}=-0.4 \mathrm{~V}, \lambda=-0.1 \mathrm{~V}^{-1}$. Assume $(W / L)=1$.
a. NMOS: $V_{G S}=2.5 \mathrm{~V}, V_{D S}=2.5 \mathrm{~V}$. PMOS: $V_{G S}=-0.5 \mathrm{~V}, V_{D S}=-1.25 \mathrm{~V}$.
b. NMOS: $V_{G S}=3.3 \mathrm{~V}, V_{D S}=2.2 \mathrm{~V}$. PMOS: $V_{G S}=-2.5 \mathrm{~V}, V_{D S}=-1.8 \mathrm{~V}$.
c. NMOS: $V_{G S}=0.6 \mathrm{~V}, V_{D S}=0.1 \mathrm{~V}$. PMOS: $V_{G S}=-2.5 \mathrm{~V}, V_{D S}=-0.7 \mathrm{~V}$.
4. [E, SPICE, 3.3.2] Using SPICE plot the $I-V$ characteristics for the following devices.


Figure 0.3 NMOS and PMOS devices.
a. NMOS $W=1.2 \mu \mathrm{~m}, L=0.25 \mu \mathrm{~m}$
b. NMOS $W=4.8 \mu \mathrm{~m}, L=0.5 \mu \mathrm{~m}$
c. PMOS $W=1.2 \mu \mathrm{~m}, L=0.25 \mu \mathrm{~m}$
d. PMOS $W=4.8 \mu \mathrm{~m}, L=0.5 \mu \mathrm{~m}$
5. [E, SPICE, 3.3.2] Indicate on the plots from problem 4.
a. the regions of operation.
b. the effects of channel length modulation.
c. Which of the devices are in velocity saturation? Explain how this can be observed on the $I$-$V$ plots.
6. [M, None, 3.3.2] Given the data in Table 0.1 for a short-channel NMOS transistor with $V_{D S A T}=0.6 \mathrm{~V}$ and $k^{\prime}=100 \mu \mathrm{~A} / \mathrm{V}^{2}$, calculate $V_{T 0}, \gamma, \lambda, 2\left|\phi_{f}\right|$, and $W / L$:

Table 0.1 Measured NMOS transistor data

|  | $\boldsymbol{V}_{\boldsymbol{G} \boldsymbol{S}}$ | $\boldsymbol{V}_{\boldsymbol{D S}}$ | $\boldsymbol{V}_{\boldsymbol{B} \boldsymbol{S}}$ | $\boldsymbol{I}_{\boldsymbol{D}}(\boldsymbol{\mu} \mathbf{A})$ |
| :---: | :---: | :---: | :---: | :---: |
| 1 | 2.5 | 1.8 | 0 | 1812 |
| 2 | 2 | 1.8 | 0 | 1297 |
| 3 | 2 | 2.5 | 0 | 1361 |
| 4 | 2 | 1.8 | -1 | 1146 |
| 5 | 2 | 1.8 | -2 | 1039 |

7. [E, None, 3.3.2] Given Table 0.2, the goal is to derive the important device parameters from these data points. As the measured transistor is processed in a deep-submicron technology, the 'unified model' holds. From the material constants, we could also determine that the saturation voltage $V_{D S A T}$ equals $-1 \mathrm{~V}$. You may also assume that $-2 \Phi_{\mathrm{F}}=-0.6 \mathrm{~V}$.
NOTE: The parameter values in Table 3.3 do NOT hold for this problem.
a. Is the measured transistor a PMOS or an NMOS device? Explain your answer.
b. Determine the value of $V_{T 0}$.
c. Determine $\gamma$.
d. Determine $\lambda$.
e. Given the obtained answers, determine for each of the measurements the operation region of the transistor (choose from cutoff, resistive, saturated, and velocity saturated). Annotate your findings in the right-most column of Table 0.2.

Table 0.2 Measurements taken from the MOS device, at different terminal voltages.

| Measurement number | $V_{GS}$ (V) | $V_{DS}$ (V) | $V_{SB}$ (V) | $I_{D}$ ($\mu \mathrm{A}$) | Operation region? |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | -2.5 | -2.5 | 0 | -84.375 |  |
| 2 | 1 | 1 | 0 | 0.0 |  |
| 3 | -0.7 | -0.8 | 0 | -1.04 |  |
| 4 | -2.0 | -2.5 | 0 | -56.25 |  |
| 5 | -2.5 | -2.5 | -1 | -72.0 |  |
| 6 | -2.5 | -1.5 | 0 | -80.625 |  |
| 7 | -2.5 | -0.8 | 0 | -66.56 |  |
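
The extraction asked for in parts b through d can be sketched numerically. The following is a minimal sketch rather than a reference solution. It assumes that measurements 1, 4, 5, and 6 are all velocity saturated, so that the unified model reduces to $I_D = k'(W/L)\,[(V_{GS}-V_T)V_{DSAT} - V_{DSAT}^2/2]\,(1+\lambda V_{DS})$, and it works with voltage and current magnitudes only. The tuple layout and the function names are illustrative choices, not part of the problem statement.

```python
import math

V_DSAT = 1.0     # |V_DSAT| in volts, given in the problem
TWO_PHI_F = 0.6  # |2*Phi_F| in volts, given in the problem

# Measurements as magnitudes: (|V_GS|, |V_DS|, |V_SB|, |I_D| in uA).
# Assumption: all four points are velocity saturated.
M1 = (2.5, 2.5, 0.0, 84.375)
M4 = (2.0, 2.5, 0.0, 56.25)
M5 = (2.5, 2.5, 1.0, 72.0)
M6 = (2.5, 1.5, 0.0, 80.625)


def threshold_from_gate_sweep(a, b):
    """|V_T| from two points that differ only in V_GS."""
    ratio = a[3] / b[3]
    xa = a[0] - V_DSAT / 2
    xb = b[0] - V_DSAT / 2
    # ratio = (xa - VT) / (xb - VT)  ->  VT = (ratio*xb - xa) / (ratio - 1)
    return (ratio * xb - xa) / (ratio - 1)


def lambda_from_drain_sweep(a, b):
    """|lambda| from two points that differ only in V_DS."""
    ratio = a[3] / b[3]
    # ratio = (1 + lam*Va) / (1 + lam*Vb)  ->  lam = (ratio - 1) / (Va - ratio*Vb)
    return (ratio - 1) / (a[1] - ratio * b[1])


def threshold_with_body_bias(biased, reference, vt0):
    """|V_T| of the body-biased point (same V_GS and V_DS as the reference)."""
    x = reference[0] - V_DSAT / 2
    return x - (biased[3] / reference[3]) * (x - vt0)


vt0 = threshold_from_gate_sweep(M1, M4)
lam = lambda_from_drain_sweep(M1, M6)
vt_biased = threshold_with_body_bias(M5, M1, vt0)
gamma = (vt_biased - vt0) / (math.sqrt(TWO_PHI_F + M5[2]) - math.sqrt(TWO_PHI_F))

print(f"|V_T0| = {vt0:.3f} V, |lambda| = {lam:.3f} 1/V, |gamma| = {gamma:.3f} V^0.5")
```

Taking ratios of measurements that differ in a single terminal voltage makes the unknown factor $k'(W/L)$ cancel, which is why no value for it is needed. The signs of the extracted parameters still have to be assigned according to the device type found in part a.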

8. [M, None, 3.3.2] An NMOS device is plugged into the test configuration shown in Figure 0.4. The input $V_{i n}=2 \mathrm{~V}$. The current source draws a constant current of $50 \mu \mathrm{~A}$. $R$ is a variable resistor that can assume values between $10 \mathrm{k} \Omega$ and $30 \mathrm{k} \Omega$. Transistor M1 experiences short-channel effects and has the following transistor parameters: $k^{\prime}=110 \times 10^{-6} \mathrm{~A} / \mathrm{V}^{2}$, $\mathrm{V}_{\mathrm{T}}=0.4 \mathrm{~V}$, and $\mathrm{V}_{\mathrm{DSAT}}=0.6 \mathrm{~V}$. The transistor has $\mathrm{W} / \mathrm{L}=2.5 \mu \mathrm{m} / 0.25 \mu \mathrm{m}$. For simplicity, body effect and channel-length modulation can be neglected, i.e., $\lambda=0$ and $\gamma=0$.


Figure 0.4 Test configuration for the NMOS device.
a. When $R=10 \mathrm{k} \Omega$, find the operation region, $\mathrm{V}_{\mathrm{D}}$, and $\mathrm{V}_{\mathrm{S}}$.
b. When $\mathrm{R}=30 \mathrm{k} \Omega$, again determine the operation region, $\mathrm{V}_{\mathrm{D}}$, and $\mathrm{V}_{\mathrm{S}}$.
c. For the case of $\mathrm{R}=10 \mathrm{k} \Omega$, would $\mathrm{V}_{\mathrm{S}}$ increase or decrease if $\lambda \neq 0$? Explain qualitatively.
9. [M, None, 3.3.2] Consider the circuit configuration of Figure 0.5 .
a. Write down the equations (and only those) which are needed to determine the voltage at node $X$. Do NOT plug in any values yet. Neglect short channel effects and assume that $\lambda_{p}$ $=0$.
b. Draw the (approximate) load lines for both the MOS transistor and the resistor. Mark some of the significant points.
c. Determine the required width of the transistor (for $L=0.25 \mu \mathrm{~m}$ ) such that $X$ equals 1.5 V .
d. We have, so far, assumed that $M_{1}$ is a long-channel device. Redraw the load lines assuming that $M_{1}$ is velocity-saturated. Will the voltage at $X$ rise or fall?


Figure 0.5 MOS circuit.
10. [M, None, 3.3.2] The circuit of Figure 0.6 is known as a source-follower configuration. It achieves a DC level shift between the input and output. The value of this shift is determined by the current $I_{0}$. Assume $\gamma=0.4,2\left|\phi_{\mathrm{f}}\right|=0.6 \mathrm{~V}, V_{T 0}=0.43 \mathrm{~V}, k^{\prime}=115 \mu \mathrm{~A} / \mathrm{V}^{2}$, and $\lambda=0$. The NMOS device has $W / L=5.4 \mu / 1.2 \mu$ such that the short channel effects are not observed.
a. Derive an expression giving $V_{i}$ as a function of $V_{o}$ and $V_{T}\left(V_{o}\right)$. If we neglect body effect, what is the nominal value of the level shift performed by this circuit?
b. The NMOS transistor experiences a shift in $V_{T}$ due to the body effect. Find $V_{T}$ as a function of $V_{o}$ for $V_{o}$ ranging from 0 to 1.5 V with 0.25 V intervals. Plot $V_{T}$ vs. $V_{o}$.
c. Plot $V_{o}$ vs. $V_{i}$ as $V_{o}$ varies from 0 to 1.5 V with 0.25 V intervals. Plot two curves: one neglecting the body effect and one accounting for it. How does the body effect influence the operation of the level converter? At $V_{o}$ (body effect) $=1.5 \mathrm{~V}$, find $V_{o}$ (ideal) and, thus, determine the maximum error introduced by body effect.
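
The threshold shift in part b follows from the body-effect relation $V_T = V_{T0} + \gamma\left(\sqrt{2|\phi_f| + V_{SB}} - \sqrt{2|\phi_f|}\right)$. The snippet below is a minimal sketch of that tabulation. It assumes the bulk is tied to ground so that $V_{SB} = V_o$ (the figure is not reproduced here, so this is an assumption), and the function name `threshold` is a hypothetical helper.

```python
import math

V_T0 = 0.43      # zero-bias threshold in volts (given)
GAMMA = 0.4      # body-effect coefficient in V^0.5 (given)
TWO_PHI_F = 0.6  # 2|phi_f| in volts (given)


def threshold(v_sb):
    """Threshold voltage for a source-to-bulk voltage v_sb (in volts)."""
    return V_T0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


# Assumption: bulk at ground, so V_SB equals the output voltage V_o.
v_o_points = [0.25 * i for i in range(7)]  # 0 V to 1.5 V in 0.25 V steps
vt_table = [(v_o, threshold(v_o)) for v_o in v_o_points]

for v_o, v_t in vt_table:
    print(f"V_o = {v_o:.2f} V  ->  V_T = {v_t:.3f} V")
```

The resulting pairs can be plotted directly for part b, and reused in part c once the overdrive set by $I_0$ is known.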


Figure 0.6 Source-follower level converter.
11. [M, SPICE, 3.3.2] Problem 11 uses the MOS circuit of Figure 0.7.
a. Plot $V_{\text {out }}$ vs. $V_{\text {in }}$ with $V_{\text {in }}$ varying from 0 to 2.5 volts (use steps of 0.5 V ). $V_{D D}=2.5 \mathrm{~V}$.
b. Repeat $a$ using SPICE.
c. Repeat $a$ and $b$ using a MOS transistor with $(W / L)=4 / 1$. Is the discrepancy between manual and computer analysis larger or smaller? Explain why.

12. [E, None, 3.3.2] Figure 0.8 below shows an I-V transfer curve for an NMOS transistor. In this problem, the objective is to use this I-V curve to obtain information about the transistor. The transistor has $(\mathrm{W} / \mathrm{L})=(1 \mu / 1 \mu)$. It may also be assumed that velocity saturation does not play a role in this example. Also assume $-2 \Phi_{\mathrm{F}}=0.6 \mathrm{~V}$. Using Figure 0.8, determine the following device parameters: $\mathrm{V}_{\mathrm{T0}}$, $\gamma$, and $\lambda$.


Figure 0.8 I-V curves
13. [E, None, 3.3.2] The curves in Figure 0.9 below represent the gate voltage $\left(\mathrm{V}_{\mathrm{GS}}\right)$ vs. drain current ($\mathrm{I}_{\mathrm{DS}}$) of two NMOS devices that are on the same die and operate in the subthreshold region. Due to process variations on the same die, the curves do not overlap.


Figure 0.9 Subthreshold current curves. Difference is due to process variations

Also assume that the transistors are placed in the circuit configuration of Figure 0.10. If the input voltages are both $\mathrm{V}_{\text {in }}=0.2 \mathrm{~V}$, what would be the respective durations to discharge the load of $C_{L}=1 \mathrm{pF}$ attached to the drains of these devices?


Figure 0.10 The circuit for testing the time to discharge the load capacitance through a device operating in subthreshold region.
14. [M, None, 3.3.2] Short-channel effects:
a. Use the fact that current can be expressed as the product of the carrier charge per unit length and the velocity of carriers $\left(I_{D S}=\mathrm{Q} v\right)$ to derive $I_{D S}$ as a function of $W, C_{o x}, V_{G S}-V_{T}$, and carrier velocity $v$.
b. For a long-channel device, the carrier velocity is the mobility times the applied electric field. The electrical field, which has dimensions of $\mathrm{V} / \mathrm{m}$, is simply $\left(V_{G S}-V_{T}\right) / 2 L$. Derive $I_{D S}$ for a long-channel device.
c. From the equation derived in $a$, find $I_{D S}$ for a short-channel device in terms of the maximum carrier velocity, $v_{\max }$.
Based on the results of $b$ and $c$, describe the most important differences between short-channel and long-channel devices.
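
The chain of substitutions in parts a through c can be written out as follows. This is a sketch under the assumption that the channel charge per unit length is uniform and equal to $W C_{ox}(V_{GS}-V_T)$; the field expression is the one given in part b.

$$
Q = W C_{ox}\left(V_{GS}-V_{T}\right), \qquad I_{DS} = Q v = W C_{ox}\left(V_{GS}-V_{T}\right) v
$$

$$
v = \mu E = \mu \frac{V_{GS}-V_{T}}{2 L} \;\Rightarrow\; I_{DS} = \frac{\mu C_{ox}}{2} \frac{W}{L}\left(V_{GS}-V_{T}\right)^{2} \quad \text{(long channel)}
$$

$$
v = v_{\max} \;\Rightarrow\; I_{DS} = W C_{ox}\left(V_{GS}-V_{T}\right) v_{\max} \quad \text{(short channel)}
$$

Under these assumptions the long-channel current is quadratic in the gate overdrive and inversely proportional to $L$, whereas the velocity-limited current is linear in the overdrive and has no explicit dependence on $L$.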
15. [C, None, 3.3.2] Another equation, which models the velocity-saturated drain current of an MOS transistor is given by

$$
I_{d s a t}=\frac{1}{1+\left(V_{G S}-V_{T}\right) /\left(E_{s a t} L\right)}\left(\frac{\mu_{0} C_{o x}}{2}\right) \frac{W}{L}\left(V_{G S}-V_{T}\right)^{2}
$$

Using this equation it is possible to see that velocity saturation can be modeled by a MOS device with a source-degeneration resistor (see Figure 0.11).
a. Find the value of $R_{S}$ such that $I_{D S A T}\left(V_{G S}, V_{D S}\right)$ for the composite transistor in the figure matches the above velocity-saturated drain current equation. Hint: the voltage drop across $R_{S}$ is typically small.
b. Given $E_{\text {sat }}=1.5 \mathrm{~V} / \mu \mathrm{m}$ and $k^{\prime}=\mu_{0} C_{o x}=20 \mu \mathrm{~A} / \mathrm{V}^{2}$, what value of $R_{S}$ is required to model velocity saturation? How does this value depend on $W$ and $L$?
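
One way to approach part a is a first-order expansion. The sketch below assumes that the composite device is an ideal long-channel transistor with $R_S$ in series with its source, writes $V_{GT}=V_{GS}-V_{T}$ (a shorthand introduced here), and drops the $(I_D R_S)^2$ term in line with the hint.

$$
I_{D}=\frac{k^{\prime}}{2} \frac{W}{L}\left(V_{GT}-I_{D} R_{S}\right)^{2} \approx \frac{k^{\prime}}{2} \frac{W}{L}\left(V_{GT}^{2}-2 V_{GT} I_{D} R_{S}\right)
$$

$$
I_{D} \approx \frac{1}{1+k^{\prime}(W / L) R_{S} V_{GT}}\left(\frac{k^{\prime}}{2}\right) \frac{W}{L} V_{GT}^{2}
$$

Matching the denominator with $1+V_{GT} /\left(E_{sat} L\right)$ gives

$$
R_{S}=\frac{1}{k^{\prime} W E_{sat}}
$$

With the values of part b, $k^{\prime} E_{sat}=30 \mu \mathrm{A} /(\mathrm{V} \cdot \mu \mathrm{m})$, so $R_S \approx 33.3 \mathrm{k}\Omega$ divided by the width in $\mu \mathrm{m}$. Under this approximation the value scales inversely with $W$ and does not depend on $L$.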


Figure 0.11 Source-degeneration model of velocity saturation.
16. [E, None, 3.3.2] The equivalent resistances of two different devices need to be computed.
a. First, consider the fictive device whose I-V characteristic is given in Figure 0.12. Constant $k$ has the dimension of S (or $1 / \Omega$). $\mathrm{V}_{0}$ is a voltage characteristic of the device. Calculate the equivalent resistance for an output voltage transition from 0 to $2 \mathrm{~V}_{0}$ by integrating the resistance as a function of the voltage.


Figure 0.12 Fictive device whose equivalent resistance is to be calculated.
b. Next, obtain the resistance Equation 3.43 using Figure 0.13. Assuming that $\mathrm{V}_{\mathrm{GS}}$ is kept at $\mathrm{V}_{\mathrm{DD}}$, calculate $R_{eq}$ as the output $\left(\mathrm{V}_{\mathrm{DS}}\right)$ transitions from $\mathrm{V}_{\mathrm{DD}}$ to $\mathrm{V}_{\mathrm{DD}} / 2$.

Hint: Make sure you use the short-channel unified MOS model equations. Hint: You will need to use the expansion $\ln (1+x) \approx x-x^{2} / 2+x^{3} / 3$.


Figure 0.13 The equivalent resistance is to be computed for the $\mathrm{H} \rightarrow \mathrm{L}$ transition.
17. [M, None, 3.3.3] Compute the gate and diffusion capacitances for transistor M1 of Figure 0.7. Assume that drain and source areas are rectangular, and are $1 \mu \mathrm{~m}$ wide and $0.5 \mu \mathrm{~m}$ long. Use the parameters of Example 3.5 to determine the capacitance values. Assume $m_{j}=0.5$ and $m_{j s w}=0.44$. Also compute the total charge stored at node In, for the following initial conditions:
a. $V_{\text {in }}=2.5 \mathrm{~V}, V_{\text {out }}=2.5 \mathrm{~V}, 0.5 \mathrm{~V}$, and 0 V .
b. $V_{\text {in }}=0 \mathrm{~V}, V_{\text {out }}=2.5 \mathrm{~V}, 0.5 \mathrm{~V}$, and 0 V .
18. [E, None, 3.3.3] Consider a CMOS process with the following capacitive parameters for the NMOS transistor: $\mathrm{C}_{\mathrm{GSO}}, \mathrm{C}_{\mathrm{GDO}}, \mathrm{C}_{\mathrm{OX}}, \mathrm{C}_{\mathrm{J}}, \mathrm{m}_{\mathrm{j}}, \mathrm{C}_{\mathrm{jsw}}, \mathrm{m}_{\mathrm{jsw}}$, and PB, with the lateral diffusion equal to $\mathrm{L}_{\mathrm{D}}$. The MOS transistor M1 is characterized by the following parameters: W, L, AD, PD, AS, PS.

a. Consider the configuration of Figure 0.14. $\mathrm{V}_{\mathrm{DD}}$ is equal to $\mathrm{V}_{\mathrm{T}}$ (the threshold voltage of the transistor). Assume that the initial value of $\mathrm{V}_{\mathrm{g}}$ equals 0. A current source with value $\mathrm{I}_{\mathrm{in}}$ is applied at time 0. Assuming that all the capacitance at the gate node can be lumped into a single, grounded, linear capacitance $\mathrm{C}_{\mathrm{T}}$, derive an expression for the time it will take for $\mathrm{V}_{\mathrm{g}}$ to reach $2 \mathrm{~V}_{\mathrm{T}}$.
b. The obvious question is now how to compute $\mathrm{C}_{\mathrm{T}}$. Which of the parasitic capacitances of the MOS transistor ($\mathrm{C}_{\mathrm{db}}, \mathrm{C}_{\mathrm{sb}}, \mathrm{C}_{\mathrm{gs}}, \mathrm{C}_{\mathrm{gd}}, \mathrm{C}_{\mathrm{gb}}$) contribute to $\mathrm{C}_{\mathrm{T}}$? For those that contribute to $\mathrm{C}_{\mathrm{T}}$, write down the expression that determines the value of the contribution. Use only the parameters given above. If the transistor goes through different operation regions and this impacts the value of the capacitor, determine the expression of the contribution for each region (and indicate the region).
c. Consider now the case depicted in Figure 0.15. Assume that $\mathrm{V}_{\mathrm{d}}$ is initially at 0 and we want to charge it up to $2 \mathrm{~V}_{\mathrm{T}}$. Again, which of the device capacitances $\mathrm{C}_{\mathrm{db}}, \mathrm{C}_{\mathrm{sb}}, \mathrm{C}_{\mathrm{gs}}, \mathrm{C}_{\mathrm{gd}}, \mathrm{C}_{\mathrm{gb}}$ contribute to the total drain capacitance? Once again, make sure you differentiate between different operation regions.


Figure 0.15 Circuit to measure the total drain capacitance
19. [M, None, 3.3.3] For the NMOS transistor in Figure 0.16, sketch the voltages at the source and at the drain as a function of time. Initially, both source and drain are at +2.5 volts. Note that the drain is open circuited. The $10 \mu \mathrm{~A}$ current source is turned on at $\mathrm{t}=0$. Device parameters: $\mathrm{W} / \mathrm{L}_{\text {eff }}=125 \mu \mathrm{m} / 0.25 \mu \mathrm{m}$; $\mu \mathrm{C}_{\mathrm{ox}}=100 \mu \mathrm{~A} / \mathrm{V}^{2}$; $\mathrm{C}_{\mathrm{ox}}=6 \mathrm{fF} / \mu \mathrm{m}^{2}$; $\mathrm{C}_{\mathrm{OL}}$ (per width) $=0.3 \mathrm{fF} / \mu \mathrm{m}$; $\mathrm{C}_{\mathrm{sb}}=100 \mathrm{fF}$; $\mathrm{C}_{\mathrm{db}}=100 \mathrm{fF}$; $\mathrm{V}_{\text {DSAT }}=1 \mathrm{~V}$.

HINT: Do not try to solve this analytically. Just use a qualitative analysis to derive the different operation modes of the circuit and the devices.


Figure 0.16 Device going through different operation regions over time
20. [C, SPICE, 3.4] Though impossible to quantify exactly by hand, it is a good idea to understand process variations and be able to at least get a rough estimate for the limits of their effects.
a. For the circuit of Figure 0.7, calculate nominal, minimum, and maximum currents in the NMOS device with $V_{\text {in }}=0 \mathrm{~V}, 2.5 \mathrm{~V}$ and 5 V . Assume $3 \sigma$ variations in $V_{T 0}$ of 25 mV , in $k^{\prime}$ of $15 \%$, and in lithographic etching of $0.15 \mu \mathrm{~m}$.
b. Analyze the impact of these current variations on the output voltage. Assume that the load resistor also can vary by $10 \%$. Verify the result with SPICE.
21. [E, None, 3.5] A state-of-the-art, synthesizable, embedded microprocessor consumes $0.4 \mathrm{~mW} / \mathrm{MHz}$ when fabricated using a $0.18 \mu \mathrm{~m}$ process. With typical standard cells (gates), the area of the processor is $0.7 \mathrm{~mm}^{2}$. Assume a 100 MHz clock frequency and a 1.8 V power supply. Assume short-channel devices, but ignore second-order effects like mobility degradation, series resistance, etc.
a. Using fixed voltage scaling and constant frequency, what will the area, power consumption, and power density of the same processor be, if scaled to $0.12 \mu \mathrm{~m}$ technology, assuming the same clock frequency?
b. If the supply voltage in the scaled $0.12 \mu \mathrm{~m}$ part is reduced to 1.5 V what will the power consumption and power density be?
c. How fast could the scaled processor in Part (b) be clocked? What would the power and power density be at this new clock frequency?
d. Power density is important for cooling the chip and packaging. What would the supply voltage have to be to maintain the same power density as the original processor?
22. The superscalar, superpipelined, out-of-order executing, highly parallel, fully x86 compatible JMRII microprocessor was fabricated in a $0.25 \mu \mathrm{~m}$ technology and was able to operate at 100 MHz, consuming 10 watts using a 2.5 V power supply.
a. Using fixed voltage scaling, what will the speed and power consumption of the same processor be if scaled to $0.1 \mu \mathrm{~m}$ technology?
b. If the supply voltage on the $0.1 \mu \mathrm{~m}$ part were scaled to 1.0 V , what will the power consumption and speed be?
c. What supply should be used to fix the power consumption at 10 watts? At what speed would the processor operate?

## Chapter 4

## Problems

1. [M, None, 4.x] Figure 0.1 shows a clock-distribution network. Each segment of the clock network (between the nodes) is 5 mm long, $3 \mu \mathrm{~m}$ wide, and is implemented in polysilicon. At each of the terminal nodes (such as $R$ ) resides a load capacitance of 100 fF .
a. Determine the average current of the clock driver, given a voltage swing on the clock lines of 5 V and a maximum delay of 5 nsec between clock source and destination node $R$. For this part, you may ignore the resistance and inductance of the network.
b. Unfortunately the resistance of the polysilicon cannot be ignored. Assume that each straight segment of the network can be modeled as a $\Pi$-network. Draw the equivalent circuit and annotate the values of resistors and capacitors.
c. Determine the dominant time-constant of the clock response at node $R$.


Figure 0.1 Clock-distribution network.
2. [C, SPICE, 4.x] You are designing a clock distribution network in which it is critical to minimize skew between local clocks ( $C L K 1, C L K 2$, and $C L K 3$ ). You have extracted the $R C$ network of Figure 0.2, which models the routing parasitics of your clock line. Initially, you notice that the path to $C L K 3$ is shorter than to $C L K 1$ or $C L K 2$. In order to compensate for this imbalance, you insert a transmission gate in the path of $C L K 3$ to eliminate the skew.
a. Write expressions for the time-constants associated with nodes CLK1, CLK2 and CLK3. Assume the transmission gate can be modeled as a resistance $R_{3}$.
b. If $R_{1}=R_{2}=R_{4}=R_{5}=R$ and $C_{1}=C_{2}=C_{3}=C_{4}=C_{5}=C$, what value of $R_{3}$ is required to balance the delays to $C L K 1, C L K 2$, and $C L K 3$ ?
c. For $R=750 \Omega$ and $C=200 \mathrm{fF}$, what ( $W / L$ )'s are required in the transmission gate to eliminate skew? Determine the value of the propagation delay.
d. Simulate the network using SPICE, and compare the obtained results with the manually obtained numbers.
3. [M, None, 4.x] Consider a CMOS inverter followed by a wire of length $L$. Assume that in the reference design, inverter and wire contribute equally to the total propagation delay $t_{\text {pref }}$. You may assume that the transistors are velocity-saturated. The wire is scaled in line with the ideal wire scaling model. Assume initially that the wire is a local wire.
a. Determine the new (total) propagation delay as a function of $t_{\text {pref }}$, assuming that technology and supply voltage scale with a factor of 2. Consider only first-order effects.
b. Perform the same analysis, assuming now that the wire scales as a global wire, and the wire length scales inversely proportional to the technology.


Figure 0.2 $RC$ clock-distribution network.
c. Repeat b, but assume now that the wire is scaled along the constant resistance model. You may ignore the effect of the fringing capacitance.
d. Repeat b, but assume that the new technology uses a better wiring material that reduces the resistivity by half, and a dielectric with a $25 \%$ smaller permittivity.
e. Discuss the energy dissipation of part a as a function of the energy dissipation of the original design $E_{\text {ref }}$.
f. Determine for each of the statements below if it is true, false, or undefined, and explain in one line your answer.

- When driving a small fan-out, increasing the driver transistor sizes raises the short-circuit power dissipation.
- Reducing the supply voltage, while keeping the threshold voltage constant, decreases the short-circuit power dissipation.
- Moving to Copper wires on a chip will enable us to build faster adders.
- Making a wire wider helps to reduce its RC delay.
- Going to dielectrics with a lower permittivity will make RC wire delay more important.

4. [M, None, 4.x] A two-stage buffer is used to drive a metal wire of 1 cm. The first inverter is of minimum size with an input capacitance $\mathrm{C}_{\mathrm{i}}=10 \mathrm{fF}$, an internal propagation delay $\mathrm{t}_{\mathrm{p} 0}=50 \mathrm{ps}$, and a load-dependent delay of $5 \mathrm{ps} / \mathrm{fF}$. The width of the metal wire is $3.6 \mu \mathrm{~m}$. The sheet resistance of the metal is $0.08 \Omega / \square$, the capacitance value is $0.03 \mathrm{fF} / \mu \mathrm{m}^{2}$, and the fringing field capacitance is $0.04 \mathrm{fF} / \mu \mathrm{m}$.
a. What is the propagation delay of the metal wire?
b. Compute the optimal size of the second inverter. What is the minimum delay through the buffer?
c. If the input to the first inverter has a $25 \%$ chance of making a 0-to-1 transition, and the whole chip is running at 20 MHz with a 2.5 V supply voltage, then what is the power consumed by the metal wire?
5. [M, None, 4.x] To connect a processor to an external memory, an off-chip connection is necessary. The copper wire on the board is 15 cm long and acts as a transmission line with a characteristic impedance of $100 \Omega$ (see Figure 0.3). The memory input pins present a very high impedance, which can be considered infinite. The bus driver is a CMOS inverter consisting of very large devices: (50/0.25) for the NMOS and (150/0.25) for the PMOS, where all sizes are in $\mu \mathrm{m}$. The minimum size device, (0.25/0.25) for NMOS and (0.75/0.25) for PMOS, has an on-resistance of $35 \mathrm{k} \Omega$.
a. Determine the time it takes for a change in the signal to propagate from source to destination (time of flight). The wire inductance per unit length equals $75 \times 10^{-8} \mathrm{H} / \mathrm{m}$.
b. Determine how long it will take the output signal to stay within $10 \%$ of its final value. You can model the driver as a voltage source with the driving device acting as a series resistance. Assume a supply and step voltage of 2.5 V . Hint: draw the lattice diagram for the transmission line.
c. Resize the dimensions of the driver to minimize the total delay.
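
The quantities needed for parts a and b can be set up as in the minimal sketch below. It uses the lossless-line relations $Z_0=\sqrt{l/c}$ and $v=1/\sqrt{lc}$, and it assumes that the on-resistance scales inversely with transistor width, which is an assumption for illustration rather than something stated in the problem. All variable names are hypothetical.

```python
import math

Z0 = 100.0            # characteristic impedance in ohms (given)
L_PER_M = 75e-8       # inductance per unit length in H/m (given)
LENGTH_M = 0.15       # line length in m (given)
R_ON_MIN = 35e3       # on-resistance of the minimum-size device in ohms (given)
W_MIN_NMOS = 0.25     # minimum NMOS width in um (given)
W_DRIVER_NMOS = 50.0  # driver NMOS width in um (given)

# Z0 = sqrt(l/c)  ->  c = l / Z0^2
c_per_m = L_PER_M / Z0**2
velocity = 1.0 / math.sqrt(L_PER_M * c_per_m)  # propagation speed in m/s
t_flight = LENGTH_M / velocity                 # time of flight in s

# Assumption: on-resistance is inversely proportional to width.
r_driver = R_ON_MIN * W_MIN_NMOS / W_DRIVER_NMOS

# Reflection coefficients for the lattice diagram (open-circuit load).
rho_source = (r_driver - Z0) / (r_driver + Z0)
rho_load = 1.0

print(f"time of flight = {t_flight * 1e9:.3f} ns")
print(f"driver resistance = {r_driver:.0f} ohm, rho_source = {rho_source:.3f}")
```

The same ratio of 200 applies to the PMOS device (150/0.75), so under this assumption both transitions see the same source resistance.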


Figure 0.3 The driver, the connecting copper wire and the memory block being accessed.
6. [M, None, 4.x] A two-stage buffer is used to drive a metal wire of 1 cm. The first inverter is of minimum size with an input capacitance $\mathrm{C}_{\mathrm{i}}=10 \mathrm{fF}$ and a propagation delay $\mathrm{t}_{\mathrm{p} 0}=175 \mathrm{ps}$ when loaded with an identical gate. The width of the metal wire is $3.6 \mu \mathrm{~m}$. The sheet resistance of the metal is $0.08 \Omega / \square$, the capacitance value is $0.03 \mathrm{fF} / \mu \mathrm{m}^{2}$, and the fringing field capacitance is $0.04 \mathrm{fF} / \mu \mathrm{m}$.
a. What is the propagation delay of the metal wire?
b. Compute the optimal size of the second inverter. What is the minimum delay through the buffer?
7. [M, None, 4.x] For the RC tree given in Figure 0.4, calculate the Elmore delay from node A to node B using the values for the resistors and capacitors given below in Table 0.1.


Figure 0.4 RC tree for calculating the delay

Table 0.1 Values of the components in the RC tree of Figure 0.4

| Resistor | Value ($\Omega$) | Capacitor | Value (fF) |
| :---: | :---: | :---: | :---: |
| R1 | 0.25 | C1 | 250 |
| R2 | 0.25 | C2 | 750 |
| R3 | 0.50 | C3 | 250 |
| R4 | 100 | C4 | 250 |
| R5 | 0.25 | C5 | 1000 |
| R6 | 1.00 | C6 | 250 |
| R7 | 0.75 | C7 | 500 |
| R8 | 1000 | C8 | 250 |

8. [M, SPICE, 4.x] In this problem the various wire models and their respective accuracies will be studied.
a. Compute the $0 \%-50 \%$ delay of a $500 \mu \mathrm{m} \times 0.5 \mu \mathrm{m}$ wire with a sheet resistance of $0.08 \Omega / \square$, an area capacitance of $30 \mathrm{aF} / \mu \mathrm{m}^{2}$, and a fringing capacitance of $40 \mathrm{aF} / \mu \mathrm{m}$. Assume the driver has a $100 \Omega$ resistance and negligible output capacitance.

- Using a lumped model for the wire.
- Using a PI model for the wire, and the Elmore equations to find tau. (see Chapter 4, figure 4.26).
- Using the distributed RC line equations from Chapter 4, section 4.4.4.
b. Compare your results in part a with SPICE simulations (be sure to include the source resistance). For each simulation, measure the $0 \%-50 \%$ time for the output.
- First, simulate a step input to a lumped R-C circuit.
- Next, simulate a step input to your wire as a PI model.
- Unfortunately, our version of SPICE does not support the distributed RC model as described in your book (Chapter 4, section 4.5.1). Instead, simulate a step input to your wire using a PI3 distributed RC model.
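
The three hand estimates of part a can be organized as in the following minimal sketch. It assumes that the fringing capacitance is counted on both edges of the wire, that the 50% point of a single-pole response is $0.69\tau$, and that the distributed line is approximated by $t_p \approx 0.69 R_s C_w + 0.38 R_w C_w$. These are common textbook approximations adopted here as assumptions, and the variable names are illustrative.

```python
R_SHEET = 0.08      # sheet resistance in ohm/square (given)
C_AREA = 30e-18     # area capacitance in F/um^2 (given)
C_FRINGE = 40e-18   # fringing capacitance in F/um per edge (given)
LENGTH_UM = 500.0   # wire length in um (given)
WIDTH_UM = 0.5      # wire width in um (given)
R_DRIVER = 100.0    # driver resistance in ohms (given)

# Wire resistance: sheet resistance times the number of squares.
r_wire = R_SHEET * LENGTH_UM / WIDTH_UM

# Wire capacitance: parallel-plate term plus fringe on both edges (assumption).
c_wire = C_AREA * LENGTH_UM * WIDTH_UM + 2 * C_FRINGE * LENGTH_UM

# Lumped model: all resistance in series with all capacitance.
t_lumped = 0.69 * (R_DRIVER + r_wire) * c_wire

# Pi model: half the capacitance at each end, Elmore time constant at the far end.
tau_pi = R_DRIVER * (c_wire / 2) + (R_DRIVER + r_wire) * (c_wire / 2)
t_pi = 0.69 * tau_pi

# Distributed RC line driven through a source resistance.
t_distributed = 0.69 * R_DRIVER * c_wire + 0.38 * r_wire * c_wire

for name, t in [("lumped", t_lumped), ("pi", t_pi), ("distributed", t_distributed)]:
    print(f"{name:12s} {t * 1e12:.2f} ps")
```

With these assumptions the lumped model gives the most pessimistic estimate, while the PI model lands close to the distributed approximation, which is the comparison part b then checks in simulation.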

9. [M, None, 4.x] A standard CMOS inverter drives an aluminum wire on the first metal layer. Assume $\mathrm{Rn}=4 \mathrm{k} \Omega, \mathrm{Rp}=6 \mathrm{k} \Omega$. Also, assume that the output capacitance of the inverter is negligible in comparison with the wire capacitance. The wire is $0.5 \mu \mathrm{m}$ wide, and the resistivity is $0.08 \Omega / \square$.
a. What is the "critical length" of the wire?
b. What is the equivalent capacitance of a wire of this length? (For your capacitance calculations, use Table 4.2 of your book, assume there's field oxide underneath and nothing above the aluminum wire)
10. [M, None, 4.x] A 10 cm long lossless transmission line on a PC board (relative dielectric constant $=9$, relative permeability $=1$ ) with characteristic impedance of $50 \Omega$ is driven by a 2.5 V pulse coming from a source with $150 \Omega$ resistance.
a. If the load resistance is infinite, determine the time it takes for a change at the source to reach the load (time of flight).

Now a $200 \Omega$ load is attached at the end of the transmission line.
b. What is the voltage at the load at $\mathrm{t}=3 \mathrm{~ns}$ ?
c. Draw the lattice diagram and sketch the voltage at the load as a function of time. Determine how long it takes for the output to be within 1 percent of its final value.
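
The bookkeeping behind the lattice diagram can be sketched in a few lines. This is a minimal sketch under stated assumptions: the speed of light is rounded to $3 \times 10^{8}$ m/s, the source applies an ideal 2.5 V step, and the helper names `load_voltage_steps` and `settling_time` are hypothetical. Each entry is the load voltage just after a wavefront arrives.

```python
import math

V_STEP = 2.5      # source step in volts (given)
Z0 = 50.0         # characteristic impedance in ohms (given)
R_SOURCE = 150.0  # source resistance in ohms (given)
R_LOAD = 200.0    # load resistance in ohms (given)
LENGTH_M = 0.10   # line length in m (given)
EPS_R = 9.0       # relative dielectric constant (given)
MU_R = 1.0        # relative permeability (given)
C0 = 3.0e8        # speed of light in m/s (rounded, assumption)

t_flight = LENGTH_M * math.sqrt(EPS_R * MU_R) / C0
rho_load = (R_LOAD - Z0) / (R_LOAD + Z0)
rho_source = (R_SOURCE - Z0) / (R_SOURCE + Z0)
v_final = V_STEP * R_LOAD / (R_SOURCE + R_LOAD)


def load_voltage_steps(n_arrivals):
    """(time, load voltage) just after each of the first n wavefront arrivals."""
    steps = []
    wave = V_STEP * Z0 / (R_SOURCE + Z0)  # first incident wave launched at t = 0
    v_load = 0.0
    for k in range(n_arrivals):
        arrival = (2 * k + 1) * t_flight
        v_load += wave * (1 + rho_load)   # incident plus reflected wave at the load
        steps.append((arrival, v_load))
        wave *= rho_load * rho_source     # one round trip: load, then source
    return steps


def settling_time(tolerance, max_arrivals=50):
    """Time of the first arrival after which the load stays within tolerance."""
    for arrival, v in load_voltage_steps(max_arrivals):
        if abs(v - v_final) <= tolerance * v_final:
            return arrival
    return None


for arrival, v in load_voltage_steps(4):
    print(f"t = {arrival * 1e9:.1f} ns  V_load = {v:.4f} V")
print(f"final value = {v_final:.4f} V")
```

Because the product of the two reflection coefficients is positive and smaller than one, the load voltage approaches its final value monotonically, so the first arrival that falls inside the tolerance band is also the settling time. Note that with the rounded speed of light the second wavefront reaches the load exactly at $t = 3$ ns, so the answer to part b depends on whether the instant just before or just after that arrival is meant.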
11. [C, SPICE, 4.x] Assume $\mathrm{V}_{\mathrm{DD}}=1.5 \mathrm{~V}$. Also, use short-channel transistor models for hand analysis.

a. Figure 0.5 shows an output driver feeding a 0.2 pF effective fan-out of CMOS gates through a transmission line. Size the two transistors of the driver to optimize the delay. Sketch waveforms of $\mathrm{V}_{\mathrm{S}}$ and $\mathrm{V}_{\mathrm{L}}$, assuming a square wave input. Label critical voltages and times.
b. Size down the transistors by $m$ times ($m$ is to be treated as a parameter). Derive a first-order expression for the time it takes for $\mathrm{V}_{\mathrm{L}}$ to settle within $10 \%$ of its final voltage level. Compare the obtained result with the case where no inductance is associated with the wire. Draw the waveforms of $\mathrm{V}_{\mathrm{L}}$ for both cases, and comment.
c. Use the transistors as in part a). Suppose $\mathrm{C}_{\mathrm{L}}$ is changed to 20 pF . Sketch waveforms of $\mathrm{V}_{\mathrm{S}}$ and $\mathrm{V}_{\mathrm{L}}$, assuming a square wave input. Label critical voltages and instants.
d. Assume now that the transmission line is lossy. Perform Hspice simulation for three cases: $\mathrm{R}=100 \Omega / \mathrm{cm} ; \mathrm{R}=2.5 \Omega / \mathrm{cm} ; \mathrm{R}=0.5 \Omega / \mathrm{cm}$. Get the waveforms of $\mathrm{V}_{\mathrm{S}}, \mathrm{V}_{\mathrm{L}}$ and the middle point of the line. Discuss the results.
12. [M, None, 4.x] Consider an isolated 2 mm long and $1 \mu \mathrm{~m}$ wide M1 (Metal1) wire over a silicon substrate driven by an inverter that has zero resistance and parasitic output capacitance. How will the wire delay change for the following cases? Explain your reasoning in each case.
a. If the wire width is doubled.
b. If the wire length is halved.
c. If the wire thickness is doubled.
d. If thickness of the oxide between the M1 and the substrate is doubled.
13. [E, None, 4.x] In an ideal scaling model, where all dimensions and voltages scale with a factor of $\mathrm{S}>1$ :
a. How does the delay of an inverter scale?
b. If a chip is scaled from one technology to another where all wire dimensions, including the vertical one and spacing, scale with a factor of $S$, how does the wire delay scale? How does the overall operating frequency of a chip scale?
c. Repeat b) for the case where everything scales, except the vertical dimension of wires (it stays constant).

## CHAPTER



## THE CMOS INVERTER

Quantification of integrity, performance, and energy metrics of an inverter Optimization of an inverter design
5.1 Exercises and Design Problems
5.2 The Static CMOS Inverter - An Intuitive Perspective
5.3 Evaluating the Robustness of the CMOS Inverter: The Static Behavior
5.3.1 Switching Threshold
5.3.2 Noise Margins
5.3.3 Robustness Revisited
5.4 Performance of CMOS Inverter: The Dynamic Behavior
5.4.1 Computing the Capacitances
5.4.2 Propagation Delay: First-Order Analysis
5.4.3 Propagation Delay from a Design Perspective
5.5 Power, Energy, and Energy-Delay
5.5.1 Dynamic Power Consumption
5.5.2 Static Consumption
5.5.3 Putting It All Together
5.5.4 Analyzing Power Consumption Using SPICE
5.6 Perspective: Technology Scaling and its Impact on the Inverter Metrics

### 5.1 Exercises and Design Problems

1. [M, SPICE, 3.3.2] The layout of a static CMOS inverter is given in Figure 5.1 ($\lambda=0.125 \mu \mathrm{m}$).
a. Determine the sizes of the NMOS and PMOS transistors.
b. Plot the VTC (using HSPICE) and derive its parameters ( $V_{O H}, V_{O L}, V_{M}, V_{I H}$, and $V_{I L}$ ).
c. Is the VTC affected when the output of the gate is connected to the inputs of 4 similar gates?


Figure 5.1 CMOS inverter layout.
d. Resize the inverter to achieve a switching threshold of approximately 0.75 V. Do not lay out the new inverter; use HSPICE for your simulations. How are the noise margins affected by this modification?
2. Figure 5.2 shows a piecewise linear approximation for the VTC. The transition region is approximated by a straight line with a slope equal to the inverter gain at $V_{M}$. The intersection of this line with the $V_{O H}$ and the $V_{O L}$ lines defines $V_{I H}$ and $V_{I L}$.
a. The noise margins of a CMOS inverter are highly dependent on the sizing ratio, $r=k_{p} / k_{n}$, of the NMOS and PMOS transistors. Use HSPICE with $V_{T_{n}}=\left|V_{T_{p}}\right|$ to determine the value of $r$ that results in equal noise margins. Give a qualitative explanation.
b. Section 5.3.2 of the text uses this piecewise linear approximation to derive simplified expressions for $N M_{H}$ and $N M_{L}$ in terms of the inverter gain. The derivation of the gain is based on the assumption that both the NMOS and the PMOS devices are velocity saturated at $V_{M}$. For what range of $r$ is this assumption valid? What is the resulting range of $V_{M}$ ?
c. Derive expressions for the inverter gain at $V_{M}$ for the cases when the sizing ratio is just above and just below the limits of the range where both devices are velocity saturated. What are the operating regions of the NMOS and the PMOS for each case? Consider the effect of channel-length modulation by using the following expression for the small-signal resistance in the saturation region: $r_{o, s a t}=1 /\left(\lambda I_{D}\right)$.


Figure 5.2 A different approach to derive $V_{I L}$ and $V_{I H}$.
3. [M, SPICE, 3.3.2] Figure 5.3 shows an NMOS inverter with resistive load.
a. Qualitatively discuss why this circuit behaves as an inverter.
b. Find $V_{O H}$ and $V_{O L}$, and calculate $V_{I H}$ and $V_{I L}$.
c. Find $N M_{L}$ and $N M_{H}$, and plot the VTC using HSPICE.
d. Compute the average power dissipation for: (i) $V_{i n}=0 \mathrm{~V}$ and (ii) $V_{i n}=2.5 \mathrm{~V}$


Figure 5.3 Resistive-load inverter
e. Use HSPICE to sketch the VTCs for $R_{L}=37 \mathrm{k} \Omega, 75 \mathrm{k} \Omega$, and $150 \mathrm{k} \Omega$ on a single graph.
f. Comment on the relationship between the critical VTC voltages (i.e., $V_{O L}, V_{O H}, V_{I L}, V_{I H}$ ) and the load resistance, $R_{L}$.
g. Do high or low impedance loads seem to produce more ideal inverter characteristics?
4. [E, None, 3.3.3] For the inverter of Figure 5.3 and an output load of 3 pF :
a. Calculate $t_{p L H}, t_{p H L}$, and $t_{p}$.
b. Are the rising and falling delays equal? Why or why not?
c. Compute the static and dynamic power dissipation assuming the gate is clocked as fast as possible.
5. The next figure shows two implementations of MOS inverters. The first inverter uses only NMOS transistors.
a. Calculate $\mathrm{V}_{\mathrm{OH}}, \mathrm{V}_{\mathrm{OL}}, \mathrm{V}_{\mathrm{M}}$ for each case.

b. Use HSPICE to obtain the two VTCs. You must assume certain values for the source/drain areas and perimeters since there is no layout. For our scalable CMOS process, $\lambda=0.125$ $\mu \mathrm{m}$, and the source/drain extensions are $5 \lambda$ for the PMOS; for the NMOS the source/drain contact regions are $5 \lambda \times 5 \lambda$.
c. Find $\mathrm{V}_{\mathrm{IH}}, \mathrm{V}_{\mathrm{IL}}, N M_{L}$ and $N M_{H}$ for each inverter and comment on the results. How can you increase the noise margins and reduce the undefined region?
d. Comment on the differences in the VTCs, robustness and regeneration of each inverter.
6. Consider the following NMOS inverter. Assume that the bulk terminals of all NMOS devices are connected to GND. Assume that the input IN has a 0 V to 2.5 V swing.

a. Set up the equation(s) to compute the voltage on node $x$. Assume $\gamma=0.5$.
b. What are the modes of operation of device M2? Assume $\gamma=0$.
c. What is the value on the output node $O U T$ for the case when $I N=0 \mathrm{~V}$? Assume $\gamma=0$.
d. Assuming $\gamma=0$, derive an expression for the switching threshold $\left(\mathrm{V}_{\mathrm{M}}\right)$ of the inverter. Recall that the switching threshold is the point where $V_{I N}=V_{\text {OUT }}$. Assume that the device sizes for M1, M2 and M3 are $(\mathrm{W} / \mathrm{L})_{1},(\mathrm{~W} / \mathrm{L})_{2}$, and $(\mathrm{W} / \mathrm{L})_{3}$ respectively. What are the limits on the switching threshold?

For this, consider two cases:
i) $(\mathrm{W} / \mathrm{L})_{1} \gg(\mathrm{~W} / \mathrm{L})_{2}$
ii) $(\mathrm{W} / \mathrm{L})_{2} \gg(\mathrm{~W} / \mathrm{L})_{1}$

7. Consider the circuit in Figure 5.5. Device M1 is a standard NMOS device. Device M2 has all the same properties as M1, except that its device threshold voltage is negative and has a value of -0.4 V . Assume that all the current equations and inequality equations (to determine the mode of operation) for the depletion device M2 are the same as a regular NMOS. Assume that the input $I N$ has a 0 V to 2.5 V swing.


Figure 5.5 A depletion load NMOS inverter
a. Device M2 has its gate terminal connected to its source terminal. If $V_{I N}=0 \mathrm{~V}$, what is the output voltage? In steady state, what is the mode of operation of device M2 for this input?
b. Compute the output voltage for $V_{I N}=2.5 \mathrm{~V}$. You may assume that $V_{\text {OUT }}$ is small to simplify your calculation. In steady state, what is the mode of operation of device M2 for this input?
c. Assuming $\operatorname{Pr}_{(I N=0)}=0.3$, what is the static power dissipation of this circuit?
8. [M, None, 3.3.3] An NMOS transistor is used to charge a large capacitor, as shown in Figure 5.6.
a. Determine the $t_{p L H}$ of this circuit, assuming an ideal step from 0 to 2.5 V at the input node.
b. Assume that a resistor $R_{S}$ of $5 \mathrm{k} \Omega$ is used to discharge the capacitance to ground. Determine $t_{p H L}$.
c. Determine how much energy is taken from the supply during the charging of the capacitor. How much of this is dissipated in M1? How much is dissipated in the pull-down resistance during discharge? How does this change when $R_{S}$ is reduced to $1 \mathrm{k} \Omega$?
d. The NMOS transistor is replaced by a PMOS device, sized so that $k_{p}$ is equal to the $k_{n}$ of the original NMOS. Will the resulting structure be faster? Explain why or why not.


Figure 5.6 Circuit diagram with annotated $W / L$ ratios
9. The circuit in Figure 5.7 is known as the source follower configuration. It achieves a DC level shift between the input and the output. The value of this shift is determined by the current $\mathrm{I}_{0}$. Assume $x_{d}=0, \gamma=0.4,2\left|\phi_{\mathrm{f}}\right|=0.6 \mathrm{~V}, V_{T 0}=0.43 \mathrm{~V}, k_{n}{ }^{\prime}=115 \mu \mathrm{~A} / \mathrm{V}^{2}$ and $\lambda=0$.


Figure 5.7 NMOS source follower configuration
a. Suppose we want the nominal level shift between $\mathrm{V}_{\mathrm{i}}$ and $\mathrm{V}_{\mathrm{o}}$ to be 0.6 V in the circuit in Figure 5.7 (a). Neglecting the backgate effect, calculate the width of M2 to provide this level shift (Hint: first relate $V_{i}$ to $V_{o}$ in terms of $I_{o}$ ).
b. Now assume that an ideal current source replaces M2 (Figure 5.7 (b)). The NMOS transistor M1 experiences a shift in $V_{T}$ due to the backgate effect. Find $V_{T}$ as a function of $\mathrm{V}_{\mathrm{o}}$ for $\mathrm{V}_{\mathrm{o}}$ ranging from 0 to 2.5 V with 0.5 V intervals. Plot $\mathrm{V}_{\mathrm{T}}$ vs. $\mathrm{V}_{\mathrm{o}}$.
c. Plot $\mathrm{V}_{\mathrm{o}}$ vs. $\mathrm{V}_{\mathrm{i}}$ as $\mathrm{V}_{\mathrm{o}}$ varies from 0 to 2.5 V with 0.5 V intervals. Plot two curves: one neglecting the body effect and one accounting for it. How does the body effect influence the operation of the level converter?
d. At $\mathrm{V}_{\mathrm{o}}$ (with body effect) $=2.5 \mathrm{~V}$, find $\mathrm{V}_{\mathrm{o}}$ (ideal) and thus determine the maximum error introduced by the body effect.
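As a starting point for part (b), the following is a minimal sketch of the threshold-voltage tabulation. It uses the standard body-effect relation $V_{T}=V_{T 0}+\gamma\left(\sqrt{2\left|\phi_{f}\right|+V_{S B}}-\sqrt{2\left|\phi_{f}\right|}\right)$ with the parameters quoted in the problem. It assumes the bulk of M1 is grounded, so that $V_{S B}=V_{o}$; check this against Figure 5.7. The names `threshold_voltage` and `vt_table` are illustrative only.

```python
import math

# Parameters quoted in the problem statement.
VT0 = 0.43        # zero-bias threshold voltage, in V
GAMMA = 0.4       # body-effect coefficient, in V^0.5
TWO_PHI_F = 0.6   # 2|phi_f|, in V


def threshold_voltage(v_sb):
    """VT = VT0 + gamma * (sqrt(2|phi_f| + VSB) - sqrt(2|phi_f|))."""
    return VT0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


# Assumption: the bulk of M1 is tied to ground, so VSB equals Vo.
vo_points = [0.5 * i for i in range(6)]   # 0, 0.5, ..., 2.5 V
vt_table = [(vo, threshold_voltage(vo)) for vo in vo_points]

for vo, vt in vt_table:
    print(f"Vo = {vo:.1f} V  ->  VT = {vt:.3f} V")
```

The resulting pairs can be plotted directly or fed into the $V_{o}$ vs. $V_{i}$ calculation of part (c).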
10. For this problem assume:
$V_{D D}=2.5 \mathrm{~V}, W_{P} / L=1.25 / 0.25, W_{N} / L=0.375 / 0.25, L=L_{\text {eff }}=0.25 \mu \mathrm{~m}$ (i.e., $x_{d}=0 \mu \mathrm{~m}$ ), $C_{L}=C_{\text {inv-gate }}, k_{n}{ }^{\prime}=115 \mu \mathrm{~A} / \mathrm{V}^{2}, k_{p}{ }^{\prime}=-30 \mu \mathrm{~A} / \mathrm{V}^{2}, V_{t n 0}=\left|V_{t p 0}\right|=0.4 \mathrm{~V}, \lambda=0 \mathrm{~V}^{-1}, \gamma=0.4,2\left|\phi_{f}\right|=0.6 \mathrm{~V}$, and $t_{o x}=58$ Å. Use the HSPICE model parameters for parasitic capacitance given below (i.e., $C_{g d 0}, C_{j}$, $C_{j s w}$ ), and assume that $V_{S B}=0 \mathrm{~V}$ for all problems except part (f).


Figure 5.8 CMOS inverter with capacitive

```
## Parasitic Capacitance Parameters (F/m) ##
NMOS: CGDO=3.11E-10, CGSO=3.11E-10, CJ=2.02E-3, CJSW=2.75E-10
PMOS: CGDO=2.68E-10, CGSO=2.68E-10, CJ=1.93E-3, CJSW=2.23E-10
```

a. What is the $V_{m}$ for this inverter?
b. What is the effective load capacitance $C_{\text {Leff }}$ of this inverter? (include parasitic capacitance, refer to the text for $K_{e q}$ and $m$.) Hint: You must assume certain values for the source/drain areas and perimeters since there is no layout. For our scalable CMOS process, $\lambda=0.125$ $\mu \mathrm{m}$, and the source/drain extensions are $5 \lambda$ for the PMOS; for the NMOS the source/drain contact regions are $5 \lambda \times 5 \lambda$.
c. Calculate $t_{P H L}, t_{P L H}$ assuming the result of (b) is ' $C_{L e f f}=6.5 \mathrm{fF}$ '. (Assume an ideal step input, i.e. $t_{\text {rise }}=t_{\text {fall }}=0$. Do this part by computing the average current used to charge/discharge $C_{\text {Leff }}$ )
d. Find $\left(\mathrm{W}_{\mathrm{p}} / \mathrm{W}_{\mathrm{n}}\right)$ such that $t_{P H L}=t_{P L H}$.
e. Suppose we increase the width of the transistors to reduce the $t_{P H L}, t_{P L H}$. Do we get a proportional decrease in the delay times? Justify your answer.
f. Suppose $V_{S B}=1 \mathrm{~V}$, what is the value of $V_{t n}, V_{t p}, V_{m}$ ? How does this qualitatively affect $C_{\text {Leff }}$ ?
11. Using HSPICE, answer the following questions.
a. Simulate the circuit in Problem 10 and measure $t_{P}$ and the average power for input $V_{i n}$ : pulse( $0 V_{D D} 5 \mathrm{n} 0.1 \mathrm{n} 0.1 \mathrm{n} 9 \mathrm{n} 20 \mathrm{n}$ ), as $V_{D D}$ varies from 1 V to 2.5 V with a 0.25 V interval [$t_{P}=\left(t_{P H L}+t_{P L H}\right) / 2$]. Using this data, plot ' $t_{P}$ vs. $V_{D D}$ ', and 'Power vs. $V_{D D}$ '.

Specify AS, AD, PS, PD in your spice deck, and manually add $C_{L}=6.5 \mathrm{fF}$. Set $V_{S B}=0 \mathrm{~V}$ for this problem.
b. For $V_{D D}$ equal to 2.5 V, determine the maximum fan-out of identical inverters this gate can drive before its delay becomes larger than 2 ns.
c. Simulate the same circuit for a set of 'pulse' inputs with rise and fall times of $t_{\text {in\_rise,fall }}=1 \mathrm{~ns}, 2 \mathrm{~ns}, 5 \mathrm{~ns}, 10 \mathrm{~ns}, 20 \mathrm{~ns}$. For each input, measure (1) the rise and fall times $t_{\text {out\_rise }}$ and $t_{\text {out\_fall }}$ of the inverter output, (2) the total energy lost $E_{\text {total }}$, and (3) the energy lost due to short-circuit current $E_{\text {short }}$. Using this data, plot (1) $t_{\text {out\_rise,fall }}$ vs. $t_{\text {in\_rise,fall }}$, (2) $E_{\text {total }}$ vs. $t_{\text {in\_rise,fall }}$, (3) $E_{\text {short }}$ vs. $t_{\text {in\_rise,fall }}$, and (4) $E_{\text {short }} / E_{\text {total }}$ vs. $t_{\text {in\_rise,fall }}$.
d. Provide simple explanations for:
(i) Why the slope for (1) is less than 1?
(ii) Why $E_{\text {short }}$ increases with $t_{\text {in\_rise,fall }}$?
(iii) Why $E_{\text {total }}$ increases with $t_{\text {in\_rise,fall }}$?
12. Consider the low swing driver of Figure 5.9:

a. What is the voltage swing on the output node $\left(\mathrm{V}_{\text {out }}\right)$ ? Assume $\gamma=0$.
b. Estimate (i) the energy drawn from the supply and (ii) energy dissipated for a 0 V to 2.5 V transition at the input. Assume that the rise and fall times at the input are 0 . Repeat the analysis for a 2.5 V to 0 V transition at the input.
c. Compute $\mathrm{t}_{\mathrm{pLH}}$ (i.e., the time to transition from $\mathrm{V}_{\mathrm{OL}}$ to $\left(\mathrm{V}_{\mathrm{OH}}+\mathrm{V}_{\mathrm{OL}}\right) / 2$). Assume the input rise time to be 0. $\mathrm{V}_{\mathrm{OL}}$ is the output voltage with the input at 0 V and $\mathrm{V}_{\mathrm{OH}}$ is the output voltage with the input at 2.5 V.
d. Compute $\mathrm{V}_{\mathrm{OH}}$ taking into account body effect. Assume $\gamma=0.5 \mathrm{~V}^{1 / 2}$ for both NMOS and PMOS.
13. Consider the following low swing driver consisting of NMOS devices M1 and M2. Assume an NWELL implementation. Assume that the inputs IN and $\overline{\mathrm{IN}}$ have a 0 V to 2.5 V swing and that $V_{\text {IN }}=0 \mathrm{~V}$ when $\mathrm{V}_{\overline{\mathrm{IN}}}=2.5 \mathrm{~V}$ and vice-versa. Also assume that there is no skew between IN and $\overline{\mathrm{IN}}$ (i.e., the inverter delay to derive $\overline{\mathrm{IN}}$ from IN is zero).

a. What voltage is the bulk terminal of M2 connected to?
b. What is the voltage swing on the output node as the inputs swing from 0 V to 2.5 V? Show the low value and the high value.
c. Assume that the inputs IN and $\overline{\mathrm{IN}}$ have zero rise and fall times. Assume a zero skew between IN and $\overline{\mathrm{IN}}$. Determine the low-to-high propagation delay for charging the output node, measured from the $50 \%$ point of the input to the $50 \%$ point of the output. Assume that the total load capacitance is 1 pF, including the transistor parasitics.
d. Assume that, instead of the 1 pF load, the low swing driver drives a non-linear capacitor, whose capacitance vs. voltage is plotted below. Compute the energy drawn from the low supply for charging up the load capacitor. Ignore the parasitic capacitance of the driver circuit itself.

14. The inverter below operates with $\mathrm{V}_{\mathrm{DD}}=0.4 \mathrm{~V}$ and is composed of $|\mathrm{Vt}|=0.5 \mathrm{~V}$ devices. The devices have identical $\mathrm{I}_{0}$ and n .
a. Calculate the switching threshold $\left(\mathrm{V}_{\mathrm{M}}\right)$ of this inverter.
b. Calculate $\mathrm{V}_{\mathrm{IL}}$ and $\mathrm{V}_{\mathrm{IH}}$ of the inverter.


Figure 5.11 Inverter in Weak Inversion Regime
15. Sizing a chain of inverters.
a. In order to drive a large capacitance $\left(\mathrm{C}_{\mathrm{L}}=20 \mathrm{pF}\right)$ from a minimum size gate (with input capacitance $\mathrm{C}_{\mathrm{i}}=10 \mathrm{fF}$ ), you decide to introduce a two-staged buffer as shown in Figure 5.12. Assume that the propagation delay of a minimum size inverter is 70 ps. Also assume that the input capacitance of a gate is proportional to its size. Determine the sizing of the two additional buffer stages that will minimize the propagation delay.


Figure 5.12 Buffer insertion for driving large loads.
b. If you could add any number of stages to achieve the minimum delay, how many stages would you insert? What is the propagation delay in this case?
c. Describe the advantages and disadvantages of the methods shown in (a) and (b).
d. Determine a closed form expression for the power consumption in the circuit. Consider only gate capacitances in your analysis. What is the power consumption for a supply voltage of 2.5 V and an activity factor of 1 ?
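A rough numerical sketch of the sizing arithmetic behind parts (a) and (b) follows. It rests on two assumptions that are not stated in the problem: every stage sees the same effective fan-out $f$ (geometric sizing), and the delay of a stage is $t_{p 0} \cdot f$, i.e., the inverter's self-loading is neglected. The function names `geometric_chain` and `chain_delay` are illustrative. With self-loading included, the preferred fan-out per stage is larger and the stage count smaller, so treat the numbers as a first estimate only.

```python
def geometric_chain(c_in, c_load, n_stages):
    """Equal fan-out per stage: return (f, input capacitance of each stage)."""
    overall_fanout = c_load / c_in
    f = overall_fanout ** (1.0 / n_stages)
    caps = [c_in * f ** k for k in range(n_stages)]
    return f, caps


def chain_delay(tp0, f, n_stages):
    # Assumed model: each stage contributes tp0 * f (self-loading ignored).
    return n_stages * tp0 * f


C_IN = 10e-15     # input capacitance of the minimum-size gate, F
C_LOAD = 20e-12   # load capacitance, F
TP0 = 70e-12      # delay of a minimum-size inverter, s

# Part (a): the minimum-size gate followed by two buffer stages (3 stages).
f3, caps3 = geometric_chain(C_IN, C_LOAD, 3)
delay3 = chain_delay(TP0, f3, 3)
print(f"f = {f3:.2f}, relative sizes = {[round(c / C_IN, 1) for c in caps3]}")
print(f"estimated delay = {delay3 * 1e12:.0f} ps")

# Part (b): sweep the stage count and keep the fastest chain under this model.
best_n = min(
    range(1, 13),
    key=lambda n: chain_delay(TP0, geometric_chain(C_IN, C_LOAD, n)[0], n),
)
print(f"stage count with the smallest modeled delay: {best_n}")
```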
16. [M, None, 3.3.5] Consider scaling a CMOS technology by $S>1$. In order to maintain compatibility with existing system components, you decide to use constant voltage scaling.
a. In traditional constant voltage scaling, transistor widths scale inversely with $S$: $W \propto 1 / S$. To avoid the power increases associated with constant voltage scaling, however, you decide to change the scaling factor for $W$. What should this new scaling factor be to maintain approximately constant power? Assume long-channel devices (i.e., neglect velocity saturation).
b. How does delay scale under this new methodology?
c. Assuming short-channel devices (i.e., velocity saturation), how would transistor widths have to scale to maintain the constant power requirement?

## DESIGN PROBLEM

Using the $0.25 \mu \mathrm{~m}$ CMOS introduced in Chapter 2, design a static CMOS inverter that meets the following requirements:

1. Matched pull-up and pull-down times (i.e., $t_{p H L}=t_{p L H}$ ).
2. $t_{p}=5 \mathrm{nsec}( \pm 0.1 \mathrm{nsec})$.

The load capacitance connected to the output is equal to 4 pF . Notice that this capacitance is substantially larger than the internal capacitances of the gate.

Determine the $W$ and $L$ of the transistors. To reduce the parasitics, use minimal lengths ( $L=0.25 \mu \mathrm{~m}$ ) for all transistors. Verify and optimize the design using SPICE after proposing a first design using manual computations. Compute also the energy consumed per transition. If you have a layout editor (such as MAGIC) available, perform the physical design, extract the real circuit parameters, and compare the simulated results with the ones obtained earlier.

## Chapter 6 <br> PROBLEMS

1. [E, None, 4.2] Implement the equation $X=((\bar{A}+\bar{B})(\bar{C}+\bar{D}+\bar{E})+\bar{F}) \bar{G}$ using complementary CMOS. Size the devices so that the output resistance is the same as that of an inverter with an NMOS $W / L=2$ and PMOS $W / L=6$. Which input pattern(s) would give the worst and best equivalent pull-up or pull-down resistance?
2. Implement the following expression in a full static CMOS logic fashion using no more than 10 transistors:

$$
\bar{Y}=(A \cdot B)+(A \cdot C \cdot E)+(D \cdot E)+(D \cdot C \cdot B)
$$

3. Consider the circuit of Figure 6.1.


Figure 6.1 CMOS combinational logic gate.
a. What is the logic function implemented by the CMOS transistor network? Size the NMOS and PMOS devices so that the output resistance is the same as that of an inverter with an NMOS $W / L=4$ and PMOS $W / L=8$.
b. What are the input patterns that give the worst-case $t_{p H L}$ and $t_{p L H}$? State clearly what the initial input patterns are and which input(s) have to make a transition in order to achieve this maximum propagation delay. Consider the effect of the capacitances at the internal nodes.
c. Verify part (b) with SPICE. Assume all transistors have minimum gate length $(0.25 \mu \mathrm{~m})$.
d. If $\mathrm{P}(\mathrm{A}=1)=0.5, \mathrm{P}(\mathrm{B}=1)=0.2, \mathrm{P}(\mathrm{C}=1)=0.3$ and $\mathrm{P}(\mathrm{D}=1)=1$, determine the power dissipation in the logic gate. Assume $V_{D D}=2.5 \mathrm{~V}, C_{\text {out }}=30 \mathrm{fF}$ and $f_{c l k}=250 \mathrm{MHz}$.
4. [M, None, 4.2] CMOS Logic
a. Do the following two circuits (Figure 6.2) implement the same logic function? If yes, what is that logic function? If no, give Boolean expressions for both circuits.
b. Will these two circuits' output resistances always be equal to each other?
c. Will these two circuits' rise and fall times always be equal to each other? Why or why not?


Circuit A


Circuit B

Figure 6.2 Two static CMOS gates.
5. [E, None, 4.2] The transistors in the circuits of the preceding problem have been sized to give an output resistance of $13 \mathrm{k} \Omega$ for the worst-case input pattern. This output resistance can vary, however, if other patterns are applied.
a. What input patterns $(A-E)$ give the lowest output resistance when the output is low? What is the value of that resistance?
b. What input patterns $(A-E)$ give the lowest output resistance when the output is high? What is the value of that resistance?
6. [E, None, 4.2] What is the logic function of circuits A and B in Figure 6.3? Which one is a dual network and which one is not? Is the nondual network still a valid static logic gate? Explain. List any advantages of one configuration over the other.



Figure 6.3 Two logic functions.
7. [E, None, 4.2] Compute the following for the pseudo-NMOS inverter shown in Figure 6.4:
a. $V_{O L}$ and $V_{O H}$
b. $N M_{L}$ and $N M_{H}$
c. The power dissipation: (1) for $V_{\text {in }}$ low, and (2) for $V_{\text {in }}$ high
d. For an output load of 1 pF , calculate $t_{p L H}, t_{p H L}$, and $t_{p}$. Are the rising and falling delays equal? Why or why not?
8. [M, SPICE, 4.2] Consider the circuit of Figure 6.5.

a. What is the output voltage if only one input is high? If all four inputs are high?
b. What is the average static power consumption if, at any time, each input turns on with an (independent) probability of 0.5? Of 0.1?
c. Compare your analytically obtained results to a SPICE simulation.


Figure 6.5 Pseudo-NMOS gate.
9. [M, None, 4.2] Implement $F=A \overline{B C}+\bar{A} C D$ (and $\bar{F}$ ) in DCVSL. Assume $A, B, C, D$, and their complements are available as inputs. Use the minimum number of transistors.
10. [E, Layout, 4.2] A complex logic gate is shown in Figure 6.6.
a. Write the Boolean equations for outputs $F$ and $G$. What function does this circuit implement?
b. What logic family does this circuit belong to?
c. Assuming $W / L=0.5 \mathrm{u} / 0.25 \mathrm{u}$ for all NMOS transistors and $W / L=2 \mathrm{u} / 0.25 \mathrm{u}$ for the PMOS transistors, produce a layout of the gate using Magic. Your layout should conform to the following datapath style: (1) Inputs should enter the layout from the left in polysilicon; (2) The outputs should exit the layout at the right in polysilicon (since the outputs would probably be driving transistor gate inputs of the next cell to the right); (3) Power and ground lines should run vertically in metal 1.
d. Extract and netlist the layout. Load both outputs (F,G) with a 30 fF capacitance and simulate the circuit. Does the gate function properly? If not, explain why and resize the transistors so that it does. Change the sizes (and areas and perimeters) in the HSPICE netlist.


Figure 6.6 Two-input complex logic gate.
11. Design and simulate a circuit that generates an optimal differential signal as shown in Figure 6.7. Make sure the rise and fall times are equal.


Figure 6.7 Differential Buffer.
12. What is the function of the circuit in Figure 6.8?

13. Implement the function $S=A B C+A \bar{B} \bar{C}+\bar{A} \bar{B} C+\bar{A} B \bar{C}$, which gives the sum of two inputs with a carry bit, using NMOS pass transistor logic. Design a DCVSL gate which implements the same function. Assume $A, B, C$, and their complements are available as inputs.
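Before drawing the pass-transistor network, it can help to confirm what the sum-of-products expression computes. The sketch below reads every overbar as a complement of a single literal, which is the reading consistent with the stated purpose (the sum bit of two inputs plus a carry), and compares the result with a three-input XOR. The helper name `sum_sop` is illustrative.

```python
from itertools import product


def sum_sop(a, b, c):
    """S = ABC + A.B'.C' + A'.B'.C + A'.B.C' (each overbar on one literal)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (a & b & c) | (a & nb & nc) | (na & nb & c) | (na & b & nc)


rows = [(a, b, c, sum_sop(a, b, c)) for a, b, c in product((0, 1), repeat=3)]

print("A B C | S")
for a, b, c, s in rows:
    print(f"{a} {b} {c} | {s}")

# The sum bit of a full adder is the XOR of its three inputs.
matches_xor = all(s == (a ^ b ^ c) for a, b, c, s in rows)
print("equals A xor B xor C:", matches_xor)
```

The XOR structure suggests building the network from two cascaded two-input XOR stages, which is one possible starting point for both the pass-transistor and the DCVSL versions.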
14. Describe the logic function computed by the circuit in Figure 6.9. Note that all transistors (except for the middle inverters) are NMOS. Size and simulate the circuit so that it achieves a 100 ps delay ( $50-50$ ) using $0.25 \mu \mathrm{~m}$ devices, while driving a 100 fF load on both differential outputs ( $V_{D D}=2.5 \mathrm{~V}$ ). Assume $A, B$ and their complements are available as inputs.


Figure 6.9 Cascoded Logic Styles.
For the drain and source perimeters and areas you can use the following approximations: $\mathrm{AS}=\mathrm{AD}=\mathrm{W}^{*} 0.625 \mathrm{u}$ and $\mathrm{PS}=\mathrm{PD}=\mathrm{W}+1.25 \mathrm{u}$.
15. [M, None, 4.2] Figure 6.10 contains a pass-gate logic network.
a. Determine the truth table for the circuit. What logic function does it implement?
b. Assuming 0 and 2.5 V inputs, size the PMOS transistor to achieve a $V_{O L}=0.3 \mathrm{~V}$.
c. If the PMOS were removed, would the circuit still function correctly? Does the PMOS transistor serve any useful purpose?


Figure 6.10 Pass-gate network.
16. [M, None, 4.2] This problem considers the effects of process scaling on pass-gate logic.
a. If a process has a $t_{b u f}$ of $0.4 \mathrm{~ns}, R_{e q}$ of $8 \mathrm{k} \Omega$, and $C$ of 12 fF , what is the optimal number of stages between buffers in a pass-gate chain?
b. Suppose that, if the dimensions of this process are shrunk by a factor $S, R_{e q}$ scales as $1 / S^{2}, C$ scales as $1 / S$, and $t_{b u f}$ scales as $1 / S^{2}$. What is the expression for the optimal number of buffers as a function of $S$ ? What is this value if $S=2$ ?
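The calculation can be sketched numerically as follows. The sketch assumes the usual delay model for a chain of $n$ pass gates with a buffer inserted every $m$ gates, $t_{p}=0.69 \frac{n}{m} C R_{e q} \frac{m(m+1)}{2}+\left(\frac{n}{m}-1\right) t_{b u f}$; verify that this matches the expression in the text before relying on it. Setting the derivative with respect to $m$ to zero gives $m_{o p t}=\sqrt{2 t_{b u f} /\left(0.69 R_{e q} C\right)} \approx 1.7 \sqrt{t_{b u f} /\left(R_{e q} C\right)}$. All function names are illustrative.

```python
import math


def chain_delay(n, m, t_buf, r_eq, c):
    """Assumed model: n pass gates, one buffer after every m gates."""
    return 0.69 * (n / m) * c * r_eq * m * (m + 1) / 2.0 + (n / m - 1.0) * t_buf


def optimal_segment_length(t_buf, r_eq, c):
    """Continuous minimizer of chain_delay with respect to m."""
    return math.sqrt(2.0 * t_buf / (0.69 * r_eq * c))


def scaled_optimum(s, t_buf, r_eq, c):
    # Scaling stated in part (b): R_eq ~ 1/S^2, C ~ 1/S, t_buf ~ 1/S^2.
    return optimal_segment_length(t_buf / s**2, r_eq / s**2, c / s)


T_BUF, R_EQ, C = 0.4e-9, 8e3, 12e-15

m_opt = optimal_segment_length(T_BUF, R_EQ, C)
print(f"m_opt (unscaled) = {m_opt:.2f}")
print(f"m_opt (S = 2)    = {scaled_optimum(2.0, T_BUF, R_EQ, C):.2f}")
```

Because $m$ must be an integer in a real design, compare `chain_delay` at the two neighboring integer values rather than rounding blindly.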
17. [C, None, 4.2] Consider the circuit of Figure 6.11. Let $C_{x}=50 \mathrm{fF}, M_{r}$ has $W / L=0.375 / 0.375$, $M_{n}$ has $W / L_{\text {eff }}=0.375 / 0.25$. Assume the output inverter doesn't switch until its input equals $V_{D D} / 2$.
a. How long will it take $M_{n}$ to pull down node $x$ from 2.5 V to 1.25 V if $I n$ is at 0 V and $B$ is at 2.5 V ?
b. How long will it take $M_{n}$ to pull up node $x$ from 0 V to 1.25 V if $V_{I n}$ is 2.5 V and $V_{B}$ is 2.5 V ?
c. What is the minimum value of $V_{B}$ necessary to pull down $V_{x}$ to 1.25 V when $V_{I n}=0 \mathrm{~V}$ ?


Figure 6.11 Level restorer.
18. Pass Transistor Logic


$$
\begin{aligned}
& \mathrm{V}_{\mathrm{DD}}=2.5 \mathrm{~V} \\
& (\mathrm{~W} / \mathrm{L})_{2}=1.5 \mathrm{um} / 0.25 \mathrm{um} \\
& (\mathrm{~W} / \mathrm{L})_{1}=0.5 \mathrm{um} / 0.25 \mathrm{um} \\
& (\mathrm{~W} / \mathrm{L})_{\mathrm{ni}}=0.5 \mathrm{um} / 0.25 \mathrm{um} \\
& \mathrm{k}_{\mathrm{n}}^{\prime}=115 \mathrm{uA} / \mathrm{V}^{2}, \mathrm{k}_{\mathrm{p}}^{\prime}=-30 \mathrm{uA} / \mathrm{V}^{2} \\
& \mathrm{~V}_{\mathrm{tN}}=0.43 \mathrm{~V}, \mathrm{~V}_{\mathrm{tP}}=-0.4 \mathrm{~V}
\end{aligned}
$$

Figure 6.12 Level restoring circuit.
Consider the circuit of Figure 6.12. Assume the inverter switches ideally at $\mathrm{V}_{\mathrm{DD}} / 2$, neglect body effect, channel length modulation and all parasitic capacitance throughout this problem.
a. What is the logic function performed by this circuit?
b. Explain why this circuit has non-zero static dissipation.
c. Using just one transistor, design a fix so that there will not be any static power dissipation. Explain how you chose the size of the transistor.
d. Implement the same circuit using transmission gates.
e. Replace the pass-transistor network in Figure 6.12 with a pass transistor network that computes the following function: $x=A B C$ at the node $x$. Assume you have the true and complementary versions of the three inputs $\mathrm{A}, \mathrm{B}$ and C .
19. [M, None, 4.3] Sketch the waveforms at $x, y$, and $z$ for the given inputs (Figure 6.13). You may approximate the time scale, but be sure to compute the voltage levels. Assume that $V_{T}=0.5 \mathrm{~V}$ when body effect is a factor.
20. [E, None, 4.3] Consider the circuit of Figure 6.14.
a. Give the logic function of $x$ and $y$ in terms of $A, B$, and $C$. Sketch the waveforms at $x$ and $y$ for the given inputs. Do $x$ and $y$ evaluate to the values you expected from their logic functions? Explain.
b. Redesign the gates using $n p$-CMOS to eliminate any race conditions. Sketch the waveforms at $x$ and $y$ for your new circuit.
21. [M, None, 4.3] Suppose we wish to implement the two logic functions given by $F=A+B+C$ and $G=A+B+C+D$. Assume both true and complementary signals are available.

a. Implement these functions in dynamic CMOS as cascaded $\phi$ stages so as to minimize the total transistor count.
b. Design an $n p$-CMOS implementation of the same logic functions. Does this design display any of the difficulties of part (a)?
22. Consider a conventional 4-stage Domino logic circuit as shown in Figure 6.15 in which all precharge and evaluate devices are clocked using a common clock $\phi$. For this entire problem, assume that the pulldown network is simply a single NMOS device, so that each Domino stage consists of a dynamic inverter followed by a static inverter. Assume that the precharge time, evaluate time, and propagation delay of the static inverter are all $T / 2$. Assume that the transitions are ideal (zero rise/fall times).


Figure 6.15 Conventional DOMINO Dynamic Logic.
a. Complete the timing diagram for signals $\mathrm{Out}_{1}, \mathrm{Out}_{2}, \mathrm{Out}_{3}$ and $\mathrm{Out}_{4}$, when the IN signal goes high before the rising edge of the clock $\phi$. Assume that the clock period is 10 T time units.
b. Suppose that there are no evaluate switches at the 3 latter stages. Assume that the clock $\phi$ is initially in the precharge state ( $\phi=0$ with all nodes settled to the correct precharge states), and the block enters the evaluate period $(\phi=1)$. Is there a problem during the evaluate period, or is there a benefit? Explain.
c. Assume that the clock $\phi$ is initially in the evaluate state $(\phi=1)$, and the block enters the precharge state $(\phi=0)$. Is there a problem, or is there any benefit, if the last three evaluate switches are removed? Explain.
23. [C, Spice, 4.3] Figure 6.16 shows a dynamic CMOS circuit in Domino logic. In determining source and drain areas and perimeters, you may use the following approximations: $A D=A S=$ $W \times 0.625 \mu \mathrm{~m}$ and $P D=P S=W+1.25 \mu \mathrm{~m}$. Assume 0.1 ns rise/fall times for all inputs, including the clock. Furthermore, you may assume that all the inputs and their complements are available, and that all inputs change during the precharge phase of the clock cycle.
a. What Boolean functions are implemented at outputs $F$ and $G$ ? If $A$ and $B$ are interpreted as two-bit binary words, $A=A_{1} A_{0}$ and $B=B_{1} B_{0}$, then what interpretation can be applied to output $G$ ?
b. Which gate (1 or 2) has the highest potential for harmful charge sharing and why? What sequence of inputs (spanning two clock cycles) results in the worst-case charge-sharing scenario? Using SPICE, determine the extent to which charge sharing affects the circuit for this worst case.


Figure 6.16 DOMINO logic circuit.
24. [M, Spice, 4.3] In this problem you will consider methods for eliminating charge sharing in the circuit of Figure 6.16. You will then determine the performance of the resulting circuit.
a. In Problem 23 you determined which gate (1 or 2) suffers the most from charge sharing. Add a single $2 / 0.25$ PMOS precharge transistor (with its gate driven by the clock $\phi$ and its source connected to $V_{D D}$ ) to one of the nodes in that gate to maximally reduce the charge-sharing effect. What effect (if any) will this addition have on the gate delay? Use SPICE to demonstrate that the additional transistor has eliminated charge sharing for the previously determined worst-case sequence of inputs.
b. For the new circuit (including additional precharge transistor), find the sequence of inputs (spanning two clock cycles) that results in the worst-case delay through the circuit.

Remember that precharging is another factor that limits the maximum clocking frequency of the circuit, so your input sequence should address the worst-case precharging delay.
c. Using SPICE on the new circuit and applying the sequence of inputs found in part (b), find the maximum clock frequency for correct operation of the circuit. Remember that the precharge cycle must be long enough to allow all precharged nodes to reach $\sim 90 \%$ of their final values before evaluation begins. Also, recall that the inputs ( $A, B$ and their complements) should not begin changing until the clock signal has reached 0 V (precharge phase), and they should reach their final values before the circuit enters the evaluation phase.
25. [C, None, 4.2-3] For this problem, refer to the layout of Figure 6.17.
a. Draw the schematic corresponding to the layout. Include transistor sizes.
b. What logic function does the circuit implement? To which logic family does the circuit belong?
c. Does the circuit have any advantages over fully complementary CMOS?
d. Calculate the worst-case $V_{O L}$ and $V_{O H}$.
e. Write the expressions for the area and perimeter of the drain and source for all of the FETs in terms of $\lambda$. Assume that the capacitance of shared diffusions divides evenly between the sharing devices. Copy the layout into Magic, extract and simulate to find the worst-case $t_{\text {pHL }}$ time. For what input transition(s) does this occur? Name all of the parasitic capacitances that you would need to know to calculate this delay by hand (you do not need to perform the calculation).


Figure 6.17 Layout of complex gate.
26. [E, None, 4.4] Derive the truth table, state transition graph, and output transition probabilities for a three-input XOR gate with independent, identically distributed, uniform white-noise inputs.
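The bookkeeping behind this problem can be sketched in a few lines of Python. This is a minimal illustration, not a full solution: it enumerates the truth table of a three-input XOR, derives the signal probability $P_1$, and applies the static-CMOS relation $P_{0 \rightarrow 1}=P_0 P_1$, which assumes successive output values are independent. The function names are illustrative and do not come from the text.

```python
from itertools import product

def xor3(a, b, c):
    """Three-input XOR of single-bit values."""
    return a ^ b ^ c

def signal_probability(gate, n_inputs):
    """P(output = 1) when every input is independently 1 with probability 0.5."""
    rows = list(product((0, 1), repeat=n_inputs))
    return sum(gate(*row) for row in rows) / len(rows)

def static_transition_probability(p1):
    """P(0 -> 1) for a static gate, assuming independent successive outputs."""
    return (1 - p1) * p1

# Truth table, one row per input combination.
for row in product((0, 1), repeat=3):
    print(row, xor3(*row))

p1 = signal_probability(xor3, 3)
p01 = static_transition_probability(p1)
print("P1 =", p1, " P0->1 =", p01)
```

The state transition graph still has to be drawn by hand; the enumeration only supplies the probabilities that label it.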
27. [C, None, 4.4] Figure 6.18 shows a two-input multiplexer. For this problem, assume independent, identically-distributed uniform white noise inputs.
a. Does this schematic contain reconvergent fan-out? Explain your answer.
b. Find the exact signal $\left(P_{1}\right)$ and transition $\left(P_{0 \rightarrow 1}\right)$ formulas for nodes $X, Y$, and $Z$ for: (1) a static, fully complementary CMOS implementation, and (2) a dynamic CMOS implementation.


Figure 6.18 Two-input multiplexer
28. [M, None, 4.4] Compute the switching power consumed by the multiplexer of Figure 6.18, assuming that all significant capacitances have been lumped into the three capacitors shown in the figure, where $C=0.3 \mathrm{pF}$. Assume that $V_{D D}=2.5 \mathrm{~V}$ and independent, identically-distributed uniform white noise inputs, with events occurring at a frequency of 100 MHz. Perform this calculation for the following:
a. A static, fully-complementary CMOS implementation
b. A dynamic CMOS implementation
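A small helper can keep the per-node arithmetic straight. The sketch below evaluates the usual dynamic-power expression $P=\alpha_{0 \rightarrow 1} C V_{D D}^{2} f$ for one node. The activity factor of 0.25 in the example call is only a placeholder; the actual activities at nodes $X$, $Y$, and $Z$ must come from the analysis in problem 27, and the total is the sum over the three capacitors.

```python
def switching_power(alpha_0_to_1, capacitance_f, vdd_v, frequency_hz):
    """Dynamic power (W) dissipated at one node: alpha * C * Vdd^2 * f."""
    return alpha_0_to_1 * capacitance_f * vdd_v ** 2 * frequency_hz

# Placeholder activity factor (an assumption, NOT the answer for any node).
example_power = switching_power(0.25, 0.3e-12, 2.5, 100e6)
print(f"{example_power * 1e6:.3f} uW for one node at alpha = 0.25")
```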
29. Consider the circuit shown in Figure 6.19.
a. What is the logic function implemented by this circuit? Assume that all devices (M1-M6) are $0.5 \mu \mathrm{~m} / 0.25 \mu \mathrm{~m}$.
b. Let the drain current for each device (NMOS and PMOS) be $1 \mu \mathrm{~A}$ for NMOS at $V_{G S}=V_{T}$ and PMOS at $V_{S G}=V_{T}$. What input vectors cause the worst case leakage power for each output value? Explain (state all the vectors, but do not evaluate the leakage). Ignore DIBL.
c. Suppose the circuit is active for a fraction of time $d$ and idle for $(1-d)$. When the circuit is active, the inputs arrive at 100 MHz and are uniformly distributed $\left(\operatorname{Pr}_{(\mathrm{A}=1)}=0.5\right.$, $\operatorname{Pr}_{(\mathrm{B}=1)}=0.5, \operatorname{Pr}_{(\mathrm{C}=1)}=0.5$ ) and independent. When the circuit is in the idle mode, the inputs are fixed to one you chose in part (b). What is the duty cycle $d$ for which the active power is equal to the leakage power?


## DESIGN PROJECT

Design, lay out, and simulate a CMOS four-input XOR gate in the standard 0.25 micron CMOS process. You can choose any logic circuit style, and you are free to choose how many stages of logic to use: you could use one large logic gate or a combination of smaller logic gates. The supply voltage is set at 2.5 V ! Your circuit must drive an external 20 fF load in addition to whatever internal parasitics are present in your circuit.

The primary design objective is to minimize the propagation delay of the worst-case transition for your circuit. The secondary objective is to minimize the area of the layout. At the very worst, your design must have a propagation delay of no more than 0.5 ns and occupy an area of no more than 500 square microns, but the faster and smaller your circuit, the better. Be aware that, when using dynamic logic, the precharge time should be made part of the delay.

The design will be graded on the magnitude of $A \times t_{p}^{2}$, the product of the area of your design and the square of the delay for the worst-case transition.
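For comparing candidate designs, the grading metric and the two hard limits stated above can be captured in a short Python helper. This is only a convenience sketch: the function name and the choice to return `None` for a design that violates a limit are assumptions made for illustration.

```python
MAX_DELAY_NS = 0.5     # worst-case propagation delay limit from the project text
MAX_AREA_UM2 = 500.0   # layout area limit from the project text

def design_score(area_um2, delay_ns):
    """Return A * tp^2 (lower is better), or None if a hard limit is violated."""
    if delay_ns > MAX_DELAY_NS or area_um2 > MAX_AREA_UM2:
        return None
    return area_um2 * delay_ns ** 2

# Hypothetical candidates, not measured results.
print(design_score(400.0, 0.4))
print(design_score(300.0, 0.6))
```

Because the delay enters squared, a design that trades a modest area increase for a shorter delay can still score better.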

## Chapter 7 <br> PROBLEMS

1. [M, None, 7.4] Figure 1 shows a practical implementation of a pulse register. Clock Clk is ideal with $50 \%$ duty cycle.


Figure 0.1 Pulse register.

Data : $V_{D D}=2.5 \mathrm{~V}, t_{p, i n v}=200 \mathrm{ps}$, node capacitances are $C_{C l k d}=10 \mathrm{fF}, C_{x}=10 \mathrm{fF}$, both true and complementary outputs node capacitances are 20 fF .
a. Draw the waveforms at nodes $C l k, C l k d, X$ and $Q$ for two clock cycles, with $D=0$ in one cycle and $D=1$ in the other.
b. What is the approximate value of setup and hold times for this circuit?
c. If the probability that $D$ will change its logic value in one clock cycle is $\alpha$, with equal probability of being 0 or 1, what is the power consumption of this circuit? (Exclude the power consumption in the clock line.) $f_{\text {clk }}=100 \mathrm{MHz}$.
2. [M, None, 7.4] Figure 2 shows a register that attempts to statistically reduce power consumption using a data-transition look-ahead technique.


Figure 0.2 Pulse register.
a. Briefly describe the operation of the circuit.
b. If all the NMOS transistors are of the same size, and all of the PMOS transistors are of the same size, two times wider than the NMOS, roughly determine the input switching probability under which this flip-flop reduces power, compared to an equivalent flip-flop without data-transition look-ahead circuitry.
3. [E, None, 7.6] Shown in Figure 3 is a novel design of a Schmitt trigger. Determine the $W / L$ ratio of transistor $M_{1}$ such that $V_{M+}=3 V_{T n} . V_{D D}=2.5 \mathrm{~V}$. The W/L ratios of other transistors are shown in figure. You may ignore the body effect in this question. The other transistor parameters are as given in Chapter 3.
NMOS: $V_{T n}=0.4 \mathrm{~V}, k_{n}{ }^{\prime}=115 \mu \mathrm{~A} / \mathrm{V}^{2}, V_{D S A T}=0.6 \mathrm{~V}, \lambda=0, \gamma=0 \mathrm{~V}^{1 / 2}$

PMOS: $V_{T_{p}}=-0.4 \mathrm{~V}, k_{p}{ }^{\prime}=-30 \mu \mathrm{~A} / \mathrm{V}^{2}, V_{D S A T}=-1 \mathrm{~V}, \lambda=0, \gamma=-0 \mathrm{~V}^{1 / 2}$


Figure 0.3 Schmitt trigger.
4. [M, None, 7.6] Consider the circuit in Figure 4. The inverter is ideal, with $V_{M}=V_{D D} / 2$ and infinite slope. The transistors have $\mathrm{V}_{\mathrm{T}_{-}}=0.4 \mathrm{~V}, \mathrm{k}_{\mathrm{n}}{ }^{\prime}=120 \mu \mathrm{~A} / \mathrm{V}^{2}$ and $\mathrm{k}_{\mathrm{p}}=40 \mu \mathrm{~A} / \mathrm{V}^{2} . \mathrm{M}_{1}$ has $(\mathrm{W} / \mathrm{L})_{1}=1$. Ignore all other parasitic effects in the transistors.


Figure 0.4 Schmitt trigger.
a. As $V_{I N}$ goes from 0 to $V_{D D}$ and back to 0 explain the sequence of events which makes this circuit operate as a Schmitt Trigger.
b. Find the value of $(\mathrm{W} / \mathrm{L})_{2}$ such that when $V_{I N}$ increases from 0 to $\mathrm{V}_{\mathrm{DD}}$ the output will switch at $V_{\text {in }}=0.8 \mathrm{~V}$.
c. Find the value of $(\mathrm{W} / \mathrm{L})_{3}$ such that when $V_{i n}$ decreases from $\mathrm{V}_{\mathrm{DD}}$ to 0 the output will switch at $V_{\text {in }}=0.4 \mathrm{~V}$. If you don't trust your value from b., you may use $(\mathrm{W} / \mathrm{L})_{2}=5$.
5. [M, None, 7.6] Figure 5 shows an astable multivibrator. Calculate and draw voltage waveforms at the capacitor $\mathrm{V}_{\mathrm{C}}$ and at the output $\mathrm{V}_{\text {out }}$. What is the oscillation frequency of the multivibrator?


Figure 0.5 Astable multivibrator.

Assume that the amplifier is ideal, with symmetric supplies ( $V_{\text {out }}{ }^{\max }=V_{D D}, V_{\text {out }}{ }^{\min }=-V_{S S}$ ) $\mathrm{R}_{1}=1 \mathrm{k} \Omega, \mathrm{R}_{2}=3 \mathrm{k} \Omega, \mathrm{R}_{3}=\mathrm{R}_{4}=4 \mathrm{k} \Omega, \mathrm{C}=1 \mathrm{nF}, V_{D D}=-V_{S S}=5 \mathrm{~V}$, diode voltage $\mathrm{V}_{\mathrm{D}}=0.6 \mathrm{~V}$ (ideal diode), $V_{\text {out }}\left(t=0^{-}\right)=-V_{S S}$.
6. [E, None, 7.6] An oscillator is shown in Figure 6. Draw the signal waveforms for this circuit at nodes X, Y, Z, A, and B. Determine the oscillation frequency. You may assume that the delay of the inverters, the resistances of the MOS transistors, and all internal capacitors can be ignored. The inverter switch point is set at 1.25 V . Assume that nodes Y and Z are initially at 0 V and 2.5 V , respectively.


Figure 0.6 Oscillator.
7. [E, None, 7.6] Consider the oscillator in Figure 7. Assume that the " n " switches turn "on" for voltages above $\mathrm{V}_{\mathrm{DD}} / 2$, and the " p " switches turn "on" for voltages below $\mathrm{V}_{\mathrm{DD}} / 2$. Assume that the current sources stop when the node voltage charges to either $\mathrm{V}_{\mathrm{DD}}$ or ground.


Figure 0.7 Oscillator.
a. Find the oscillation period for $\mathrm{V}_{\mathrm{DD}}=3 \mathrm{~V}$.
b. Draw the waveforms at nodes $\mathrm{X}, \mathrm{Y}$, and Z for two periods.
c. Find the oscillation period.
8. [M, None, 7.6] The circuit in Figure 8 operates at a supply voltage of 3 V and uses two Schmitt triggers with the following threshold voltages: $\mathrm{V}_{\mathrm{M}^{+}}=2 \mathrm{~V}, \mathrm{~V}_{\mathrm{M}-}=1 \mathrm{~V}$.


Figure 0.8 A circuit composed of Schmitt triggers.
a. Identify whether the circuit is monostable, bistable, or astable.
b. Draw the waveforms at nodes X, Y, and A. Mark all important voltage levels.
c. Calculate the key timing parameter for this circuit (propagation delay for bistable, pulse width for monostable, and time period for astable) in terms of R and C. You can assume that gate delays are negligible compared to the delay of the RC network.
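For part (c) of problem 8, the underlying building block is the time a single-pole RC node takes to move between two voltage levels. The sketch below assumes the node follows $v(t)=V_{final}+(V_{start}-V_{final}) e^{-t / R C}$; whether the node in Figure 0.8 charges toward the supply rails in exactly this way has to be checked against the schematic, which is not reproduced here. The function name is illustrative.

```python
import math

def rc_crossing_time(v_start, v_end, v_final, r_ohms, c_farads):
    """Time for an RC node starting at v_start and settling toward v_final
    to reach v_end (v_end must lie between v_start and v_final)."""
    return r_ohms * c_farads * math.log((v_final - v_start) / (v_final - v_end))

# Normalized R = C = 1, so the result is expressed in units of RC.
rise = rc_crossing_time(1.0, 2.0, 3.0, 1.0, 1.0)  # from V_M- to V_M+ toward 3 V
fall = rc_crossing_time(2.0, 1.0, 0.0, 1.0, 1.0)  # from V_M+ to V_M- toward 0 V
print(rise, fall)
```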

## Chapter 10 PROBLEMS

1. [C, None, 9.2] For the circuit in Figure 0.1, assume a unit delay through the Register and Logic blocks (i.e., $t_{R}=t_{L}=1$ ). Assume that the registers, which are positive edge-triggered, have a set-up time $t_{S}$ of 1 . The delay through the multiplexer $t_{M}$ equals $2 t_{R}$.
a. Determine the minimum clock period. Disregard clock skew.
b. Repeat part a, factoring in a nonzero clock skew: $\delta=t_{\theta}^{\prime}-t_{\theta}=1$.
c. Repeat part a, factoring in a non-zero clock skew: $\delta=t_{\theta}^{\prime}-t_{\theta}=4$.
d. Derive the maximum positive clock skew that can be tolerated before the circuit fails.
e. Derive the maximum negative clock skew that can be tolerated before the circuit fails.


Figure 0.1 Sequential circuit.
2. This problem examines sources of skew and jitter.
a. A balanced clock distribution scheme is shown in Figure 0.2. For each source of variation, identify whether it contributes to skew or jitter. Circle your answer in Table 0.1.


Figure 0.2 Sources of Skew and Jitter in Clock Distribution.

| 1) Uncertainty in the clock generation circuit | Skew | Jitter |
| :--- | :--- | :--- |
| 2) Process variation in devices | Skew | Jitter |
| 3) Interconnect variation | Skew | Jitter |
| 4) Power Supply Noise | Skew | Jitter |
| 5) Data Dependent Load Capacitance | Skew | Jitter |
| 6) Static Temperature Gradient | Skew | Jitter |

Table 0.1 Sources of Skew and Jitter
b. Consider a Gated Clock implementation where the clock to various logical modules can be individually turned off, as shown in Figure 0.3 (i.e., Enable$_{1}, \ldots,$ Enable$_{N}$ can take on different values on a cycle-by-cycle basis). Which approach ($A$ or $B$) results in lower jitter at the output of the input clock driver? (Hint: consider gate capacitance.) Explain.

Figure 0.3 Jitter in clock gating (panels: Fine-grain Clock Gating; Gating Approach A; Gating Approach B)
3. Figure 0.4 shows a latch based pipeline with two combinational logic units.


Figure 0.4 Latch Based Pipeline

Recall that the timing diagram of a combinational logic block and a latch can be drawn as follows, where the shaded region represents that the data is not ready yet.


Figure 0.5 Timing diagrams of combinational logic and latch
Assume that the contamination delay $t_{c d}$ of the combinational logic block is zero, and the $t_{c l k-q}$ of the latch is zero too.
a. Assume the following timing for the input $I$. Draw the timing diagram for the signals $a, b$, $c, d$ and $e$. Include the clock in your drawing.


Figure 0.6 Input timing
b. State the deadline for the computation of the signal $b$ and $d$, i.e. when is the latest time they can be computed, relative to the clock edges. In your diagram for (a), label with a "< >" the "slack time" that the signals $b$ and $d$ are ready before the latest time they must be ready.
c. Hence deduce how much the clock period can be reduced for this shortened pipeline. Draw the modified timing diagram for the signals $a, b, c, d$, and $e$. Include the clock in your drawing.
4. Consider the circuit shown in Figure 0.7.


Figure 0.7 Sequential Circuit
a. Use SPICE to measure $t_{\max }$ and $t_{\min }$. Use a minimum-size NAND gate and inverter. Assume no skew and a zero rise/fall time. For the registers, use the following:

- A TSPC Register.
- A $\mathrm{C}^{2}\mathrm{MOS}$ Register.
b. Introduce clock skew, both positive and negative. How much skew can the circuit tolerate and still function correctly?
c. Introduce finite rise and fall time to the clocks. Show what can occur and describe why.

5. Consider the following latch based pipeline circuit shown in Figure 0.8.

Assume that the input, $I N$, is valid (i.e., set up) 2 ns before the falling edge of $C L K$ and is held till the falling edge of $C L K$ (there is no guarantee on the value of $I N$ at other times). Determine the maximum positive and negative skew on $C L K^{\prime}$ for correct functionality.


Figure 0.8 Latch based pipeline
6. For the L1-L2 latch-based system from Figure 0.9, with two overlapping clocks, derive all the necessary constraints for proper operation of the logic. The latches have setup times $T_{S U 1}$ and $T_{S U 2}$, data-to-output delays $T_{D-Q 1}$ and $T_{D-Q 2}$, clock-to-output delays $T_{C l k-Q 1}$ and $T_{C l k-Q 2}$, and hold times $T_{H 1}$ and $T_{H 2}$, respectively. Relevant clock parameters are also illustrated in Figure 0.9. The constraints should relate the logic delays, clock period, overlap time $T_{O b}$, and pulse widths $P W 1$ and $P W 2$ to latch parameters and skews.


Figure 0.9 Timing constraints
7. For the self-timed circuit shown in Figure 0.10, make the following assumptions. The propagation through the NAND gate can be $5 \mathrm{nsec}, 10 \mathrm{nsec}$, or 20 nsec with equal probability. The logic in the succeeding stages is such that the second stage is always ready for data from the first.
a. Calculate the average propagation delay with $t_{h s}=6 \mathrm{nsec}$.
b. Calculate the average propagation delay with $t_{h s}=12 \mathrm{nsec}$.
c. If the handshaking circuitry is replaced by a synchronous clock, what is the smallest possible clock frequency?


Figure 0.10 Self-timed circuit.
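As a starting point for problem 7, the statistics of the NAND delay alone can be computed exactly. The sketch below gives only the mean and the worst case of the three equally likely delays; how the handshake time $t_{hs}$ combines with them depends on the circuit of Figure 0.10 and is deliberately left out.

```python
from fractions import Fraction

# The three equally likely NAND propagation delays, in nanoseconds.
nand_delays_ns = [5, 10, 20]

mean_delay_ns = Fraction(sum(nand_delays_ns), len(nand_delays_ns))
worst_delay_ns = max(nand_delays_ns)

print("mean NAND delay  =", mean_delay_ns, "ns")   # exact fraction
print("worst NAND delay =", worst_delay_ns, "ns")  # what a synchronous clock must cover
```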
8. Lisa and Marcus Allen have a luxurious symphony hall date. After pulling out of their driveway, they pull up to a four-way stop sign. They pulled up to the sign at the same time as a car on the cross-street. The other car, being on the right, had the right-of-way and proceeded first. On the way they also have to stop at traffic signals. There is so much traffic on the freeway, the metering lights are on. Metering lights regulate the flow of merging traffic by allowing only one lane of traffic to proceed at a time. With all the traffic, they arrive late for the symphony and miss the beginning. The usher does not allow them to enter until after the first movement.

On this trip, Lisa and Marcus proceeded through both synchronizers and arbiters. Please list all and explain your answer.
9. Design a self-timed FIFO. It should be six stages deep and have a two-phase handshake with the outside world. The black-box view of the FIFO is given in Figure 0.11.


Figure 0.11 Overall structure of FIFO.
10. System Design issues in self-timed logic

One of the benefits of using self-timed logic is that it delivers average-case performance rather than the worst-case performance that must be assumed when designing synchronous circuits. In applications where the average and worst cases differ significantly, the performance improvement can be substantial. Here we consider the case of ripple-carry addition. In a synchronous design, the ripple-carry adder is assumed to have worst-case performance, which means a carry-propagation chain of length $N$ for an $N$-bit adder. However, as we will prove during the course of this problem, the average length of the carry-propagation chain, assuming uniformly distributed input values, is in fact $\mathrm{O}(\log N)$!
a. Given that $p_{n}(v)=\operatorname{Pr}$ (carry-chain of an n-bit addition is $\geq \mathrm{v}$ bits), what is the probability that the carry chain is of length $k$ for an $n$-bit addition?
b. Given your answer to part (a), what is the average length of the carry chain (i.e., $a_{n}$ )? Simplify your answer as much as possible.

Now $p_{n}(v)$ can be decomposed into two mutually-exclusive events, A and B . Where A represents that a carry chain of length $\geq \mathrm{v}$ occurs in the first $n$ - 1 bits, and B represents that a carry chain of length $v$ ends on the $n$th bit.
c. Derive an expression for $\operatorname{Pr}(\mathrm{A})$.
d. Derive an expression for $\operatorname{Pr}(B)$. (HINT: a carry bit $i$ is propagated only if $a_{i} \neq b_{i}$, and a carry chain begins only if $a_{i}=b_{i}=1$ ).
e. Combine your results from (c) and (d) to derive an expression for $p_{n}(v)-p_{n-1}(v)$ and then bound this result from above to yield an expression in terms of only the length of the carry chain (i.e., $v$ ).
f. Using what you've shown thus far, derive an upper bound for the expression:

$$
\sum_{i=v}^{n}\left(p_{i}(v)-p_{i-1}(v)\right)
$$

Use this result, coupled with the fact that $p_{n}(v)$ is a probability (i.e., it's bounded from above by 1 ), to determine a two-part upper bound for $p_{n}(v)$.
g. (The magic step!) Bound $n$ by a clever choice of $k$ such that $2^{k} \leq n \leq 2^{k+1}$ and exploit the fact that $\log _{2} x$ is concave down on $(0, \infty)$ to ultimately derive that $a_{n} \leq \log _{2} n$, which concludes your proof!
h. Theoretically speaking, how much faster would a self-timed 64-bit ripple carry adder be than its synchronous counterpart? (You may assume that the overhead costs of using selftimed logic are negligible).
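The claim in problem 10 can be explored numerically before attempting the proof. The Python sketch below measures the longest carry chain by exhaustive enumeration for small word lengths. It adopts one possible convention, which is an assumption rather than the text's definition: a chain starts at a bit where $a_i=b_i=1$ (counted as length 1) and grows by one for every following bit with $a_i \neq b_i$. Other conventions shift the numbers by a constant, so this is a sanity check on the logarithmic trend, not a substitute for parts (a) through (g).

```python
import math
from itertools import product

def longest_carry_chain(a, b, n):
    """Longest carry chain when adding the n-bit numbers a and b (no carry-in)."""
    longest = run = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:
            run = 1            # generate: a new chain starts here
        elif ai != bi and run > 0:
            run += 1           # propagate: the live chain extends
        else:
            run = 0            # kill, or nothing to propagate
        longest = max(longest, run)
    return longest

def average_longest_chain(n):
    """Exact average over all 4**n operand pairs (practical for small n only)."""
    total = sum(longest_carry_chain(a, b, n)
                for a, b in product(range(2 ** n), repeat=2))
    return total / 4 ** n

for n in (2, 4, 6, 8):
    print(n, average_longest_chain(n), math.log2(n))
```

For larger $n$, exhaustive enumeration becomes impractical and random sampling of operand pairs is the natural replacement.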
11. Figure 0.12 shows a simple synchronizer. Assume that the asynchronous input switches at a rate of approximately 10 MHz and that $t_{r}=2 \mathrm{nsec}, f_{\phi}=50 \mathrm{MHz}, V_{I H}-V_{I L}=0.5 \mathrm{~V}$, and $V_{D D}=2.5 \mathrm{~V}$.
a. If all NMOS devices are minimum-size, find $(W / L) p$ required to achieve $V_{M S}=1.25 \mathrm{~V}$. Verify with SPICE.
b. Use SPICE to find $\tau$ for the resulting circuit.
c. What waiting time $T$ is required to achieve a MTF of 10 years?
d. Is it possible to achieve an MTF of 1000 years (where $T>T_{\phi}$ )? If so, how?


Figure 0.12 Simple synchronizer
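Parts (c) and (d) of problem 11 reduce to inverting an exponential once $\tau$ is known. The sketch below uses the first-order metastability model in which the failure rate after a waiting time $T$ is $\frac{V_{IH}-V_{IL}}{V_{swing}} \cdot t_r f_{signal} f_{\phi} \cdot e^{-T/\tau}$. Two assumptions are made for illustration only: $V_{swing}$ is taken equal to $V_{DD}$, and $\tau = 100$ ps is a placeholder that must be replaced by the value extracted from SPICE in part (b).

```python
import math

def failure_rate(wait_s, tau_s, v_window, v_swing, t_r, f_signal, f_clock):
    """Synchronization failures per second after waiting wait_s seconds."""
    p_init = (v_window / v_swing) * t_r * f_signal
    return p_init * math.exp(-wait_s / tau_s) * f_clock

def required_wait(mtf_s, tau_s, v_window, v_swing, t_r, f_signal, f_clock):
    """Waiting time that yields a mean time to failure of mtf_s seconds."""
    rate_at_zero = failure_rate(0.0, tau_s, v_window, v_swing,
                                t_r, f_signal, f_clock)
    return tau_s * math.log(rate_at_zero * mtf_s)

# Values from the problem statement; v_swing = V_DD is an assumption.
PARAMS = dict(v_window=0.5, v_swing=2.5, t_r=2e-9, f_signal=10e6, f_clock=50e6)
TAU_ASSUMED = 100e-12             # placeholder, NOT a simulated result
TEN_YEARS_S = 10 * 365 * 24 * 3600

wait_10y = required_wait(TEN_YEARS_S, TAU_ASSUMED, **PARAMS)
print(f"T = {wait_10y * 1e9:.2f} ns for the assumed tau")
```

Comparing the resulting $T$ with the clock period $T_{\phi}=1/f_{\phi}$ shows whether a single synchronizer stage can meet the target.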
12. Explain how the phase-frequency comparator shown in Figure 0.13 works.


Figure 0.13 Phase-frequency comparator
13. The heart of any static latch is the cross-coupled structure shown in Figure 0.14 (part a).
a. Assuming identical inverters with $W p / W n=k n^{\prime} / k p^{\prime}$, what is the metastable point of this circuit? Give an expression for the time trajectory of $V_{Q}$, assuming a small initial $V d 0$ centered around the metastable point of the circuit, $V_{M}$.

a) Latch

b) Metastability Detector

c) Synchronizer

Figure 0.14 Simple synchronizer
b. The circuit in part b has been proposed to detect metastability. How does it work? How would you generate a signal M that is high when the latch is metastable?
c. Consider the circuit of part $c$. This circuit was designed in an attempt to defeat metastability in a synchronizer. Explain how the circuit works. What is the function of the delay element?
14. An adjustable duty-cycle clock generator is shown in Figure 0.15. Assume the delay through the delay element matches the delay of the multiplexer.
a. Describe the operation of this circuit.
b. What is the range of duty cycles that can be achieved with this circuit?
c. Using an inverter and an additional multiplexer, show how to make this circuit cover the full range of duty cycles.


Figure 0.15 Clock duty-cycle generator.
15. The circuit style shown in Figure 0.17.a has been proposed by Acosta et al. as a new self-timed logic style. This structure is known as a Switched Output Differential Structure${ }^{1}$.
a. Describe the operation of the SODS gate in terms of its behavior during the pre-charge phase, and how a valid completion signal can be generated from its outputs.
b. What are the advantages of using this logic style in comparison to the DCVSL logic style given in the notes?
c. What are the disadvantages of using this style in comparison to DCVSL?
d. Figure 0.16.b shows a 2-input AND gate implemented using a SODS style. Simulate the given circuit using Hspice. Do you notice any problems? Explain the cause of any problems that you may observe and propose a fix. Re-simulate your corrected circuit and verify that you have in fact fixed the problem(s).

16. Voltage-Controlled Ring Oscillator.

In this problem, we explore a voltage-controlled oscillator based on John G. Maneatis' paper "Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques," which appeared in the Journal of Solid-State Circuits in November 1996. We will focus on a critical component of the PLL design: the voltage-controlled ring oscillator. Figure 0.17 shows the block diagram of a voltage-controlled ring oscillator:


Figure 0.17 Voltage Controlled Ring Oscillator
The control voltage, Vctl, is sent to a bias generator that generates two voltages used to bias each delay cell equally, so that equal delay (assuming no process variations) appears across each delay cell. The delay cells are simple, "low-gain" fully differential input and output operational amplifiers that are connected in such a way that oscillations will occur at any one of the outputs with a frequency of $1 /(4 \times \text{delay})$. Each delay is modeled as an RC time constant; C comes from parasitic capacitances at the output nodes of the delay element, and R comes from the variable resistor that is the load for the delay cell. Below is a circuit schematic of a typical delay cell.


Figure 0.18 One delay Cell
As mentioned before, the value of $R$ is set by a variable resistor. How can one make a variable resistor? The object in the delay cell that is surrounded by a dotted line is called a "symmetric load," and provides the answer to a voltage-controlled variable resistor. R should be linear so that the differential structure cancels power supply noise. We will begin our analysis with the symmetric load.
a. In Hspice, input the circuit below and plot Vres on the X axis and Ires on the Y axis, for the following values of Vctlp: $0.5,0.75,1.0,1.25,1.5,1.75$, and 2.0 volts, by varying Vtest from Vctlp to Vdd, all on the same graph. For each curve, plot Vres from 0 volts to Vdd-Vctlp. When specifying the Hspice file, be sure to estimate area and perimeter of drains/sources.


Figure 0.19 Symmetric Load Test Circuit
After you have plotted the data and printed it out, use a straight edge to connect the end points for each curve. What do you notice about intersection points between the line you drew over each curve, and the curves themselves? Describe any symmetries you see.
b. For each Vctlp curve that you obtained in a), extract the points of symmetry (Vres, Ires), and find the slope of the line around these points of symmetry. These are the effective resistances of the resistors. Also, for each Vctl curve, state the maximum amplitude the output swing can have without running into asymmetries. Put all of this data in a worksheet format.
c. Using the estimates you made for the area and perimeter of the drain and source that you put in your Hspice file, calculate the effective capacitance. (Just multiply area and perimeter by CJ and CJSW from the spice deck.) Since we are placing these delay elements in a cascaded fashion, remember to INCLUDE THE GATE CAPACITANCE of the following stage. Each delay element is identical to the others. Now, calculate the delay in each cell for each setting of Vctlp that you found in a): delay $=0.69 * \mathrm{R} * \mathrm{C}$. Then, write a general equation, in terms of R and C, for the frequency value that will appear at each delay output. Why is it necessary to cross the feedback lines for the ring oscillator in the first figure? Finally, draw a timing/transient analysis of each output node of the delay lines. How many phases of the base frequency are there?
d. Now, we will look at the bias generator. The circuit for the bias generator is as follows:


Figure 0.20 Bias Generator
Implement this circuit in Hspice, and use the ideal voltage controlled voltage source for your amplifier. Use a value of 20 for A. This circuit automatically sets the Vctln and Vctlp voltages to the buffer delays to set the DC operating points of the delay cells such that the symmetric load is swinging reflected around its point of symmetry for a given Vctl voltage. Also, it is important to note that Vctl is the same as Vctlp. It must go through this business to obtain Vctln (which sets the bias current to the correct value, which sets the DC operating point of the buffer). Do a transient run in Hspice to verify that Vctlp is indeed very close to Vctl over a range of inputs for Vctl. Show a Spice transient simulation that goes for 1uS, and switches Vctl in a pwl waveform across a range of inputs between 0.5 V and 2.0 V . For extra points, explain how this circuit works.
e. Now, hook up the bias generator you just built with 4 delay cells, as shown in the first figure. For each control voltage Vctlp from part c), verify your hand calculations with spice simulations. Show a spreadsheet of obtained frequencies vs. hand-calculation predictions, and in a separate column, calculate \% error. Give a brief analysis of what you see. Print out all of the phases (4) of the clock, for a Vctl value of your choice.
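The hand calculation in part (c) and the comparison table in part (e) of problem 16 can be organized with a small script. The sketch below combines the two relations stated in the problem, delay $=0.69 \cdot R \cdot C$ and frequency $=1/(4 \cdot$ delay$)$, and adds a percent-error helper for the spreadsheet column. The R and C values in the example are placeholders chosen for illustration, not extracted data.

```python
def cell_delay(r_ohms, c_farads):
    """Delay of one cell using the RC model from the problem: 0.69 * R * C."""
    return 0.69 * r_ohms * c_farads

def ring_frequency(r_ohms, c_farads):
    """Oscillation frequency stated in the problem: 1 / (4 * delay)."""
    return 1.0 / (4.0 * cell_delay(r_ohms, c_farads))

def percent_error(simulated, predicted):
    """Percent error of the hand prediction relative to the simulated value."""
    return 100.0 * (predicted - simulated) / simulated

# Placeholder values (assumptions): R = 10 kOhm, C = 50 fF.
R_EXAMPLE, C_EXAMPLE = 10e3, 50e-15
f_example = ring_frequency(R_EXAMPLE, C_EXAMPLE)
print(f"delay = {cell_delay(R_EXAMPLE, C_EXAMPLE) * 1e12:.1f} ps, "
      f"f = {f_example / 1e6:.1f} MHz")
```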

## Chapter 11 <br> PROBLEMS

1. [E, None, 11.6] For this problem you are given a cell library consisting of full adders and twoinput Boolean logic gates (i.e. AND, OR, INVERT, etc.).
a. Design an N-bit two's complement subtracter using a minimal number of Boolean logic gates. The result of this process should be a diagram in the spirit of Figure 11.5 . Specify the value of any required additional signals (e.g., $C_{i n}$ ).
b. Express the delay of your design as a function of $N, t_{\text {carry }}, t_{\text {sum }}$, and the Boolean gate delays ( $t_{\text {and }}, t_{\text {or }}, t_{\text {inv }}$, etc.).
2. [M, None, 11.6] A magnitude comparator for unsigned numbers can be constructed using full adders and Boolean logic gates as building blocks. For this problem you are given a cell library consisting of full adders and arbitrary fan-in logic gates (i.e., AND, OR, INVERTER, etc.).
a. Design an $N$-bit magnitude comparator with outputs $A \geq B$ and $A=B$ using a minimal number of Boolean logic gates. The result of this process should be a diagram in the spirit of Figure 11.5. Specify the value of any required control signals (e.g., $C_{i n}$ ).
b. Express the delay of your design in computing the two outputs as a function of $N, t_{\text {carry }}$, $t_{\text {sum }}$, and the Boolean gate delays ( $t_{\text {and }}, t_{o r}, t_{\text {inv }}$, etc.).
3. [E, None, 11.6] Show how the arithmetic module in Figure 0.1 can be used as a comparator. Derive an expression for its propagation delay as a function of the number of bits.


Figure 0.1 Arithmetic module.
4. [E, None, 11.6] The circuit of Figure 11.2 implements a 1-bit datapath function in dynamic (precharge/evaluate) logic.
a. Write down the Boolean expressions for outputs $F$ and $G$. On which clock phases are outputs $F$ and $G$ valid?
b. To what datapath function could this unit be most directly applied (e.g., addition, subtraction, comparison, shifting)?
5. [M, None, 11.3] Consider the dynamic logic circuit of Figure 0.2 .
a. What is the purpose of transistor $M_{1}$? Is there another way to achieve the same effect, but with reduced capacitive loading on the clock $\Phi$?


b. How can the evaluation phase of $F$ be sped up by rearranging transistors? No transistors should be added, deleted, or resized.
c. Can the evaluation of $G$ be sped up in the same manner? Why or why not?
6. [M, SPICE, 11.3] The adder circuit of Figure 0.3 makes extensive use of the transmission gate XOR. $V_{D D}=2.5 \mathrm{~V}$.
a. Explain how this gate operates. Derive the logic expression for the various circuit nodes. Why is this a good adder circuit?
b. Derive a first-order approximation of the capacitance on the $C_{o}$-node in equivalent gate capacitances. Assume that gate and diffusion capacitances are approximately identical. Compare your result with the circuit of Figure 11-6.
c. Assume that all transistors with the exception of those on the carry path are minimum-size. Use $4 / 0.25$ NMOS and $8 / 0.25$ PMOS devices on the carry path. Using SPICE simulation, derive a value for all important delays (input-to-carry, carry-to-carry, carry-to-sum).



Figure 0.3 Quasi-clocked adder circuit (panels: Sum generation; Carry generation)

Digital Integrated Circuits - 2nd Ed
7. [M, None, 11.3] The dynamic implementation of the 4-bit carry-lookahead circuitry from Fig. 11-21 can significantly reduce the required transistor count.
a. Design a domino-logic implementation of Eq. 11.17. Compare the transistor counts of the two implementations.
b. What is the worst-case propagation delay path through this new circuit?
c. Are there any charge-sharing problems associated with your design? If so, modify your design to alleviate these effects.
8. [C, None, 11.3] Figure 0.4 shows a popular adder structure called the conditional-sum adder. Figure 0.4.a shows a four-bit instance of the adder, while 0.4.b gives the schematics of the basic adder cell. Notice that only pass-transistors are used in this implementation.
a. Derive Boolean descriptions for the four outputs of the one-bit conditional adder cell.
b. Based on the results of part (a), describe how the schematic of Figure 0.4.a results in an addition.
c. Derive an expression for the propagation delay of the adder as a function of the number of bits $N$. You may assume that a switch has a constant resistance $R_{o n}$ when active and that each switch is identical in size.

(a) Four-bit conditional-sum adder




(b) Conditional adder cell

Figure 0.4 Conditional-sum adder.
9. [M, None, 11.3] Consider replacing all of the NMOS evaluate transistors in a dynamic Manchester carry chain with a single common pull-down as shown in Figure 0.5.a. Assume that each NMOS transistor has $(W / L)_{N}=0.5 / 0.25$ and each PMOS has $(W / L)_{P}=0.75 / 0.25$. Further assume that parasitic capacitances can be modeled by a 10 fF capacitor on each of the internal nodes: $A, B, C, D, E$, and $F$. Assume all transistors can be modeled as linear resistors with an on-resistance, $R_{o n}=5 \mathrm{k} \Omega$.
a. Does this variation perform the same function as the original Manchester carry chain? Explain why or why not.
b. Assuming that all inputs are allowed only a single zero-to-one transition during evaluation, will this design involve charge-sharing difficulties? Justify your answer.
c. Complete the waveforms in Figure 0.5.b for $P_{0}=P_{1}=P_{2}=P_{3}=2.5 \mathrm{~V}$ and $G_{0}=G_{1}=G_{2}=G_{3}=0 \mathrm{~V}$. Compute and indicate $t_{p H L}$ values for nodes $A, E$, and $F$. Compute and indicate when the $90 \%$ precharge levels are obtained.

(a) Circuit schematic

(b) Partial waveforms

Figure 0.5 Alternative dynamic Manchester carry-chain adder.
10. [M, None, 11.3] Consider the two implementations of Manchester carry gates in Figure 11-8.
a. Compare the delay per segment of the two implementations.
b. Compare the layout complexities of the two gates using stick diagrams.
c. In the precharged Manchester carry chain using the gate from b. find the probability that the carry signal is propagated from the $15^{\text {th }}$ to the $16^{\text {th }}$ bit of a 32 -bit adder, assuming random inputs.
11. [C, None, 11.3] Consider the Radix-4 and Radix-2 Kogge-Stone adders from Figures 11-22 and 11-27 extended to 64 -bits. All gates are implemented in domino and all gates in a stage have the same size. The adders have an overall fanout (electrical effort) of 6 .
a. Using logical effort, identify the critical path.
b. Size the gates for minimum delay (hint: don't forget to factor in branching). Which adder is faster?
c. Let's now consider sparse versions of each of the above trees. In a tree with a sparseness of 2, only every other carry is computed and it is used to select 2 sums. Similarly, a tree with a sparseness of 4 computes every fourth carry, and that carry signal is used to select 4 sums. Repeat a. and b. for Radix-2 and Radix-4 trees with sparseness of 2 and 4 and compare their speed. Which adder is fastest?
d. Compare the switching power of all adders analyzed in this problem.
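For the sizing step in Problem 11, the standard logical-effort bookkeeping can be sketched as follows. This is a generic helper, not a solution: the function name and the example values are hypothetical, and the per-stage logical efforts, branching factors, and parasitic delays of the domino gates must be taken from the actual adder paths. It assumes the usual result that delay is minimized when every stage carries the same effort.

```python
def min_path_delay(logical_efforts, branching, electrical_effort, parasitics):
    """Return (delay, stage_effort) for an N-stage path sized for minimum delay.

    Path effort F = G * B * H, where G and B are the products of the per-stage
    logical efforts and branching efforts and H is the path electrical effort.
    With equal stage effort f = F**(1/N), the normalized delay is N*f + sum(p).
    """
    n_stages = len(logical_efforts)
    path_effort = electrical_effort
    for g in logical_efforts:
        path_effort *= g
    for b in branching:
        path_effort *= b
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages * stage_effort + sum(parasitics), stage_effort

# Hypothetical two-stage path: unit logical effort, no branching, H = 4.
delay, f = min_path_delay([1.0, 1.0], [1.0, 1.0], 4.0, [1.0, 1.0])
print(delay, f)
```

Comparing the Radix-2 and Radix-4 trees then amounts to calling the helper once per candidate critical path, with the branching terms reflecting the fanout at each prefix level.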
12. [C, None, 11.3] In this problem we will analyze a carry-lookahead adder proposed by H. Ling more than 20 years ago, but still among the fastest adders available. In a conventional adder, in order to add two numbers

$$
\begin{aligned}
& A=a_{n-1} 2^{n-1}+a_{n-2} 2^{n-2}+\ldots+a_{0} 2^{0} \\
& B=b_{n-1} 2^{n-1}+b_{n-2} 2^{n-2}+\ldots+b_{0} 2^{0}
\end{aligned}
$$

we first compute the local carry generate and propagate terms:

$$
\begin{aligned}
& g_{i}=a_{i} b_{i} \\
& p_{i}=a_{i}+b_{i}
\end{aligned}
$$

then, with a ripple or a tree circuit we form the global carry-out terms resulting from the recurrence relation:

$$
G_{i}=g_{i}+p_{i} G_{i-1}
$$

Finally, we form the sum of $A$ and $B$ using local expressions:

$$
S_{i}=\left(a_{i} \oplus b_{i}\right) \oplus G_{i-1}
$$

In the conventional adder, the terms $G_{i}$ have, as described, a physical significance. However, an arbitrary function could be propagated, as long as sum terms could be derived. Ling's approach is to replace $G_{i}$ with:

$$
H_{i}=G_{i}+G_{i-1}
$$

i.e., $H_{i}$ is true if "something happens at bit $i$": there is a carry out or a carry in. $H_{i}$ is the so-called "Ling's pseudo-carry".
a. Show that:

$$
H_{i}=g_{i}+t_{i-1} H_{i-1}
$$

where $t_{i}=a_{i}+b_{i}$ (it was Ling's idea to change the notation).
b. Find a formula for computing the sum out of the operands and Ling's pseudo-carry.
c. Unroll the recursions for $G_{i}$ and $H_{i}$ for $i=3$. You should get the expressions for $G_{3}$ and $H_{3}$ as a function of the bits of input operands. Simplify the expressions as much as possible.
d. Implement the two functions using n-type dynamic gates. Draw the two gates and size the transistors. Which one helps us build a faster adder? Explain your answer.
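The definitions in Problem 12 can be checked numerically before attempting the algebra. The sketch below is a sanity check, not a proof: it evaluates the generate, propagate, carry, and pseudo-carry terms bit by bit, assuming no carry-in ($G_{-1}=0$, and therefore $H_{-1}=0$). The function names are hypothetical. Note that the sum bit is formed with $a_{i} \oplus b_{i}$, because XOR-ing the OR-type propagate with the incoming carry would give the wrong result when both operand bits are 1.

```python
def carry_terms(a, b, n):
    """Return lists (g, p, G, H) for n-bit operands a and b, assuming no carry-in."""
    g, p, G, H = [], [], [], []
    G_prev = 0  # G_{-1} = 0: no carry into bit 0 (assumption)
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        gi, pi = ai & bi, ai | bi          # local generate and (OR-type) propagate
        Gi = gi | (pi & G_prev)            # G_i = g_i + p_i G_{i-1}
        g.append(gi)
        p.append(pi)
        G.append(Gi)
        H.append(Gi | G_prev)              # H_i = G_i + G_{i-1}
        G_prev = Gi
    return g, p, G, H

def add_via_carries(a, b, n):
    """Add two n-bit numbers using the global carries G_i."""
    _, _, G, _ = carry_terms(a, b, n)
    total = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry_in = G[i - 1] if i > 0 else 0
        total |= (ai ^ bi ^ carry_in) << i
    return total | (G[n - 1] << n)         # carry out becomes bit n

def ling_recurrence_holds(a, b, n):
    """Check H_i == g_i + t_{i-1} H_{i-1} for every bit, with t_i = a_i + b_i."""
    g, t, _, H = carry_terms(a, b, n)
    H_prev, t_prev = 0, 0
    for i in range(n):
        if H[i] != (g[i] | (t_prev & H_prev)):
            return False
        H_prev, t_prev = H[i], t[i]
    return True

# Exhaustive check over all 4-bit operand pairs.
ok = all(
    add_via_carries(a, b, 4) == a + b and ling_recurrence_holds(a, b, 4)
    for a in range(16) for b in range(16)
)
print(ok)
```

An exhaustive check over small word lengths is a useful guard against algebra slips when unrolling the recursions in part c.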
13. [M, None, 11.4] An array multiplier consists of rows of adders, each producing partial sums that are subsequently fed to the next adder row. In this problem, we consider the effects of pipelining such a multiplier by inserting registers between the adder rows.
a. Redraw Figure 11-31 by inserting word-level pipeline registers as required to achieve maximal benefit to throughput for the $4 \times 4$ multiplier. Hint: you must use additional registers to keep the input bits synchronized to the appropriate partial sums.
b. Repeat for a carry-save, as opposed to ripple-carry, architecture.
c. For each of the two multiplier architectures, compare the critical path, throughput, and latency of the pipelined and nonpipelined versions.
d. Which architecture is better suited to pipelining, and how does the choice of a vector-merging adder affect this decision?
14. [M, None, 11.4] Estimate the delay of a $16 \times 16$ Wallace tree multiplier with the final adder implemented using a Radix-4 tree. One FA has a delay of $t_{p}$, a HA $\frac{2}{3} t_{p}$, and a CLA stage $\frac{1}{2} t_{p}$.
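One ingredient of the estimate in Problem 14 is the number of reduction levels in the tree. The sketch below counts them under a simplifying assumption: at every level the remaining rows are grouped in threes, each group is compressed to two rows by a row of 3:2 adders, and leftover rows pass through unchanged. The function name is hypothetical. How each level maps onto FA or HA delays, and how many stages the final Radix-4 adder needs, is left to the estimate itself.

```python
def wallace_levels(n_rows):
    """Count 3:2 reduction levels needed to bring n_rows partial products down to 2."""
    levels = 0
    while n_rows > 2:
        # Each full group of three rows becomes two rows; leftovers pass through.
        n_rows = 2 * (n_rows // 3) + n_rows % 3
        levels += 1
    return levels

print(wallace_levels(16))  # row counts go 16 -> 11 -> 8 -> 6 -> 4 -> 3 -> 2
```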
15. [E, None, 11.5] The layout of shifters is dominated by the number of wires running through a cell. For both the barrel shifter and the logarithmic shifter, estimate the width of a shifter cell as a function of the maximum shift-width $M$ and the metal pitch $p$.
16. [E, None, 11.7] Consider the circuit from Figure 0.6. Modules A and B have a delay of 10 ns and 32 ns at 2.5 V, and switch 15 pF and 56 pF respectively. The register has a delay of 2 ns and switches 0.1 pF. Adding a pipeline register allows for reduction of the supply voltage while maintaining throughput. How much power can be saved this way? Delay with respect to $V_{D D}$ can be approximated from Figure 11-57.
17. [E, None, 11.7] Repeat Problem 16, using parallelism instead of pipelining. Assume that a 2-to-1 multiplexer has a delay of 4 ns at 2.5 V and switches 0.3 pF. Try parallelism levels of 2 and 4. Which one is preferred?

Figure 0.6 Pipelined datapath.
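For Problems 16 and 17, the power comparison rests on the switching-power relation $P = C V_{DD}^{2} f$. The sketch below only illustrates the bookkeeping. The function names are hypothetical, and the reduced supply voltage in the usage line is a placeholder, since the real value has to be read from the delay-versus-$V_{DD}$ curve in Figure 11-57. The capacitance totals are likewise assumptions that depend on how many registers and multiplexers each variant contains.

```python
def switching_power(c_farad, vdd, f_hz):
    """Dynamic switching power P = C * Vdd^2 * f."""
    return c_farad * vdd ** 2 * f_hz

def power_ratio(c_ref, v_ref, c_new, v_new):
    """Power of the modified design relative to the reference at equal throughput."""
    return (c_new * v_new ** 2) / (c_ref * v_ref ** 2)

# Placeholder numbers: one extra 0.1 pF register and a hypothetical 2.0 V supply.
c_reference = (15 + 56 + 0.1) * 1e-12
c_pipelined = c_reference + 0.1e-12
print(power_ratio(c_reference, 2.5, c_pipelined, 2.0))
```

For the parallel variants, the total switched capacitance grows with the number of copies plus the multiplexer, so the same ratio applies once the new capacitance and supply voltage are filled in.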

## DESIGN PROBLEM

Using the $0.25 \mu \mathrm{~m}$ CMOS technology, design a static 32-bit adder, with the following constraints:

1. input capacitance on each bit is limited to not more than 50 fF .
2. each bit is loaded with 100 fF .

Use a carry lookahead tree of your choice for implementation. The goal is to achieve the shortest propagation delay.

Determine the logic design of the adder and $W$ and $L$ of all transistors. Initially size the design using the method of logical effort. Estimate the capacitance of carry signal wires based on the floorplan. Verify and optimize the design using SPICE. Also compute the energy consumed per transition. If you have a layout editor available, perform the physical design, extract the real circuit parameters, and compare the simulated results with the ones obtained earlier. For implementation, use the $144 \lambda$ bit-slice pitch, which corresponds to 36 metal-1 tracks. Use metal-1 for cell-level power distribution and intra-cell routing, metal-2 for short interconnect, and metal-3 and metal-4 for long carries.


[^0]:    ${ }^{1}$ A.J. Acosta, M. Valencia, M.J. Bellido, J.L. Huertas, "SODS: A New CMOS Differential-type Structure," IEEE Journal of Solid State Circuits, vol. 30, no. 7, July 1995, pp. 835-838

