Consider a basic four input OR gate as shown in Figure 10. It consists of a four input NOR stage followed by an inverter. This is a more complex type of circuit than I have shown so far because it consists of two stages of complementary logic connected in series. The input stage has large, 8 um wide PFETs compared to only 1 um wide NFETs because in the NOR topology the PFETs are connected in series, diminishing their effective strength by about 75%.
Figure 10. Four Input Static OR Gate
While the NOR stage might have roughly the same drive strength as an inverter with a 2 um wide PFET and 1 um wide NFET, it has much higher parasitic capacitance from its constituent transistors. The four input NOR stage has 36 um of total transistor width (8+8+8+8+1+1+1+1) compared to only 3 um of total transistor width (2+1) for an inverter stage of equivalent drive strength. This means the NOR stage is much slower in operation than an inverter stage. Although NAND stages are also slower than an inverter stage, the disparity is not as great and transistor size compensation is less unwieldy.
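The width arithmetic above can be sketched in a few lines of Python. This is only a first-order illustration: it assumes that a stack of n equal series transistors behaves like a single transistor of 1/n the width, which is a common rule of thumb rather than something taken from a circuit simulation. The function and variable names are hypothetical.

```python
# First-order model (an assumption): n equal transistors in series act
# roughly like one transistor with 1/n of the individual width.
def effective_series_width(width_um, stack_height):
    return width_um / stack_height

fan_in = 4
nor4_pfet_width = 8.0  # um, each series PFET in the NOR stage
nor4_nfet_width = 1.0  # um, each parallel NFET in the NOR stage

# Four 8 um PFETs in series pull up about as hard as one 2 um PFET.
pull_up_equiv = effective_series_width(nor4_pfet_width, fan_in)
strength_loss = 1 - pull_up_equiv / nor4_pfet_width

# Total transistor width, a rough proxy for parasitic capacitance.
nor4_total_width = fan_in * nor4_pfet_width + fan_in * nor4_nfet_width
inverter_total_width = 2.0 + 1.0
width_ratio = nor4_total_width / inverter_total_width

print(f"equivalent pull-up width: {pull_up_equiv} um")
print(f"strength lost to series stacking: {strength_loss:.0%}")
print(f"NOR4 width {nor4_total_width} um vs inverter {inverter_total_width} um "
      f"({width_ratio:.0f}x)")
```

Under this model the NOR stage carries twelve times the transistor width of the inverter while delivering about the same drive.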
For maximum circuit performance it is desirable to avoid paths with many transistors connected in series, especially PFETs. Another way to implement a four input OR gate is as 2 two input NOR gates cascaded into a two input NAND gate. This second configuration is shown in Figure 11. Although using a two input NAND gate as the output stage is slightly slower than the inverter output stage used in the first circuit shown in Figure 10, the input stage only has two series connected PFETs instead of four which greatly speeds up its switching characteristics.
Figure 11. Alternative Four Input Static OR Gate
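That the two configurations compute the same function can be checked exhaustively. The following sketch models each gate as an ideal Boolean function (no timing or electrical behavior) and compares both circuits against a plain four input OR over all 16 input combinations. The function names are illustrative only.

```python
from itertools import product

def nor(*inputs):
    return not any(inputs)

def nand(*inputs):
    return not all(inputs)

def inv(x):
    return not x

def or4_figure10(a, b, c, d):
    # Four input NOR stage followed by an inverter.
    return inv(nor(a, b, c, d))

def or4_figure11(a, b, c, d):
    # Two two input NOR gates cascaded into a two input NAND gate.
    return nand(nor(a, b), nor(c, d))

# Collect every input combination where the circuits disagree with OR.
mismatches = [
    bits
    for bits in product([False, True], repeat=4)
    if not (or4_figure10(*bits) == or4_figure11(*bits) == any(bits))
]
print("mismatches:", mismatches)
```

An empty mismatch list confirms the equivalence, which follows from De Morgan's law: the NAND of two NOR outputs is the OR of all four inputs.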
The two different circuit configurations to implement a four input OR function shown in Figures 10 and 11 are both static CMOS circuits. That is to say, the circuits are combinational and asynchronous. Whenever one or more of the four input signals switches state, the circuit statically evaluates the OR logic function and outputs the new result. Conversely, if none of the inputs changes state then the gate output (and all internal nodes) remains quiescent.
In microprocessors, logic circuits often operate on signal inputs that only switch states at known times relative to a periodic signal called a clock. By restricting the times that input signals can change state relative to a clock signal it is possible to design logic circuits that operate faster than the static CMOS designs shown so far. Consider a dynamic implementation of the four input OR gate function shown in Figure 12. It is a form of dynamic circuit often called domino logic. Although in a purist’s sense this circuit isn’t really dynamic but rather pseudo static (a true dynamic circuit wouldn’t have the keeper PFET P2 and would have a minimum operating frequency) I will follow the common and widely used convention of still referring to it as dynamic logic.
Figure 12. Dynamic Four Input Domino OR Gate
The dynamic circuit operates in two distinct phases. In the precharge phase the clock input named EVAL is low. This turns off NFET N1 and turns on PFET P1. In turn, P1 precharges internal node X high, which forces output OUT low. During the precharge phase the inputs A, B, C, and D may change state. The evaluate phase begins when the EVAL signal goes high. This turns off P1 and turns on N1. If any of the four inputs is high, a path from internal node X to VSS through N1 is created, X is pulled low, and the output goes high. If all four inputs are low, node X remains in the high state through the influence of PFET P2. P2 is sized to be much weaker than the NFETs in the circuit so that it is easily overridden when X is pulled low. Once X is pulled low the output changes to a high value and P2 is turned off. The name domino comes from the analogy between a sequence of domino logic gates connected in series and a line of domino pieces stood on edge in a row. When the evaluate phase starts, a high going output from the first stage can quickly propagate down the chain of dynamic logic, discharging internal evaluation nodes like one domino piece knocking down the next piece in the row and so on.
Notice that if node X is pulled low during the evaluate phase, it will remain low until the next precharge phase. Therefore it is important for all inputs to a dynamic gate to remain stable in their intended state while the evaluate signal EVAL is high. That is in contrast to a static logic gate, which imposes no restriction on when inputs may change state. Also notice that the output of a dynamic gate will always go low during the precharge phase regardless of the state of the inputs. That means a dynamic gate may undergo switching activity each clock cycle even if the inputs remain unchanged.
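The two-phase behavior described above can be captured in a small behavioral model. This is a sketch at the logic level only: it tracks the state of node X and the output, and it ignores sizing, delay, charge sharing, and noise. The class name and the `step` method are hypothetical.

```python
class FootedDominoOr4:
    """Logic-level model of a footed domino OR gate: node X plus inverter."""

    def __init__(self):
        self.x = True  # assume the gate starts out precharged

    def step(self, eval_high, a, b, c, d):
        if not eval_high:
            # Precharge: P1 is on and N1 is off, so X goes high
            # no matter what the inputs are doing.
            self.x = True
        elif a or b or c or d:
            # Evaluate: N1 is on and at least one input NFET conducts,
            # so X is discharged.
            self.x = False
        # Otherwise X holds: the keeper P2 keeps it high, or it stays
        # low because nothing can recharge it until the next precharge.
        return not self.x  # OUT is the inverse of X

gate = FootedDominoOr4()
trace = [
    gate.step(False, 0, 0, 0, 0),  # precharge: OUT low
    gate.step(True, 0, 1, 0, 0),   # evaluate with B high: OUT rises
    gate.step(True, 0, 0, 0, 0),   # B drops mid-evaluate: OUT stays high
    gate.step(False, 0, 1, 0, 0),  # precharge with B high: OUT forced low
    gate.step(True, 0, 0, 0, 0),   # evaluate with all inputs low: OUT low
]
print(trace)
```

The third step shows why inputs must be stable during evaluation: once X has discharged, a falling input cannot undo the result. The fourth step shows the output returning low in precharge regardless of the inputs.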
For all its complexity and constraints, the dynamic logic gate in Figure 12 has a number of advantages over its static cousins in Figures 10 and 11. First of all, the complementary network of PFETs in a static CMOS logic gate has been replaced by a precharge PFET (P1) and keeper PFET (P2). The logic function of the gate is entirely determined by the topology of the NFET pull down network in the first stage of the domino circuit. This has a profound impact on performance and area since the complementary PFET network typically represents over half the circuit area and roughly 2/3 of the non-interconnect related internal parasitic capacitance. In general, the higher the fan-in (number of inputs) of a logic function the greater the potential performance advantage of a dynamic circuit implementation.
Another advantage of dynamic logic is that fast logic propagation occurs in only one direction: the internal evaluation node falls while the stage output rises. The opposite transition occurs during the precharge phase in parallel across all logic stages and isn’t as time critical. This means the transistor size ratios in dynamic circuits can be skewed to increase propagation speed for the active transition. For example, in a domino circuit the WP/WN ratio in the output inverter may be as high as 6:1 in practice. In addition, the input voltage level at which a dynamic logic stage switches is typically lower than for a static logic gate. This is an advantage in reducing signal propagation times but is also a disadvantage in that circuits are much more vulnerable to induced noise on logic signals. Typical design practice is that signals that travel a long distance (and thus are more susceptible to noise and ground shift) must be cleaned up by passing through a latch or static logic gate before connecting to the input of a dynamic gate.
It is possible to speed up the circuit in Figure 12 even further by eliminating the NFET "foot" device N1 and tying the logic network directly to ground. This is shown in Figure 13.
Figure 13. Unfooted Domino Four Input OR Gate
To do this the inputs have to be even more tightly constrained, namely they must all be low during the precharge phase. Otherwise there would be current contention between P1 and the pulldown network that would consume excessive power and interfere with the precharge of node X. Notice that outputs from domino logic gates naturally have the property of being low during precharge. It is not uncommon to see a logic block consisting of a footed domino stage followed by a sequence of unfooted domino stages.
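The input constraint for unfooted stages can be expressed as a simple check. The sketch below is a logic-level illustration with a hypothetical function name: it reports contention whenever the precharge PFET P1 is on while the pull-down network also has a conducting path to ground.

```python
def precharge_contention(eval_high, inputs, footed):
    """Return True if P1 and the pull-down network conduct at the same time."""
    p1_on = not eval_high
    # A footed gate only has a path to ground while N1 is on (EVAL high).
    # An unfooted gate has no N1, so the logic network alone decides.
    foot_conducts = eval_high if footed else True
    pulldown_path = any(inputs) and foot_conducts
    return p1_on and pulldown_path

# Footed gate: inputs may be high during precharge without contention.
footed_case = precharge_contention(False, [0, 1, 0, 0], footed=True)
# Unfooted gate with a high input during precharge: contention.
unfooted_bad = precharge_contention(False, [0, 1, 0, 0], footed=False)
# Unfooted gate driven by domino outputs, which are low in precharge.
unfooted_ok = precharge_contention(False, [0, 0, 0, 0], footed=False)

print(footed_case, unfooted_bad, unfooted_ok)
```

This is why an unfooted stage can safely follow a footed one: the upstream domino outputs are guaranteed low throughout precharge.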
Another form of CMOS dynamic logic is called zipper or NORA logic. It is very fast because it dispenses with the inversion stage in domino logic and instead alternates NFET and PFET based logic networks and associated evaluation nodes that are precharged high and low respectively. Although the use of PFET logic networks isn’t ideal for high-speed operation the reduced number of circuit stages can often more than make up for it. The basic form of zipper logic is shown in Figure 14.
Figure 14. Generalized Form for Zipper Dynamic Logic
During the precharge period EVAL is low and EVAL- is high. The NFET stage evaluation nodes X1 and X2 are precharged high while PFET stage evaluation nodes Y1 and Y2 are precharged low. In the evaluation phase a propagating signal path would cause X1 to go low, Y1 to go high, X2 to go low, and Y2 to go high. Although zipper logic has rarely been used in practice due to difficulties in combining its complex timing and signal quality requirements with other design methodologies, there are some indications that the constant quest for performance has renewed interest in it among MPU designers.
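The alternating node sequence can be illustrated with a toy model. As a deliberate simplification (an assumption, not the general case), each stage's logic network is reduced to a single transistor driven by the previous stage's evaluation node. The function name is hypothetical.

```python
def evaluate_zipper_chain(first_input, stages=("N", "P", "N", "P")):
    """Return the evaluation node values (X1, Y1, X2, Y2) after evaluation."""
    nodes = []
    signal = first_input
    for kind in stages:
        if kind == "N":
            node = True        # NFET stage node is precharged high
            if signal:         # an NFET conducts on a high input
                node = False   # and discharges the node
        else:
            node = False       # PFET stage node is precharged low
            if not signal:     # a PFET conducts on a low input
                node = True    # and charges the node
        nodes.append(node)
        signal = node          # the node drives the next stage directly
    return nodes

propagating = evaluate_zipper_chain(True)
idle = evaluate_zipper_chain(False)
print("propagating:", propagating)
print("idle:       ", idle)
```

With a high first input the nodes end up low, high, low, high, matching the propagating path described above. With a low first input every node simply stays at its precharged value, so no stage needs an inverter between it and the next.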
To compare the speed of various static and dynamic logic implementations, a test circuit was implemented in a variety of configurations. The circuit chosen is a 64 input one detect circuit. Effectively a 64 input OR gate, the function may be used for purposes like zero value detection and comparator functions in a 64 bit microprocessor. Six different implementations, one static and five dynamic, were examined. The critical path propagation time in a 0.13 um bulk CMOS process was simulated with nominal processing, temperature, and voltage. The results are listed in Table 1.
Table 1. Critical Path Timing
3 Stage, cascaded NOR/NAND
3 Stage, cascaded OR4
3 Stage, cascaded OR4
2 Stage, cascaded OR8
2 Stage, cascaded OR8
3 Stage, cascaded OR4
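The stage counts in the configurations above follow directly from the gate fan-in. The sketch below (with a hypothetical helper name) reduces 64 inputs through a tree of identical gates and reports how many gates each stage needs. It says nothing about timing, which depends on the circuit style and process.

```python
def or_tree(total_inputs, gate_fan_in):
    """Return the number of gates in each stage of a reduction tree."""
    gates_per_stage = []
    signals = total_inputs
    while signals > 1:
        gates = -(-signals // gate_fan_in)  # ceiling division
        gates_per_stage.append(gates)
        signals = gates  # each gate output feeds the next stage
    return gates_per_stage

or4_tree = or_tree(64, 4)
or8_tree = or_tree(64, 8)
print("OR4 tree:", or4_tree, "->", len(or4_tree), "stages")
print("OR8 tree:", or8_tree, "->", len(or8_tree), "stages")
```

A cascade of four input gates needs three stages (16, 4, and 1 gates), while eight input gates need only two (8 and 1 gates). The wider gates are only attractive when high fan-in is cheap, which is where dynamic circuits have their advantage.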
The timing figures in Table 1 clearly show that dynamic logic techniques can greatly speed up certain types of high fan-in logic functions that are often on critical paths in most microprocessor implementations. The speedup varies with the logic function and also with the process type. For example, the speedup from dynamic design techniques is less dramatic in a silicon on insulator (SOI) CMOS process. This is because the lower transistor parasitic capacitance in SOI means there is less to be saved by eliminating the PFET pull up network of complementary static logic. Another factor is that SOI has certain characteristics, like pass gate effect and parasitic bipolar current, that can require the keeper PFET in dynamic circuits to be increased in strength (which slows down active edge transitions) or even require dynamic circuits to be replaced with static circuits in some situations.