Charge Recycling Clocking for Adiabatic Style Logic

Luns Tee and Lizhen Zheng

Abstract - A scheme for stepwise charging of multi-phase clocks without the need for additional voltage supplies or tank capacitances is presented. This scheme is implemented to drive an Adiabatic Dynamic Logic (ADL) shift register and energy consumption is compared against a conventional C2MOS shift register, across different supply voltages and frequencies. At equal supply voltages, the ADL register consumed less energy than C2MOS, but with voltage scaling, best case energy consumptions for both were comparable.

Introduction

Conventional logic is subject to CV²/2 of energy dissipation every time a capacitance is charged from a supply or discharged to ground. Various design styles working around this, dubbed as Adiabatic Logic for their ideally zero heat dissipation, have already been presented. Dissipation is eliminated by charging from a slow moving clock that ramps through the same voltage swing as the load being charged. An overview of Adiabatic styles has been provided in an earlier report for this project [1].

Most literature to date on Adiabatic Logic pursues the holy grail of zero energy dissipation. However, approaching this requires slow operation - clock speeds above 1MHz are rarely quoted together with energy savings claims - and additional complexity - a 3-bit adder has been reported as requiring 20 times the devices and 32 times the area of a conventional adder [2]. These factors have prevented more widespread adoption of adiabatic logic styles. However, the concepts behind Adiabatic styles are still valid, and by forfeiting the asymptotically zero power dissipation, it may be possible to bring performance and complexity closer to conventional logic while still maintaining some of the energy advantage. This work explores that possibility.

Stepwise Charging

Adiabatic logic styles essentially hand the energy dissipation problem over from the logic to the clock generators. Two general schemes exist for reduced-dissipation driving of capacitive loads - inductive resonance and stepwise charging. Inductive resonance schemes rely on exchanging energy between the capacitive load and an inductor. The operating frequency depends on the LC product and is not readily tunable once an inductance is chosen. This makes synchronization with other clocks problematic. Moreover, on-chip inductors are rather lossy, while using an off-chip inductor incurs losses as energy is passed back and forth on and off chip.

For these reasons, we chose to work with a variant of the stepwise charging technique. Conventional stepwise charging as presented by Svensson and Koller [3] drives a load by consecutively connecting it to different voltage supplies, stepping from rail to rail through N steps. Each step still dissipates energy in the conventional way, but the voltage swing per step is reduced by a factor of N, so each step dissipates only C(V/N)²/2. Summed over the N steps, the total energy dissipated is CV²/(2N), a factor of N less than for a single rail-to-rail step.
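The factor-of-N saving can be checked with a short derivation. The sketch below assumes N equally spaced supply levels and a load C that settles fully at each step; the symbols E_step, E_charge and E_conventional are introduced here for illustration only.

```latex
\begin{align*}
E_{\text{step}} &= \tfrac{1}{2} C \left(\frac{V}{N}\right)^{2}
  && \text{(loss in the switch for one step of size } V/N\text{)} \\
E_{\text{charge}} &= N \, E_{\text{step}} = \frac{C V^{2}}{2N}
  && \text{(rail-to-rail swing in } N \text{ steps)} \\
\frac{E_{\text{conventional}}}{E_{\text{charge}}} &= \frac{C V^{2}/2}{C V^{2}/(2N)} = N
\end{align*}
```

The per-step loss does not depend on the switch resistance, only on the step size, which is why subdividing the swing helps.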

One problem with stepwise charging is the need for the additional voltage supplies. Since the net charge drawn from the intermediate supplies is zero - charge drawn during one swing is replaced during the returning swing - tank capacitances can be substituted for the supplies. For random charging such as that seen with line drivers, these tank capacitances need to be large relative to the loads being charged. For large loads, tank sizes can become prohibitive.

Closer examination reveals a way around tank capacitances. Since charge is both drawn from and sunk to each supply, when a load is drawing charge from a supply, if another load can be scheduled to provide the same charge, the supply can become superfluous. The switches from each of the loads to the supply can be replaced with a single switch between loads, offering half the resistance between loads for faster charging.

The simplest case of this would be to use differential signalling with stepwise charging - at a step where both sides of a signal would be charged to half the supply voltage, for a pair of identical loads, shunting the loads together gives the same end result, bypassing the need for a tank.

Steps towards this midpoint charging, and subsequent steps away cannot exploit differential symmetry if all swings occur at the same time. However, if other load swings can be staggered with the original signal pair, the same technique can be applied to the other voltages where the new signals cross the original pair.

Figure 1

The problem of clock driving lends itself nicely to this technique as signals are regular and it is easy to see where charges go. Fig. 1 shows a four phase clock signal. Such a clock signal can be used for an Adiabatic Logic style as will be discussed in the following section. Assuming equal capacitances at each phase, any time the voltage for one phase passes that of another phase, the charging towards that crossing can be done with a single transistor shunting the two phases together.

Figure 2

Fig. 2 shows a driver implementing this technique. Devices around the periphery charge to ground and Vdd. Transistors between opposing phases bring them to Vdd/2. Transistors between adjacent phases are for transitions to Vdd/4 and 3Vdd/4. Although these device pairs appear like transmission gates, the NMOS devices have minimal Vgs-Vt at 3Vdd/4 compared to the PMOS, and vice versa at Vdd/4, so only one transistor of each pair is turned on during any phase.
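The switching schedule implied by this driver can be sketched in a few lines of Python. This is an idealized model and not the simulated circuit: it assumes four equal phase capacitances, eight equal steps of Vdd/4 per period, a quarter-period offset between phases, and complete settling at every step. The names TRI, target and supply_energy_per_period are hypothetical.

```python
from fractions import Fraction

# One period of the staircase clock, in units of Vdd (illustrative assumption:
# eight equal steps of Vdd/4, with each phase delayed by two steps).
TRI = [Fraction(k, 4) for k in (0, 1, 2, 3, 4, 3, 2, 1)]
PHASES = 4


def target(phase, step):
    """Ideal level of a phase at a given step, as a fraction of Vdd."""
    return TRI[(step - 2 * phase) % len(TRI)]


def supply_energy_per_period(c=1, vdd=1):
    """Energy drawn from Vdd over one clock period with charge recycling."""
    v = [target(p, 0) * vdd for p in range(PHASES)]
    energy = 0
    for step in range(1, len(TRI) + 1):
        goal = [target(p, step) * vdd for p in range(PHASES)]
        done = set()
        for a in range(PHASES):
            if a in done:
                continue
            if goal[a] == vdd:
                # Peripheral device to Vdd: the only step that draws supply charge.
                energy += c * (vdd - v[a]) * vdd
                v[a] = vdd
                done.add(a)
            elif goal[a] == 0:
                # Peripheral device to ground: charge is dumped, none is drawn.
                v[a] = 0
                done.add(a)
            else:
                # Shunt with the phase approaching the same level from the other side.
                for b in range(a + 1, PHASES):
                    if (b not in done and goal[b] == goal[a]
                            and v[a] + v[b] == 2 * goal[a]):
                        v[a] = v[b] = goal[a]
                        done.update((a, b))
                        break
                else:
                    raise ValueError("no shunt partner found")
    return energy


recycled = supply_energy_per_period()
conventional = PHASES * 1 * 1 ** 2  # each phase charged rail to rail: C*Vdd^2
print(recycled, conventional / recycled)
```

In this model every intermediate level is reached by shunting two phases together, and supply charge is drawn only on the final step from 3Vdd/4 to Vdd, so the energy per period falls by a factor of four relative to driving each phase rail to rail.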

The relative size of each group of transistors is also straightforward to determine. Starting with the NMOS devices for Vdd/4 as a baseline, PMOS devices for 3Vdd/4 charge at the same time and need to be made larger to account for their reduced carrier mobility. The NMOS devices at Vdd/2 similarly need to be larger as they have less Vgs-Vt. Perhaps contrary to intuition, the PMOS devices to Vdd should also be somewhat larger still since despite the larger average Vgs, they have a smaller initial Vds.

This driver still requires controlling logic to drive the transistor gates. Some margin is necessary between phases of the control signals to avoid overlap, which would cause leakage of the recycled charge. We assume these signals to be available, as on-chip PLL structures are becoming commonplace, and the PLL power consumption is assumed negligible relative to that of the clock load.

Lastly, the assumption that all capacitances are equal needs to be addressed. If the capacitances for each phase are known, dummy capacitances can be added to balance out loads. However, the capacitance presented by logic may be data dependent. For large amounts of logic, the law of large numbers suggests the variation would not be severe, but differential logic styles are available that present a constant capacitance. For this work, we assume that data statistics would keep data dependencies sufficiently balanced.
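The sensitivity to unequal loads can be illustrated with ideal charge sharing between two capacitors. The following is a minimal sketch, assuming ideal capacitors and complete settling; the function name share_charge and the mismatch values are hypothetical and are not taken from the simulations.

```python
def share_charge(c1, v1, c2, v2):
    """Shunt two capacitors; return (final voltage, energy lost in the switch)."""
    v_final = (c1 * v1 + c2 * v2) / (c1 + c2)  # charge conservation
    # Loss depends only on the capacitances and the initial voltage difference.
    lost = 0.5 * (c1 * c2 / (c1 + c2)) * (v1 - v2) ** 2
    return v_final, lost


vdd = 3.0
for mismatch in (0.0, 0.05, 0.10):  # hypothetical fractional excess on one load
    v_final, lost = share_charge(1.0 + mismatch, vdd, 1.0, 0.0)
    print(f"mismatch {mismatch:.0%}: settles at {v_final:.3f} V, "
          f"loses {lost:.3f} (normalized C*V^2 units)")
```

With equal loads the pair settles exactly at Vdd/2. A heavier load on one side pulls the shared voltage toward that side's starting level, which is the error that dummy capacitances would be added to correct.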

Adiabatic Dynamic Logic

Of the different Adiabatic Logic styles, Adiabatic Dynamic Logic (ADL) as presented by Dickinson and Denker [4] was chosen for this work for its simplicity and relative ease of logic synthesis. A basic ADL cascade is shown in Fig. 3, and consists of alternating N and P inverter blocks. Each gate restores its output to the precharge voltage through the diode on one clock transition, and the transistor, depending on its input, may charge the output away from the precharge state at the other clock transition.

Figure 3

Although the style is normally presented as running on trapezoidal clocks, this is not strictly necessary, and triangular clocks can be substituted. This can be seen through several observations. First, as the logic has no references to ground or Vdd, the clocks can swing between some other pair of voltages and still operate correctly. Second, the clock plateaus for the hold phase are not strictly necessary: the clocks can continue with their evaluate transition and come back down during the hold time without ill effect. If the driving transistor is off, the output still stays at its precharge voltage, while if the transistor is on, the extra voltage swing of the output only serves to turn on the next stage's input devices more strongly. Lastly, the precharge hold periods can be extended similarly, as they exist so that a gate's evaluate phase can start after the end of the previous stage's evaluate phase, where the output has become unambiguous. Beginning partway through the prior stage's evaluation is still allowable as long as the evaluation has already proceeded far enough that the output is unambiguous and will not later change from the precharge voltage to an evaluate voltage.

This logic style relies on a diode for restoring the output to its precharge state, and this diode is the main source of energy dissipation. Charge passes through the diode with its turn-on voltage, with total dissipation being C·Vdiode·Vdd. The turn-on voltage is important also in defining the final precharge voltage - if the drop is too large, the precharge condition will be insufficient to turn off the next stage transistor. For this reason, we use low-threshold diode-connected transistors for the diode, using PMOS based diodes for the N blocks and NMOS based diodes for the P blocks to avoid body effect problems.

Simulations

A set of shift registers, each 16 inverters deep, totalling one million ADL inverters, was simulated together with the stepwise charging clock driver presented. Size ratios for the clock driver are those given in Fig. 2, and simple geometric tapered inverter chains drive the transistor gates. Each ADL inverter consists of a minimum size NMOS device and a PMOS device of double the gate width. Supply voltages of 3V and 2V were used, with the sizing of clock driver and inverter chain being adjusted by hand for minimum energy allowing proper operation at each of several frequencies at each voltage. Device models were from a Hewlett Packard 0.6µm process.

Figure 4

Figure 4 shows sample waveforms for the chain running correctly at 33MHz from a 3V supply. The zero in the midst of ones fed to the input is seen to propagate through subsequent stages with an inversion each time. The clocks also behave essentially as expected, though some anomalous behavior is seen around the Vdd/2 crossings. This is the result of capacitive coupling both from the driver input and from adjacent clock phases through the ADL gate.

Higher supply voltages were also tested, but proved to be unworkable for the simple shift register. At 5V, capacitive feedthrough from clock input to gate output would affect a floating precharged output enough to falsely trigger the next downstream gate. More capacitance at the output node, as might be seen with more complicated logic with larger fanout, would reduce this effect, perhaps enough to be usable, but to keep results consistent for comparison, this fact was not used for this study.

A lower limit on supply voltage also exists - the supply voltage cannot go below twice the transistor threshold voltage as the Vdd/2 shunt devices would no longer be able to turn on.
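This limit follows from the gate drive that remains as a shunted pair converges on Vdd/2. A first-order sketch, assuming the NMOS shunt gate is driven to Vdd and body effect is ignored, with V_t denoting the threshold voltage (the PMOS case is symmetric):

```latex
\begin{align*}
V_{GS} &= V_{dd} - \frac{V_{dd}}{2} = \frac{V_{dd}}{2} \\
V_{GS} > V_{t} &\;\Longrightarrow\; V_{dd} > 2 V_{t}
\end{align*}
```

Body effect raises the effective threshold of the shunt devices, so the practical limit would be somewhat higher than this estimate.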

Figure 5

Figure 5 shows two elements out of C2MOS shift registers that were also simulated. Again, a total of one million minimum size inverting elements with clocks driven by geometrically tapered inverter chains were simulated. Supply voltages of 3V, 1.5V and 1.1V were used, and clock driving chains were sized for several frequencies at each voltage.

Figure 6

Figure 6 shows total energy consumption per inverter for the ADL and C2MOS shift registers, inclusive of driver energy with an activity factor of 1 in both cases. For each supply voltage, energy consumption at low frequencies is that of the logic alone, while as frequencies increase, the scaling factor for the inverter chains has to decrease. When scaling factors become small enough, buffer energy quickly becomes significant.

For the same supply voltage of 3V, at low frequencies the ADL chain is seen to operate with less energy than the C2MOS chain. The disparity is about a factor of five, and comes from several factors: the simplicity of the ADL gate (two as opposed to four transistors per inverter), charge recycling (ideally a factor of four from there being four phases), incomplete charging for ADL (the output never swings completely from rail to rail but stops at least a diode drop short), and incomplete clock swing with ADL from running too fast for the drivers to reach the supply rails. Non-asymptotic operation of the ADL gates offsets this - the assumption that the precharge transistors drop only their threshold voltage does not hold.

At the same supply voltage, however, C2MOS can operate up to a higher clock frequency since it does not have the overhead of the recycling driver's multiple control phases: pulse widths are more than four times longer, thus longer rise/fall times are permissible. The extra performance margin can be exchanged for lower supply voltages. At the lower voltages required to match ADL's low-frequency energy, the C2MOS is seen to keep this lower energy to a somewhat higher frequency than ADL does.
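The room that voltage scaling offers can be estimated to first order from the CV² dependence of conventional switching energy. The sketch below is a rough estimate only: it assumes energy scales purely as CV² and ignores short-circuit current, leakage and changes in driver sizing, so it is not a substitute for the simulated results. The function name cv2_saving is hypothetical.

```python
def cv2_saving(v_ref, v_new):
    """First-order energy ratio E(v_ref) / E(v_new), assuming energy ~ C*V^2."""
    return (v_ref / v_new) ** 2


for v in (3.0, 1.5, 1.1):  # C2MOS supply voltages used in the simulations
    print(f"{v:.1f} V: {cv2_saving(3.0, v):.2f}x less energy than at 3 V")
```

Under this assumption, scaling from 3V to 1.5V saves a factor of four and scaling to 1.1V a factor of about 7.4, which brackets the roughly fivefold advantage observed for ADL at 3V.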

Conclusions

For a given supply voltage, if speed requirements are lax enough to permit the use of charge recycling stepwise clocks with ADL, energy can be saved over conventional logic styles. However, with such a margin for performance, comparable if not better energy savings can be achieved via voltage scaling of a conventional design, saving the need for a change of design methodology.

References

[1] L. Tee and L. Zheng. Charge Recovery and Adiabatic Switching Techniques in Digital Logic. EE241 Midterm report, Mar. 1997.

[2] W. C. Athas, J. Svensson, J. G. Koller, N. Tzartzanis and E. Y. Chou. Low-power digital systems based on adiabatic-switching principles. In IEEE Trans. on VLSI Systems, Vol. 2, No. 4, pp. 398-406, Dec. 1994.

[3] L. Svensson and J. G. Koller. Driving a capacitive load without dissipating fCV². In 1994 IEEE Symposium on Low Power Electronics, Digest of Technical Papers, pp. 100-101, 1994.

[4] A. G. Dickinson and J. S. Denker. Adiabatic Dynamic Logic. In Proc. IEEE 1994 CICC, pp. 282-285, 1994.