
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

On Gate Level Power Optimization Using Dual-Supply Voltages Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE

Abstract—In this paper, we present an approach for applying two supply voltages to optimize power in CMOS digital circuits under timing constraints. Given a technology-mapped network, we first analyze the power/delay model and the timing slack distribution in the network. A new strategy is then developed for timing-constrained optimization by making full use of slacks. Based on this strategy, the power reduction problem is translated into the polynomial-time-solvable maximal-weighted-independent-set problem on transitive graphs. Since the particular supply voltages used in a circuit lead to very different power consumption, we propose a fast heuristic approach to predict the optimum dual-supply voltages by looking at the lower bound of power consumption in the given circuit. To deal with the possible power penalty due to the level converters at the interface of different supply voltages, we use a "constrained F–M" algorithm to minimize the number of level converters. We have implemented our approach in the SIS environment. Experiments show that the resulting lower bound of power is tight for most circuits and that the predicted "optimum" supply voltages are exactly, or very close to, the best actual choice. A total power saving of up to 26% (about 20% on average) is achieved without degrading the circuit performance, compared to an average power improvement of about 7% by a gate sizing technique based on a standard cell library. Our technique also provides a power-delay tradeoff by specifying different timing constraints for power optimization. Index Terms—Dual-supply voltages, gate level, maximal-weighted-independent-set, power optimization, timing constraints.

I. INTRODUCTION

WITH the increasing demand for low-power applications, power optimization has been a major goal in designing digital circuits. Since the dynamic power dissipation of CMOS circuits is proportional to the square of the supply voltage, reducing the supply voltage promises to be one of the most effective ways to achieve low power. Unfortunately, a reduced supply voltage leads to a speed loss in the logic modules. This situation deteriorates especially as the supply voltage approaches the threshold voltage of the device [1]. To compensate for the reduced speed, one can use parallel and pipelined architectures, at the expense of hardware overhead [8]. Alternatively, the speed degradation problem can be overcome by using a low threshold voltage, which, however, results in a rapid increase in the subthreshold current [9], [20].

Manuscript received August 4, 1999; revised February 13, 2001. This work was supported in part by the National Science Foundation (NSF) under Grant MIP-9527389, by DARPA, and by a Grant from Motorola. C. Chen is with the Department of Electrical and Computer Engineering, University of Windsor, ON, Canada. A. Srivastava and M. Sarrafzadeh are with the Computer Science Department, University of California at Los Angeles, Los Angeles, CA 90095-1596 USA. Publisher Item Identifier S 1063-8210(01)03643-5.

In either case, an implicit assumption is that the reduced supply voltage is uniformly applied to all logic modules. Instead, the entire circuit performance does not necessarily get worse if the reduced supply voltage is applied only to the logic modules on noncritical paths. The reason is that these modules typically have high timing slack, which may allow an increase of their delay without violating the timing constraints. Experience shows that, for most circuits, logic modules (or gates, at the logic level) on critical paths account for only a small portion of all modules. In Fig. 1, for example, we plot the average distribution of gates with different slack for 16 MCNC91 benchmarks after technology mapping (note that the slack value has been normalized to the longest path delay). It can be seen from this figure that the number of gates on critical paths (i.e., with zero or close-to-zero slack) accounts for only about 14% of all gates, while more than 60% of gates have slack larger than 0.2. This potentially provides much room for power reduction using a reduced supply voltage. In general, a power reduction technique with reduced (or variable) voltage is called voltage scaling. Depending on how many supply voltages of different value are available, voltage scaling may be classified as a multiple-voltage approach or a dual-voltage approach. Prior work on the multiple-voltage approach includes [4], [12], [13], and [25]. Most of it focuses on the scheduling problem for low power at the behavioral level. For example, [12] proposes an optimal scheduling algorithm to reduce the system's power while meeting the timing constraint. In [13], a dynamic programming technique is used to minimize the average energy consumption. However, there are some practical hurdles that need to be overcome before the use of multiple supply voltages. Among them are constrained physical design problems and the resulting area/delay/power penalty due to level converters (LCs), which are required at the interface of different voltages. In contrast, these issues can be eased if only two supply voltages are used [2], [3], [5], [15], [26]. In [3], for example, a layout scheme using two voltages is discussed together with its application to a media processor chip design. In [15], dual-supply voltages were used successfully to design an MPEG4 codec core chip at Toshiba Corporation. These show the feasibility of implementing dual-voltage circuits. In [2], Usami and Horowitz first proposed a dual-voltage approach based on so-called cluster-voltage scaling (CVS). The idea is to use a depth-first search from the primary outputs to find gates which may operate at the low supply voltage without violating the timing constraints. As a result, two clusters are created.



One is the cluster consisting of gates with low-supply voltage (V_DDL) on the side of the primary outputs, and the other is the cluster of gates with high-supply voltage (V_DDH) on the side of the primary inputs. Thus, no LCs are required. In this cluster structure, however, any gate may be selected to operate at V_DDL only after all its transitive fanouts have been selected to do so. Thus, some part of the circuit with high slack may be left to operate unnecessarily at V_DDH, limiting the potential of further power reduction. To avoid this problem, an extended-CVS structure was first proposed in [5] using the so-called level sort technique and was improved recently in [18]. In this structure, V_DDL gates may be scattered among the V_DDH gates. However, because of the lack of a global view, the method may not be effective, especially when the given timing constraints are tight. In addition, these techniques did not consider the switching activity information within the circuit. Clearly, as long as the switching activity of all nodes in the circuit is available, more power saving can be reached by allowing logic gates with high switching activity to operate at V_DDL. In this sense, the CVS-based structure is not preferable for power reduction, since the nodes near the primary outputs typically have low switching activity (on average) [23]. More recently, a linear programming approach has been presented in [19] to address the dual-voltage problem. However, it is based on so-called delay balanced graphs, whose number increases exponentially with the problem size. No matter what specific algorithm is used, different values of the two supply voltages can lead to totally different power consumption. This gives rise to another important problem with the dual-voltage approach: how to select the values of the two supply voltages for maximum power reduction? Existing techniques do not deal with this problem. Since the optimum supply voltages depend on the specific circuit, it is straightforward to try a variety of possible voltages for each given circuit and select the best of them. However, this exhaustive strategy is computationally expensive. A fast prediction algorithm is thus desirable to provide a good starting point toward power optimization with dual voltages.

In this paper, we address the problem of reducing the total power consumption by using two supply voltages (V_DDH and V_DDL) at the gate level. Given a technology-mapped network, the power consumption can be modeled more accurately. We analyze the timing slack distribution and explore how to take full advantage of slacks in a given circuit. In order to get the maximum power reduction while maintaining the timing performance of the circuit, we relate the power optimization to the maximal-weighted-independent-set (MWIS) problem and propose a fast heuristic algorithm to predict the optimum supply voltages [21]. Then, based on the predicted supply voltages, we develop an effective algorithm to allow as many gates as possible to work at V_DDL. Considering the possible power penalty of the LCs at the interface of V_DDL and V_DDH, we target minimizing the number of LCs by using what we call the "constrained Fiduccia–Mattheyses" [6] algorithm. By specifying different timing constraints, the proposed technique is also able to provide a power-delay tradeoff with two supply voltages [22]. The design flow of power optimization with dual voltages is shown in Fig. 2 (other layout structures can be found in [16]).

Fig. 1. The average distribution of gates with different slack for 16 benchmark circuits.

Fig. 2. Design flow with dual-voltage technique.

II. BACKGROUND

A technology-mapped network can be represented as a directed acyclic graph G = (V, E). A node v ∈ V corresponds to a gate in the network (the terms "gate" and "node" will be used interchangeably throughout the paper). The existence of a directed edge (u, v) ∈ E implies that node u is an immediate fanin of node v (or, node v is an immediate fanout of node u). The set of all immediate fanins (fanouts) of v is denoted by FI(v) (FO(v)). If there is a directed path from node u to node v in G, u (v) is said to be a transitive fanin (fanout) of v (u). Each node v is associated with a delay d(v) = d_0(v) + k(v) · C_L(v), where d_0(v) is the intrinsic delay, k(v) is a constant dependent on the driving ability of the node, and C_L(v) is the loading capacitance at the output of node v. The arrival time a(v) and required time r(v) of node v are recursively given by

a(v) = d(v) + max_{u ∈ FI(v)} a(u),   r(v) = min_{u ∈ FO(v)} [r(u) − d(u)]      (1)

respectively, and the slack time of node v is defined to be s(v) = r(v) − a(v).

With good accuracy [1], the node delay at supply voltage V_DD is proportional to V_DD / (V_DD − V_t)^α, where V_t is the threshold voltage and α is a constant (note that the slew effect is not considered here). The delay of node v at any supply voltage V can be expressed as

d_V(v) = d(v) · (V / V_DDH) · [(V_DDH − V_t) / (V − V_t)]^α      (2)

where d(v) is the delay of node v at V_DDH. The delay of node v at V_DDL is expressed as d_VDDL(v). The delay increase due to the voltage reduction from V_DDH to V_DDL is thus Δd(v) = d_VDDL(v) − d(v). We call Δd(v) the delay gap of node v. In our two-voltage approach, each node works at either V_DDH or V_DDL. If node v works at V_DDL, d(v) in (1) is replaced with d_VDDL(v) for the calculation of arrival time, required time, and slack time.

In CMOS circuits, the average dynamic power consumption of gate v is given by

P(v) = S(v) · C_L(v) · V_DD^2 · f_clk      (3)

where f_clk is the clock frequency, C_L(v) is again the loading capacitance, and S(v) is the switching activity at the output of gate v. In general, the switching activity inside a circuit not only depends on the topological structure and input patterns of the circuit, but may also vary with gate delay, which introduces glitching transitions. Here, however, S(v) does not account for glitch power. The reason is twofold. First, measuring the glitches requires an event-driven simulation and an explicit knowledge of the input waveforms. The latter hypothesis is unrealistic, and updating the amount of glitching with simulation iteratively is computationally too expensive. Second, as we will see in the next section, the proposed technique tends to reach path balancing by reducing the slack without changing the circuit topology. This helps eliminate glitching to some extent. In any case, glitch power can be taken into account during validation by power simulation.

Reducing the supply voltage results in a power saving for the gate. For gate v, changing its supply voltage from V_DDH to any supply voltage V generates the power reduction

ΔP(v) = S(v) · C_L(v) · f_clk · (V_DDH^2 − V^2)      (4)

When V = V_DDL, the power reduction is ΔP(v) = S(v) · C_L(v) · f_clk · (V_DDH^2 − V_DDL^2). We call ΔP(v) the power gain of gate v. For the purpose of further discussion in the following, we give some definitions first.

Definition 2.1: A circuit (or graph G) is safe if s(v) ≥ 0 for each node v ∈ V. Otherwise, the circuit is unsafe, meaning that it violates the timing constraints.

Definition 2.2: A node v is called a feasible node if s(v) ≥ Δd(v), where Δd(v) is the delay gap of node v. Otherwise, the node is called a nonfeasible node.
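To make the timing bookkeeping above concrete, the short sketch below computes arrival times, required times, and slacks per (1), derives each gate's delay gap from the voltage-scaled delay of (2), and applies the power gain of (4) and the feasibility test of Definition 2.2. It is an editorial illustration only: the netlist, delays, capacitances, switching activities, timing constraint, and the constants (V_DDH = 5 V, V_DDL = 3.3 V, V_t = 0.6 V, α = 2) are all hypothetical, and the equations are used in the reconstructed form given above.

# Illustrative sketch (all numbers hypothetical): static timing per (1),
# voltage-scaled delay per (2), and the power-gain/feasibility test of
# (4) and Definition 2.2, as reconstructed above.
from collections import defaultdict

VDDH, VDDL, VT, ALPHA = 5.0, 3.3, 0.6, 2.0    # assumed supply/threshold values
FCLK = 20e6                                   # assumed clock frequency (Hz)

# Hypothetical mapped netlist: edge (u, v) means u is an immediate fanin of v.
edges  = [("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")]
delay  = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 1.0}   # d(v) at VDDH (ns)
cap    = {"a": 0.1, "b": 0.1, "c": 0.2, "d": 0.1}   # C_L(v) (pF)
switch = {"a": 0.5, "b": 0.5, "c": 0.3, "d": 0.4}   # S(v)
nodes  = ["a", "b", "c", "d"]                       # assumed topological order

fanin, fanout = defaultdict(list), defaultdict(list)
for u, v in edges:
    fanin[v].append(u)
    fanout[u].append(v)

def scaled_delay(dv, volt):
    """Delay of a gate at supply voltage volt, per (2)."""
    return dv * (volt / VDDH) * ((VDDH - VT) / (volt - VT)) ** ALPHA

def timing(d, tconstraint):
    """Arrival, required, and slack times per (1)."""
    a, r = {}, {}
    for v in nodes:                                   # forward pass
        a[v] = d[v] + max((a[u] for u in fanin[v]), default=0.0)
    for v in reversed(nodes):                         # backward pass
        r[v] = min((r[u] - d[u] for u in fanout[v]), default=tconstraint)
    s = {v: r[v] - a[v] for v in nodes}
    return a, r, s

T = 5.0                                               # assumed timing constraint (ns)
_, _, slack = timing(delay, T)
gap  = {v: scaled_delay(delay[v], VDDL) - delay[v] for v in nodes}          # delay gap
gain = {v: switch[v] * cap[v] * FCLK * (VDDH**2 - VDDL**2) for v in nodes}  # per (4)

for v in nodes:
    print(f"{v}: slack={slack[v]:.2f} gap={gap[v]:.2f} "
          f"feasible={slack[v] >= gap[v]} gain={gain[v]:.3g}")

Running the sketch shows the high-delay node becoming nonfeasible once its delay gap exceeds its slack, which is exactly the situation that Example 2.1 below turns on.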

It is assumed that initially each node works at V_DDH and that its slack s(v) ≥ 0 (i.e., the circuit is safe). Under the given timing constraints, any node may be selected to work at V_DDL for power reduction only if it is a feasible node. In general, however, not all feasible nodes can work at V_DDL. The reason is that, once a node is selected to work at V_DDL, the slack of other nodes (such as the transitive fanins/fanouts of that node) may be reduced. As a result, some feasible nodes may no longer be feasible (see Example 2.1).

Example 2.1: Fig. 3 shows an example with six nodes, where the slack and delay gap of each node are given in the figure. Initially, all nodes are feasible. If one particular node is selected to work at V_DDL, the new slack values make several of the other nodes nonfeasible. Instead, if a different node is selected to work at V_DDL first, the updated slacks leave the remaining nodes still feasible, and these nodes can be further considered for the possibility of working at V_DDL. In this sense, the latter node is a better choice than the former. This example shows that the order of selecting nodes should be considered carefully so that their slacks can be fully exploited for more power saving. In the next section, we will discuss this problem in more detail. For any given circuit, our goal is to select a subset of gates working at V_DDL such that the timing constraints are satisfied and the total power gain is maximized.

III. TIMING SLACK ANALYSIS

In this section, we explore how to take full advantage of slacks in a circuit by examining the interaction between the node delay and slack changes. First, we provide some definitions.

Fig. 3. Part of a safe circuit with six feasible nodes.

Definition 3.1: Given a graph G = (V, E): 1) an edge (u, v) ∈ E is called sensitive if either a(v) = a(u) + d(v) or r(u) = r(v) − d(v). Intuitively, a sensitive edge implies that the slacks of u and v are sensitive to each other's delay change; 2) a directed path is called sensitive if a) the path consists of only sensitive edges and b) the slack of all nodes on the path is monotonically distributed1 in the direction of the path; 3) two nodes u and v are called slack sensitive if there exists a sensitive path from u to v or from v to u in G. Otherwise, they are called slack insensitive.

Definition 3.2: The sensitive transitive closure graph G_s of G is a directed graph such that there is an edge (u, v) in G_s if and only if there is a directed sensitive path from u to v in G.

We clarify the above definitions using the following example.

Example 3.1: Fig. 4 shows an example, where the delay of node v is 2, while the delay of the other nodes is assumed to be 1. The arrival, required, and slack times for each node are represented by a triplet {a, r, s} in the figure. From Definition 3.1, all edges except one are sensitive, and several sensitive paths exist; some pairs of nodes are therefore pair-wise slack sensitive, while others are slack insensitive. The sensitive transitive closure graph of Fig. 4 is shown in Fig. 5. Note that one edge of Fig. 4 has no counterpart in Fig. 5, because the corresponding path is not sensitive; instead, Fig. 5 contains an additional directed edge, because there is a directed sensitive path between the corresponding nodes in Fig. 4.

Suppose the delay of node u increases by a small positive amount δ; then its slack is reduced by exactly δ. Obviously, the slack of any node that is neither a transitive fanin nor a transitive fanout of u does not change. For a transitive fanin or fanout w, the slack may or may not change, depending on s(w) and the slack sensitivity of u and w. Consider the three cases shown in Lemma 3.1.

Lemma 3.1: Let w be a transitive fanin/fanout node of any node u ∈ V, and let δ be a small positive constant by which the delay of u increases. We have the following.

Case a) If s(w) ≥ s(u), then s(w) will decrease by δ as long as u and w are slack sensitive.
Case b) If s(u) − δ < s(w) < s(u), then s(w) will decrease by s(w) − s(u) + δ as long as u and w are slack sensitive.
Case c) If either s(w) ≤ s(u) − δ, or u and w are slack insensitive, then s(w) remains unchanged.

Proof: For simplicity, we give the proof of Case a) only. Without loss of generality, assume that w is a transitive fanout of u in Case a). Since u and w are slack sensitive and the slack is monotonically distributed along the sensitive path from u to w, Case a) applies to every consecutive pair of nodes on that path. Thus, we only need to prove that Case a) is true when w is an immediate fanout of u.

1 Assuming a directed path consists of v_1, v_2, ..., v_m, the monotonic slack distribution on the path implies that if s(v_1) ≤ s(v_2), then s(v_i) ≤ s(v_{i+1}); if s(v_1) ≥ s(v_2), then s(v_i) ≥ s(v_{i+1}), where i = 1, ..., m − 1.

Fig. 4. Example graph with the arrival, required, and slack times shown as a triplet {a, r, s}, assuming d(v) = 2, d(u_i) = 1, i = 1, 2, 3, and d(w_j) = 1, j = 1, 2.

Fig. 5. The sensitive transitive closure graph of Fig. 4.

In this case, a(w) = a(u) + d(w). [Otherwise, we would have both a(w) > a(u) + d(w) and r(u) = r(w) − d(w); this implies s(u) > s(w), which is a contradiction.] Therefore, increasing the delay of u by δ results in the increase of both a(u) and a(w) by δ, while both r(u) and r(w) remain unchanged. This means that both s(u) and s(w) will be reduced by δ. Similarly, one can prove that Case a) still holds true if w is a transitive fanin of u.

Let us verify the above lemma using the following example.

Example 3.2: Consider Fig. 4, and assume that the delay of node v increases by δ. The slacks of the nodes that are slack sensitive to v and covered by Case a) are reduced to 0.2, and from Case b) the slack of another node also decreases to 0.2. The slack of the node that is slack insensitive to v remains unchanged, as can be expected from Case c). In contrast, with a smaller δ, increasing d(v) by δ does not affect the slack of a node whose slack is at least δ below s(v), even though the two nodes are slack sensitive.

To make full use of timing slacks, it is desirable to keep to a minimum the number of nodes whose slacks are reduced by the increased delay of a node. In this sense, Case c) is the best case. This motivates us to first select the set MS of nodes with maximum slack (denoted by s_max) in G and then construct an induced subgraph G_I of G_s (see Definition 3.2) such that there is an edge (u, w) in G_I if (u, w) is in G_s and both u and w are in MS. We use N(u) to denote the set of neighbors of node u in G_I. For the example of Fig. 4, the induced graph G_I of Fig. 5 is shown in Fig. 6. There are two important properties of G_I, as given in Lemma 3.2.

Lemma 3.2:
Property 1) If the delay of any node u ∈ MS increases by δ, then the slack of any node w ∈ N(u) decreases by δ.

Fig. 6. The induced graph G_I of Fig. 4.

Property 2) The delay increase of any node u ∈ MS by δ does not affect the slack of any node w ∉ N(u) as long as δ is small enough.

Proof: 1) Since w ∈ N(u), it follows that s(w) = s(u) = s_max and that nodes u and w are slack sensitive. From Case a), s(w) is reduced by δ, as s(u) is. 2) Since w ∉ N(u), we have either s(w) < s_max or that u and w are slack insensitive. If s(w) < s_max, then s(w) ≤ s_2, where s_2 is the second largest slack; letting δ ≤ s_max − s_2 gives s(w) ≤ s(u) − δ. If s(w) = s_max, then nodes u and w are slack insensitive. In either case, s(w) remains unchanged according to Case c). One can demonstrate Lemma 3.2 using Figs. 4–6.

It can be seen that G_I is independent of δ, and Lemma 3.2 holds true if δ ≤ s_max − s_2, where s_2 is the second largest slack of all nodes in G. Therefore, a reasonable value for δ is s_max − s_2. From Lemma 3.1, in a given graph G, increasing the delay of any node may reduce the slacks of its transitive fanin or fanout nodes, depending on their slack sensitivity. As will be seen in Section IV, the power consumption of a node can be represented as a monotonic function of its delay. Therefore, it is generally desirable to maximize the sum of the possible increased delays of all nodes in G without violating the timing constraints. To this end, one can select nodes in the maximum independent node set2 (MINS) of G_I for the delay increase by δ. Each time the delay of all nodes in the MINS increases by δ, we update the slacks, s_max, and G_I, and find the MINS again. This process continues until s_max = 0. Assuming that the circuit contains K groups of nodes with different slacks, the whole MINS-based process can be completed in K passes. Also, choosing δ = s_max − s_2 guarantees that G stays safe at each pass, because under Cases a)–c) of Lemma 3.1 no slack drops below zero. Example 3.3 illustrates the process.

2 A maximum independent node set of a graph is a set of independent nodes (i.e., no two nodes are connected by an edge) such that the number of nodes in this set is maximum.

Example 3.3: Consider Fig. 4 again. Initially, the MINS of G_I (see Fig. 6) is found among the nodes with maximum slack, and the delay of its nodes is increased by δ = s_max − s_2. We then update the slacks of the affected nodes, and G_I is also updated, as shown in Fig. 7. A new MINS is found, and the delay of its nodes is increased by the new δ. The process terminates when the maximum slack reaches zero. As a result, the sum of the increased delays of all nodes is the maximum achievable under the given timing constraints.

Fig. 7. The updated G_I.

The above process of increasing delay is carried out over many iterations, depending on the number of nodes with different slacks in the graph. From the optimization point of view, this constitutes a first-order gradient approach, where the sum of incremental delays is the objective function to be optimized, the MINS corresponds to the fastest direction in which the objective increases, and δ is the step size used for searching for the best solution. To speed up the process, one may wish to solve the problem in one shot by increasing the delays of all nodes in the graph. This can, however, prevent us from analyzing and formulating the slack sensitivity between nodes, as described previously in this section. Since slack sensitivity explains how the nodes interact in terms of their delay and slack changes, it is generally not realistic to find an exact solution in one shot, where the slack sensitivity information is unavailable.
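The pass-based process just described can be rendered schematically as follows, reusing the timing helper from the sketch in Section II. This is only an illustration under editorial assumptions: the sensitivity test is applied to immediate edges only (the paper works with sensitive paths and the closure graph G_s), the reconstructed edge condition of Definition 3.1 is assumed, and the step size follows the δ = s_max − s_2 rule.

# Schematic sketch of one pass of the slack-exploitation process of Section III.
# Reuses nodes, edges, and timing() from the earlier sketch. The edge-level
# sensitivity test below is a simplification of Definitions 3.1/3.2.
EPS = 1e-9

def sensitive_edge(u, v, a, r, d):
    # (u, v) is treated as sensitive if u sets v's arrival time or v sets u's
    # required time (reconstructed reading of Definition 3.1).
    return abs(a[v] - (a[u] + d[v])) < EPS or abs(r[u] - (r[v] - d[v])) < EPS

def one_pass(d, T):
    a, r, s = timing(d, T)
    s_max = max(s.values())
    if s_max <= EPS:
        return 0.0                                    # nothing left to exploit
    MS = [v for v in nodes if abs(s[v] - s_max) < EPS]
    # Edges of the induced subgraph G_I between maximum-slack nodes.
    GI = [(u, v) for (u, v) in edges
          if u in MS and v in MS and sensitive_edge(u, v, a, r, d)]
    chosen, blocked = [], set()
    for v in MS:                                      # greedy MINS stand-in
        if v in blocked:
            continue
        chosen.append(v)
        for (x, y) in GI:
            if x == v:
                blocked.add(y)
            elif y == v:
                blocked.add(x)
    lower = [x for x in s.values() if x < s_max - EPS]
    delta = s_max - (max(lower) if lower else 0.0)    # step size s_max - s_2
    for v in chosen:
        d[v] += delta                                 # grant delay to MINS nodes
    return delta

Iterating one_pass until it returns zero mimics the K-pass process; a faithful implementation would use the full sensitive-path analysis and an exact MINS on the transitive graph.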

IV. LOWER BOUND OF POWER CONSUMPTION

In the previous section, we obtained the basic scheme of selecting nodes for delay increase by analyzing the relation between the slack and delay of nodes. The goal is to maximize the sum of all possible delay increases for all nodes in the circuit while maintaining the timing performance. However, maximizing the sum of all possible delay increases does not necessarily lead to maximum power reduction. The reason is twofold. First, the voltage reduction and, hence, the power reduction under a unit penalty of node delay can be different for different nodes. Second, under a dual-voltage environment, only two discrete supply voltages are available for all nodes. In this section, we deal with the former issue by introducing the notion of a weight function and transforming the delay increase into power reduction. The latter will be discussed in Section V.

A. Lower-Bound Algorithm

To tackle low-power design using the slack and delay of nodes, let us examine how the node delay affects the power consumption. First, assume (for the purpose of establishing a lower bound) that initially the given circuit is safe and that the supply voltage can take any continuous value. From (2) and (4), the power reduction of node v under a unit delay penalty is given by

w(v) = −dP(v)/dd(v)      (5)

evaluated at the node's present supply voltage. We call w(v) the weight function of node v. Note that w(v) not only depends on S(v) and C_L(v), but also varies with the supply voltage and, hence, the delay of node v.


This leads to the following considerations. 1) Since the weight function denotes the power reduction obtained by adding a unit of delay to a node, we need to select nodes in an MWIS3 instead of a MINS for the delay increase of Δ, so that the maximum power reduction can be obtained. 2) Since the weight function is related to the node delay, a small value of Δ is generally preferable. Theoretically, Δ is required to approach zero for the maximum power saving. In the real world, however, a smaller Δ is not always better. First, too small a value of Δ can lead to prohibitively expensive computation cost. Second, since the MWIS depends on the relative weight of the nodes in G_I, a small change of weight for specific nodes will not necessarily affect the MWIS. This is true especially when the weights of the nodes are distributed over a wide range. In the experimental part, we will show that for most circuits the results are reasonably good when using Δ = s_max − s_2.

In the above discussion, we assumed the supply voltage is a continuous variable. Instead, when only two discrete supply voltages are available, the actual power consumption is higher than it is with continuous voltages. From this point of view, our approach provides a lower bound of power consumption with dual voltages. The procedure of our algorithm is outlined below:

Lower-Bound-Algorithm {
  input: G(V, E)
  output: supply voltages for all nodes and a lower bound of the total power
  Calculate the node delay, slack, and weight function under V_DDH for each node in G using (1), (2), and (5);
  Identify s_max and s_2 in G and let Δ = s_max − s_2;
  while (s_max > 0) {
    Find MS and construct G_I;
    Find the MWIS of G_I;
    Increase the node delay by Δ for each node in the MWIS;
    Update the node slack, voltage, weight, and Δ;
  }
  Calculate the final supply voltage V(v) for each node in G using (2);
  Obtain the lower bound of power consumption using (4);
}

It should be noted that the MWIS problem is NP-complete on general graphs [11]. It is, however, polynomial-time solvable for transitive graphs [11]. A fast heuristic for finding the MWIS is as follows. Initially, the MWIS is set to be empty. Then choose a node v with maximum weight among all nodes in G_I, add node v to the MWIS, and delete v and all its neighbors (along with all edges incident to at least one of these nodes) from G_I. This process repeats until G_I is empty.

3 A maximal weighted independent set of a graph is a set of independent nodes (i.e., no two nodes are connected by an edge) such that the sum of their weights is maximum.
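A minimal sketch of this greedy heuristic, with made-up weights, is shown below; an exact polynomial-time MWIS routine for transitive graphs (per [11], [14]) would take its place in a faithful implementation.

# Greedy MWIS heuristic sketch: repeatedly take the heaviest remaining node
# and discard its neighbors. Inputs are hypothetical; in the algorithm above,
# the weights would come from the weight function (5).
def greedy_mwis(nodes, adj, weight):
    """nodes: iterable of node ids; adj: dict node -> set of neighbors;
    weight: dict node -> nonnegative weight."""
    remaining = set(nodes)
    mwis = []
    while remaining:
        v = max(remaining, key=lambda x: weight[x])   # heaviest remaining node
        mwis.append(v)
        remaining -= {v} | adj[v]                     # drop v and its neighbors
    return mwis

# Example with made-up weights on a 4-node graph.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
w = {"a": 3.0, "b": 5.0, "c": 2.0, "d": 1.0}
print(greedy_mwis(adj.keys(), adj, w))                # ['b', 'd']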


An exact and a polynomial-time algorithm can be found in [11] and [14], respectively.

B. Prediction of "Optimum" Supply Voltages

It is interesting to look at the effect of a low supply voltage on the delay and power reduction. For each gate, a reduced V_DDL promises more power saving at the cost of an increased delay of the gate. Under the same timing constraints, using a lower V_DDL means that fewer gates are permitted to work with it. Therefore, one can expect there to be a specific "optimal" value of the supply voltage for the total power reduction, depending on the given circuit. Unfortunately, the existing dual-voltage algorithms work with fixed supply voltages. These algorithms cannot help if the given supply voltages are far from their optimum. Also, as mentioned in Section I, an exhaustive approach for selecting the optimum supply voltages is computationally expensive. This motivates us to find a fast prediction of the "optimal" dual-supply voltages.

When the lower-bound algorithm terminates, a supply voltage V(v) is obtained for each node v in G. In order to meet the given timing constraints, the supply voltage selected for node v has to be kept greater than or equal to V(v). Under a dual-voltage environment, either V_DDH or V_DDL can be used for each node. It is reasonable to select V_DDH as the largest of the final supply voltages over all nodes. The "optimal" value of V_DDL can be estimated by finding

max_{V_DDL}  Σ_{v ∈ V} g(v)      (6)

where

g(v) = S(v) · C_L(v) · f_clk · (V_DDH^2 − V_DDL^2),   if V_DDL ≥ V(v)
g(v) = 0,                                              if V_DDL < V(v)

This can be done by calculating the summation in (6) for different values of V_DDL ranging from the smallest V(v) to V_DDH and selecting the maximum sum. Equation (6) cannot take its maximum value unless V_DDL is equal to V(v) for a specific node v. We prove this below by way of contradiction. Without loss of generality, suppose we have V(v_1) ≤ V(v_2) ≤ ... ≤ V(v_n) and that (6) takes its maximum value when V_DDL = V*, where V(v_k) < V* < V(v_{k+1}) for some k. Note that g(v_i) > 0 for i ≤ k, and g(v_i) = 0 for i > k. On the other hand, when V_DDL = V(v_k), the value of (6) becomes

Σ_{i ≤ k} S(v_i) · C_L(v_i) · f_clk · (V_DDH^2 − V(v_k)^2).

Since V(v_k) < V*, each term of this sum is larger than the corresponding term at V_DDL = V*. This contradicts the previous hypothesis that the value at V* is the maximum. The estimate given by (6) is therefore optimal in the sense that any deviation from it will either increase the power consumption or violate the timing constraints.
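The scan suggested by (6) can be sketched as below. The per-node final voltages V(v), capacitances, and switching activities are assumed to be available from the lower-bound pass; all numbers are made up, and the gain expression follows (6) only in the reconstructed form given above.

# Sketch of the "optimum" V_DDL scan of (6): only candidates equal to some
# node's final voltage V(v) need to be examined. All inputs are hypothetical.
FCLK, VDDH = 20e6, 5.0

def predict_vddl(final_v, cap, switch):
    """final_v: dict node -> V(v) from the lower-bound algorithm;
    cap, switch: dict node -> C_L(v), S(v). Returns (best VDDL, total gain)."""
    candidates = sorted(set(final_v.values()))        # V_DDL = V(v) for some v
    best = (None, -1.0)
    for vddl in candidates:
        gain = sum(switch[v] * cap[v] * FCLK * (VDDH**2 - vddl**2)
                   for v in final_v if final_v[v] <= vddl)   # nodes able to use VDDL
        if gain > best[1]:
            best = (vddl, gain)
    return best

# Made-up per-node results of a lower-bound run.
final_v = {"a": 2.4, "b": 3.1, "c": 3.1, "d": 4.2}
cap     = {"a": 0.1, "b": 0.2, "c": 0.2, "d": 0.1}
switch  = {"a": 0.5, "b": 0.4, "c": 0.3, "d": 0.2}
print(predict_vddl(final_v, cap, switch))             # best candidate is 3.1 here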


When the optimum V_DDL and V_DDH are available, we modify the lower-bound algorithm for dual-voltage power optimization as follows. Each time the delay of the nodes in the MWIS increases by Δ, we check if some of these nodes are able to work at V_DDL. If yes, we select them to work at V_DDL and update their weight to zero. The reason is that, once a node works at V_DDL, no further increase of its delay is needed. By doing so, more delay budget can be provided for other nodes. The formal description of the power optimization algorithm based on the predicted optimum supply voltages will be given in the next section.

V. POWER OPTIMIZATION WITH DUAL-SUPPLY VOLTAGES

In this section, we describe an effective algorithm for power optimization using dual-supply voltages. Since a level converter is required at the interface of two voltages, we also discuss the reduction of the level converters' power penalty.

A. Power Optimization Algorithm Using Dual-Supply Voltages

Before presenting our algorithm, we take a look at the performance of existing techniques for power optimization with two supply voltages. Example 5.1 shows how these techniques work on Fig. 3.

Example 5.1: One of the existing techniques is the CVS scheme mentioned in Section I. Assume, in Fig. 3, that one node is a primary output and that each node has a given power gain. The CVS algorithm [2] starts with selecting the primary output. Once it works at V_DDL, the slack of its immediate predecessor changes from three to zero. To meet the timing constraint, that predecessor cannot work at V_DDL, and the algorithm terminates. The power reduction is two. Another available technique is the zero-slack algorithm (ZSA) [10]. The basic idea behind ZSA is that, at each step, it first finds the nodes with minimum positive slack and then selects some of them to work at V_DDL such that their slacks become zero (or small enough in this application). At the first step, the node with the maximum power gain of three is selected to work at V_DDL, with a power gain of three. At the second step, three nodes have the minimum positive slack of two. Because all three of them are nonfeasible, the algorithm stops with a final power reduction of three. As a matter of fact, however, the optimal solution for Fig. 3 is to select four of the nodes to work at V_DDL. That results in the maximum power reduction of six while keeping the circuit safe. Both CVS and ZSA break down because they select the nodes locally (i.e., in a very greedy fashion) and ignore the interaction between the delay and slack changes of the nodes. Thus, a more effective algorithm, with a global view, is required to tackle this problem. Based on the discussions in Section IV, we next outline our algorithm for power optimization.

Dual-Voltage-Power-Optimization (DVPO) Algorithm {
  input: G(V, E)
  output: the set L of gates working at V_DDL
  Step 1: Predict the "optimum" supply voltages by calling the lower-bound algorithm, unless they have been given by the user;

  Step 2: Let L = ∅ and Δ = s_max − s_2;
  Step 3: while (s_max > 0) {
    Find the MWIS of the induced subgraph G_I of G_s on the set of nodes with maximum slack;
    Increase the node delay by Δ for all nodes in the MWIS;
    if the delay of a node v has grown large enough for it to work at V_DDL then
      L = L ∪ {v} and set its weight to zero;
    endif
    Update the delays, slacks, weight functions, and Δ;
  }
}

We now apply the dual-voltage power optimization (DVPO) algorithm to Fig. 3, as shown in Example 5.2. For comparison, it is assumed that the supply voltages are the same as in the CVS and ZSA algorithms and, hence, the prediction of V_DDH and V_DDL is not included in this example.

Example 5.2: Consider Fig. 3 again. The weight function of each of the six nodes is approximated by its power gain per unit of delay gap. In the first iteration of the DVPO algorithm, the MWIS of the induced subgraph on the maximum-slack nodes is found and the delay of its nodes is increased by Δ; no node can yet be put into L, and the slacks of all nodes are updated. In the second iteration, the delays of three nodes become large enough for them to work at V_DDL, so they are put into L and their slacks and weights are updated. In the third iteration, one more node is added into L. Finally, the algorithm terminates with four nodes in L. As a result, we get the optimal solution with the maximum power reduction of six.
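Putting the pieces together, the loop below sketches Steps 2 and 3 of DVPO using the helpers from the earlier sketches (timing, greedy_mwis, and the delay gaps). It is schematic and rests on editorial assumptions: build_adjacency and weight_fn are hypothetical callbacks standing in for the G_I construction and the weight function (5), and nodes already assigned to V_DDL are simply skipped rather than carried with zero weight.

# Schematic DVPO-style loop (Steps 2-3), reusing timing() and greedy_mwis()
# from the earlier sketches. All structures are hypothetical.
def dvpo(d0, gap, weight_fn, build_adjacency, T):
    d = dict(d0)                        # current delays (start at VDDH values)
    added = {v: 0.0 for v in d}         # delay budget already granted to v
    L = set()                           # gates assigned to VDDL
    while True:
        a, r, s = timing(d, T)
        s_max = max(s.values())
        if s_max <= 1e-9:
            break
        MS = [v for v in d if abs(s[v] - s_max) < 1e-9 and v not in L]
        if not MS:
            break
        weights = {v: weight_fn(v, d[v]) for v in MS}
        adj = build_adjacency(MS, a, r, d)            # assumed G_I neighbors
        chosen = greedy_mwis(MS, adj, weights)
        lower = [x for x in s.values() if x < s_max - 1e-9]
        delta = s_max - (max(lower) if lower else 0.0)
        for v in chosen:
            d[v] += delta
            added[v] += delta
            if added[v] >= gap[v]:      # delay budget covers the VDDL delay gap
                L.add(v)                # assign to VDDL; its weight drops to zero
    return L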

Fig. 8. The comparison of K and n for 16 benchmarks. (K—the number of groups of nodes with different slack; n—the number of nodes.)

The time complexity of the DVPO algorithm is analyzed as follows. The time complexity of the lower-bound algorithm is basically the same as that of Step 3 of DVPO. Finding the maximum value of V(v) requires O(n) time, where n is the number of nodes in G. Estimation of the "optimal" value of V_DDL is also fast, since at most n values need to be tried for finding the maximum sum in (6). Therefore, the prediction of the optimum supply voltages is inexpensive. Calculating the delay, slack, and weight function for all nodes takes O(n) time. Finding G_I also needs linear time, and the MWIS of G_I can be obtained in linear time because G_I is a transitive graph. Since linear time suffices to update the slacks and weight functions of the affected nodes and the set L in each iteration, each iteration of Step 3 can be completed in linear time. Therefore, assuming the number of iterations is K, the time complexity of the whole DVPO algorithm is O(Kn). When selecting Δ = s_max − s_2 at each iteration, K is bounded by the number of groups of nodes with different slack. Our experience shows that K is much smaller than n for most circuits, making the computation efficient. Fig. 8 compares K and n for 16 MCNC'91 benchmark circuits; K is far smaller than n on average.

B. Level Converter Power Minimization Algorithm

When using two supply voltages, the circuit requires level converters (LCs) at the interface of V_DDL gates and V_DDH gates to block the static current which occurs if a V_DDL gate drives a V_DDH gate [2]. A conventional LC circuit is shown in Fig. 9. In the above discussion, we did not account for the effect of the LCs. Since an LC adds area, delay, and dynamic power to the circuit, the number of LCs should be minimized. From Fig. 9, the LC consists of six transistors and, hence, its area can be seen as the area of a three-input NAND gate. In the DVPO algorithm, the delay of an LC (denoted by d_LC) can easily be taken into consideration by replacing the delay gap Δd(v) of node v with Δd(v) + d_LC if an LC is needed at the output of v. In general, d_LC can be seen as a constant dependent on the specific technology (in our experiment, a d_LC of 0.5 ns is used, as in [2]). After the DVPO, we have two partitions of gates: one with V_DDL gates (denoted by L) and the other with V_DDH gates (denoted by H). Assuming that the input capacitance of an LC is C_LC and that the switching activity at the node where the LC is inserted is S(v), the additional power consumption due to an LC can be approximated in the same form as (3), using C_LC and the switching activity of that node.

Now we examine the problem of reducing the total power consumption by moving a gate from L to H or from H to L. Initially, because of DVPO, it is unlikely that a gate can be moved from H to L without violating the timing constraints, due to the increased delay of the gate. Let us therefore first look into the possibility of moving a gate from L to H. Without loss of generality, consider an L-type gate with a number of L-type fanins and a number of H-type fanouts. The possible power reduction resulting from the move of node v from L to H is given by

Fig. 9. The conventional level converter.

(7)

where the first term is the reduced power due to the LCs and the second term accounts for the increased power of the node itself when it moves from L to H. Thus, the problem of maximizing the power reduction can be solved by a constrained F–M algorithm. "Constrained" means that a move is accepted only if it does not violate the given timing constraints. Although more effective cluster-based F–M-like algorithms (such as hMetis [24]) are available for general partitioning purposes, they cannot be used in this application, where nodes are required to move individually without violating the timing constraints. Our algorithm for minimizing the LCs' power penalty is described in the following.

Level-Converter-Power-Minimization (LCPM) Algorithm {
  input: circuit with the initial partition of L and H
  output: power-optimized circuit with the final partition
  repeat
    i = 0; gain_sum(0) = 0;
    for each gate v in L do
      compute gain(v);
      choose the gate v_i with gain(v_i) = max(gain(v));
      if the tentative move of v_i from L to H does not violate the timing constraints then
        i = i + 1;
        gain_sum(i) = gain_sum(i − 1) + gain(v_i);
      endif
    endfor
    Find k such that gain_sum(k) = max(gain_sum(i));
    Move the first k gates from L to H if gain_sum(k) > 0;
  until gain_sum(k) ≤ 0;
}

After the process of gate moves from L to H is complete, a similar algorithm is applied to the possible moves of gates from H to L, where the cost function is modified as

(8)
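The shape of one such constrained pass is sketched below. It is a simplified stand-in rather than the paper's exact procedure: move_gain and meets_timing are assumed callbacks approximating the cost functions (7)/(8) and the incremental timing check, and gains are evaluated once up front instead of being updated after every tentative move as a full F–M pass would do.

# Simplified sketch of one constrained F-M-style pass (L -> H direction).
# move_gain(v) and meets_timing(L, H) are assumed, user-supplied callbacks;
# they are not defined here and only approximate the cost of (7).
def constrained_pass(L, H, move_gain, meets_timing):
    """Tentatively move gates from L to H in order of decreasing gain,
    then commit the best prefix if its cumulative gain is positive."""
    candidates = sorted(L, key=move_gain, reverse=True)
    trial_L, trial_H = set(L), set(H)
    moves, best_prefix, best_sum, running = [], 0, 0.0, 0.0
    for v in candidates:
        trial_L.discard(v)
        trial_H.add(v)
        if not meets_timing(trial_L, trial_H):        # reject the move, undo it
            trial_H.discard(v)
            trial_L.add(v)
            continue
        running += move_gain(v)
        moves.append(v)
        if running > best_sum:
            best_sum, best_prefix = running, len(moves)
    if best_prefix == 0:
        return set(L), set(H), 0.0                    # no improving prefix
    committed = moves[:best_prefix]
    return set(L) - set(committed), set(H) | set(committed), best_sum

Running such passes first for L-to-H moves and then, with the modified cost, for H-to-L moves mirrors the LCPM loop above.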


TABLE I THE LOWER BOUND OF POWER CONSUMPTION

Fig. 10. Examples of gate moves. (a) From L to H. (b) From H to L.

Fig. 10 shows typical examples of the possible gate moves. In Fig. 10(a), gate L is to be moved from L to H so that the number of LCs is minimized [see (7)]. In Fig. 10(b), in order to reduce the number of LCs, gate H is to be moved from H to L if doing so does not violate the timing constraints [see (8)]. Obviously, the time complexity of the LCPM algorithm is the same as that of the traditional F–M algorithm [6], which needs O(p) time for each pass, where p is the number of terminals in the circuit.

Since the power overhead of the LCs is not considered in the lower-bound algorithm of Section IV, our lower-bound estimate of power dissipation may not be tight, depending upon the specific circuit. Ideally, one would have to try every possible set of dual voltages exhaustively for power optimization before deriving the real optimum voltages. Obviously, this is prohibitively costly. Also, it is too complicated, if not impossible, to deal with the LCs and the best supply voltages in an integrated problem formulation. Hence, we have divided this difficult problem into two subproblems that can be solved more easily. We first estimate the "optimum" voltages by the lower-bound algorithm, assuming no power overhead for the LCs, and then minimize the number of LCs by the LCPM algorithm. As we will see in the next section, the lower bound of power consumption deviates from the actual minimum power by about 6% on average (the reader is referred to Table I in Section VI).

VI. EXPERIMENTAL RESULTS AND DISCUSSIONS

We implemented our dual-voltage approach in the environment of the SIS package [7], as shown in Fig. 2, and tested it on a set of MCNC91 benchmark circuits. The technology-mapped network was obtained using SIS under the delay mode. We used the longest path delay of the minimum-delay implementation as the timing constraint in our algorithm. The original power consumption was estimated at a supply voltage of 5 V and a clock frequency of 20 MHz under a 1-µm technology with a threshold voltage of 0.6 V. Our experiments are divided into five parts. In the first part, we estimate the lower bound of power consumption and the optimum supply voltages, and compare the predicted optimum voltage with the actual one. In the second part, we compare our dual-voltage approach with a gate sizing technique (using a standard cell library). The performance of our approach

and CVS is presented in the third part, and the power-delay tradeoff is provided in the fourth part. Finally, we show our power optimization results with different input statistics in the fifth part.

A. Lower Bound of Power and "Optimum" Supply Voltages

Table I shows the lower bound of power consumption using Δ = s_max − s_2 and its comparison with the actual minimum power. For individual circuits, the lower bound ranges from 54.5% to 77.9% of the original power consumption. On average, the lower bound accounts for about 66% of the original power, compared to 71.5% for the actual minimum power. To test the "optimum" voltage estimation, we optimized the same circuits with the DVPO algorithm using different values of V_DDL. Table II lists these test data. Depending on the specific circuit, the "optimum" value of V_DDL ranges from 2.2 V to 3.5 V (only the "optimum" V_DDL is given here because V_DDH is 5 V under the minimum delay constraints). It can be seen that the maximum power saving was achieved at a specific value of V_DDL (shown in the shaded entries of Table II) which is exactly, or very close to, the estimated "optimum" voltage (shown in the last column of Table II). Although, for a few circuits (e.g., circuit frg2), the estimated and actual optimum voltages differ considerably, the resulting power reduction is nevertheless always very similar. Note that, in some cases (e.g., circuit 9symml), the lower bound of power can be much smaller than the actual minimum power consumption. The reason is that, for this circuit, the final supply voltages V(v) resulting from the lower-bound algorithm scatter over a wide range because of the very nature of the circuit.

To look at how the value of Δ affects the performance of the lower-bound algorithm, we tested it with different fixed values of Δ (note that Δ = s_max − s_2 implies that it varies dynamically). Fig. 11 shows the resulting lower bound of power and the CPU time (on a SUN SPARCstation 5 with 32 MB RAM) for the same circuits as above. For most circuits (except one), as shown in Fig. 11(a), the lower bound varies within a range of 3% for different Δ, indicating its insensitivity to Δ. In contrast, the CPU time increases with decreased Δ, as shown in Fig. 11(b). Our experiments show that further reduction of Δ results in prohibitively expensive computation with almost the same lower-bound estimate. We confirm that it is reasonable to choose Δ = s_max − s_2

TABLE II THE OPTIMUM SUPPLY VOLTAGE AND POWER REDUCTION USING DIFFERENT VALUES OF V_DDL (THE SHADED ENTRIES CORRESPOND TO THE ACTUAL VOLTAGE WITH MAXIMUM POWER REDUCTION)

Fig. 11. Performance of the lower-bound algorithm on a set of circuits. (a) Lower bound on power consumption. (b) CPU time.

in terms of both accuracy and efficiency of the algorithm. Also, we tested our algorithm under different timing constraints. Fig. 12 shows the lower bound of power and the optimum supply voltages given timing constraints set to the minimum delay of the circuit and to progressively looser multiples of it. As can be expected, the lower bound of power decreases as the timing constraints are relaxed. The "optimum" V_DDH, however, basically stays at 5 V under various timing constraints. What is interesting is that looser timing constraints do not always lead to a lower "optimum" value of V_DDL, as shown in Fig. 12(b).

Fig. 12. The estimated lower bound of power consumption and “optimum” supply voltages under different timing constraints. (a) Lower bound on power consumption. (b) Optimum supply voltage.

This is because the timing penalty increases quickly when the supply voltage of a gate is reduced to smaller values.

B. Comparison of Our Dual-Voltage Approach and Gate Sizing Technique

In order to compare our dual-voltage approach with gate sizing, we implemented a gate sizing technique [17] which is also based on the MWIS. Table III summarizes the results (with fixed values of V_DDH and V_DDL) using a standard cell library. The average power reduction over the tested circuits


TABLE III POWER REDUCTION (%) COMPARISON OF DUAL-VOLTAGE TECHNIQUE AND GATE SIZING TECHNIQUE

is 6.9% for gate sizing and 19.6% for the dual-voltage approach. When using the optimum supply voltages, even more power saving is possible with the dual-voltage technique. However, it should be pointed out that, for the dual-voltage technique, there will be some power overheads at the physical level which cannot be accounted for at the gate level. More recently, an approach that uses gate sizing and dual-voltage techniques simultaneously for low power has also been reported [17], [27].

C. Performance of Our Approach Compared With CVS

For comparison with CVS, which is the existing dual-voltage approach, we also implemented CVS under the SIS environment and tested it on the benchmarks. Table IV shows the comparison of our algorithm and CVS (V_DDH = 5 V and a fixed V_DDL were used for this experiment). Columns 2, 3, and 4 of this table are the number of gates in the circuit, the circuit delay, and the initial power consumption (i.e., with a supply voltage of 5 V), respectively, before running the algorithms. With our algorithm, the number of gates working at V_DDL in the circuit accounts, on average, for 65.7% of the total number of gates. The individual percentage ranges from 34.6% to 86.3%, depending on the specific circuit. The average power penalty due to the LCs is about 5%. Our algorithm achieves a total power reduction of 19.6%, on average, with a maximum reduction of 26.3% for circuit C880. In contrast, only an 11.4% average power reduction is obtained by CVS. A typical instance is circuit 9symml, which has only one primary output. Under the timing constraint of the minimum circuit delay, the slack of the primary output is zero and no power reduction is possible using CVS. Instead, our algorithm achieves a 15.4% power improvement for this circuit. Note that all results in Table IV were achieved without degrading the timing performance of the circuit. This demonstrates the great potential of our dual-voltage power optimization. The last column of Table IV shows the CPU time running on a SUN SPARCstation 5 with 32 MB RAM.

To see how the excessive slacks in the circuit contribute to power reduction, we conducted experiments on the slack change of all gates resulting from our algorithm. Fig. 13 shows the distribution histogram of gates over the slack value for two circuits: c8 and i3. For circuit c8, whose longest path delay is 13.2 ns, 45 out of 130 gates work at V_DDL. The number of gates with small slack drastically increases after optimization. For circuit i3, in contrast, only 6 out of 310 gates work at V_DDL and, hence, the slack change of the gates is much smaller, resulting in very limited power saving (only 0.1%).

D. Power-Delay Tradeoff

To provide the power-delay tradeoff, we specified different timing constraints on the given circuits for our power optimization. As an example, the power-delay tradeoff curves for two circuits (C880 and apex6) are shown in Fig. 14, where each curve has two extreme points. One extreme corresponds to the minimum timing constraint with relatively little power reduction, while the other stands for a relatively loose timing constraint with the maximum possible power reduction, which is reached when all gates of the circuit work at V_DDL. With the values of V_DDH and V_DDL used, the maximal percentage of power reduction should theoretically be 1 − (V_DDL/V_DDH)^2. However, since the primary inputs always work at V_DDH, the actual minimum power is higher. As shown in Fig. 14, the actual minimum power for circuits C880 and apex6 is 62.8% and 66.1% of the initial total power, respectively, and the power consumption no longer decreases when the delay (i.e., the given timing constraint) increases beyond the looser extreme point.

In the above discussion, we have assumed that the capacitive loading of a gate remains the same whether it works at V_DDH or V_DDL. In a real design, however, a gate with V_DDL has less parasitic capacitance than one with V_DDH, indicating that our power estimation for V_DDL gates tends to be pessimistic. In this sense, the actual power saving can be a little higher than estimated here.

E. Power Savings Under Different Input Statistics

In the above experiments, we assumed that the input activity is 0.5 for all circuits. Considering that the actual input conditions may not be known, it is interesting to look at how sensitive our optimization results are to input statistics. While the proposed technique does not require any specific input patterns for optimization, the weight function given by (5) does vary with the switching activity S(v), which depends on the input statistics. Thus, the input activity affects the optimal dual voltages and the final assignment of nodes to V_DDH or V_DDL and, hence, the power optimization results. For a case study, we used random input activities ranging from zero to one for circuit 9symml and ran our dual-voltage optimization algorithms. The results are plotted in Fig. 15. Fig. 15(a) shows the curve of initial power consumption versus random input activity, and Fig. 15(b) shows the power saving obtained with our approach under random input activity. From Fig. 15(a), the power consumption of this circuit changes considerably for different input patterns, ranging from 295.1 to 1367.2. From Fig. 15(b), however, the power reduction varies only from 15.9% to 17.6%, showing that the percentage power saving is not sensitive to input statistics. This is a desirable feature of our approach. Our experiments with other benchmarks showed similar behavior. We believe that the percentage power reduction is dominated by other factors (such as the timing constraints and circuit topology, which affect the slack distribution) rather than by the input statistics.


TABLE IV PERFORMANCE OF OUR ALGORITHM AND CVS ALGORITHM ON A SET OF BENCHMARK CIRCUITS

Fig. 13. Gate distribution over the slack value for (a) circuit c8 and (b) circuit i3.

VII. CONCLUSION AND FURTHER WORK In this paper, we targeted gate level power optimization with dual-supply voltages. We have shown that dual-voltage approach can achieve significant power saving without degrading timing performance of the circuit. We demonstrated that it is important to predict optimum supply voltages before using dual-voltage technique and that the power penalty due to level converters can be wellcontrolled.Byspecifyingdifferenttimingconstraints,weobtained a power-delay tradeoff with dual voltages. In this work, we used a simpler delay model which does not account for interconnection delay and signal slew effect. Inter-

Fig. 14. Power-delay tradeoff for (a) circuit C880 and (b) circuit apex6.

Interconnection delay cannot be reasonably estimated until the physical design phase, while the slew effect depends strongly on the specific values of the dual-supply voltages applied. In addition, the delay given by (2) is based on the assumption that the input voltage swing of a gate at supply voltage V_DDL is also V_DDL. However, when a V_DDH gate drives a V_DDL gate, the input swing of the driven gate is V_DDH instead, and its delay is more accurately proportional to V_DDL/(V_DDH − V_t)^α. Therefore, in this case, (2) tends to overestimate the delay. With these issues in mind, further work is required to develop a more accurate delay model with dual voltages. Besides, there is additional noise due to cross-coupling

Fig. 15. Power consumption for circuit 9symml under different input statistics. (a) Initial power versus random input activity. (b) Power savings versus random input activity.

between signals with different voltages. Noise checks are thus increasingly necessary during layout. More recently, we have been combining gate-level design with physical-level design for power optimization with dual-supply voltages. Since some important parameters (such as wiring capacitance) are not available at the gate level, some kind of feedback between the logic and physical levels is required in order to know the exact effect of the dual-voltage approach at the physical level. At the logic level, the dual-voltage technique gives an estimate of the gate power reduction. At the physical level, we can use simulated annealing with additional constraints for placement and routing, followed by some post-processing, with the ultimate goal of reducing the power consumption due to both gates and interconnections [26]. Further work is under way.

ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for their comments, which improved the quality of the paper.

REFERENCES
[1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484, Apr. 1992.
[2] K. Usami and M. Horowitz, "Cluster voltage scaling technique for low-power design," in Proc. Int. Symp. Low Power Design (ISLPD), Dana Point, CA, Apr. 1995, pp. 3–8.

[3] M. Igarashi et al., "A low power design method using multiple supply voltages," in Proc. Int. Symp. Low Power Electronics and Design (ISLPED), Monterey, CA, Aug. 1997, pp. 36–41.
[4] S. Raje and M. Sarrafzadeh, "Variable voltage scheduling," in Proc. Int. Symp. Low Power Design (ISLPD), Dana Point, CA, Apr. 1995, pp. 9–14.
[5] K. Usami et al., "Automated low power technique exploiting multiple supply voltage applied to a media processor," in Proc. Custom Integrated Circuit Conf. (CICC), Santa Clara, CA, May 1997, pp. 131–134.
[6] C. M. Fiduccia and R. M. Mattheyses, "A linear time heuristic for improving network partitions," in Proc. IEEE/ACM Design Automation Conference (DAC), Las Vegas, NV, June 1982, pp. 175–181.
[7] E. M. Sentovich et al., "SIS: A system for sequential circuit synthesis," Univ. California, Berkeley, Tech. Rep. UCB/ERL M92/41, May 1992.
[8] A. P. Chandrakasan and R. W. Brodersen, Low-Power CMOS Digital Design. Norwell, MA: Kluwer, 1995.
[9] S. Mutoh et al., "1-V power supply high-speed digital circuit technology with multithreshold voltage CMOS," IEEE J. Solid-State Circuits, vol. 30, pp. 847–853, Aug. 1995.
[10] R. Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, "Generation of performance constraints for layout," IEEE Trans. Comput.-Aided Design, vol. 8, pp. 860–874, Aug. 1989.
[11] R. H. Mohring, Graphs and Orders: The Role of Graphs in the Theory of Ordered Sets and Its Application, I. Rival, Ed. New York: D. Reidel, May 1984, pp. 41–101.
[12] S. Raje and M. Sarrafzadeh, "Scheduling with multiple voltages," Integration, the VLSI J., vol. 23, pp. 37–59, 1997.
[13] J. M. Chang and M. Pedram, "Energy minimization using multiple supply voltages," IEEE Trans. VLSI Syst., vol. 5, pp. 1–8, Dec. 1997.
[14] D. Kagaris and S. Tragoudas, "Maximum independent sets on transitive graphs and their applications in testing and CAD," in Proc. Int. Conf. Computer-Aided Design (ICCAD), San Jose, CA, Nov. 1997, pp. 736–740.
[15] K. Usami et al., "Design methodology of ultra low-power MPEG4 codec core exploiting voltage scaling techniques," in Proc. ACM/IEEE Design Automation Conf. (DAC), San Francisco, CA, June 1998, pp. 483–488.
[16] C. Yeh, Y. Kang, S. Shieh, and J. Wang, "Layout techniques supporting the use of dual supply voltages for cell-based designs," in Proc. ACM/IEEE Design Automation Conf. (DAC), New Orleans, LA, June 1999, pp. 62–67.
[17] C. Chen and M. Sarrafzadeh, "Power reduction by simultaneous voltage scaling and gate sizing," in Proc. Asia and South Pacific Design Automation Conf. (ASPDAC), Yokohama, Japan, Jan. 2000, pp. 333–338.
[18] C. Yeh, M. Chang, S. Chang, and W. Jone, "Gate-level design exploiting dual supply voltages for power-driven applications," in Proc. ACM/IEEE Design Automation Conf. (DAC), New Orleans, LA, June 1999, pp. 68–71.
[19] V. Sundararajan and K. K. Parhi, "Synthesis of low power CMOS VLSI circuits using dual supply voltages," in Proc. ACM/IEEE Design Automation Conf. (DAC), New Orleans, LA, June 1999, pp. 72–75.
[20] L. Wei, Z. Chen, K. Roy, M. C. Johnson, Y. Ye, and V. K. De, "Design and optimization of dual-threshold circuits for low-voltage low-power applications," IEEE Trans. VLSI Syst., vol. 7, pp. 16–23, Mar. 1999.
[21] C. Chen and M. Sarrafzadeh, "Provably good algorithm for low power consumption with dual supply voltages," in Proc. Int. Conf. Comput.-Aided Design (ICCAD), San Jose, CA, Nov. 1999, pp. 76–79.
[22] C. Chen and M. Sarrafzadeh, "An effective algorithm for gate-level power-delay tradeoff using two voltages," in Proc. Int. Conf. Computer Design (ICCD), Austin, TX, Oct. 1999, pp. 222–227.
[23] M. Nemani and F. N. Najm, "Toward a high level power estimation capability," IEEE Trans. Comput.-Aided Design, vol. 15, pp. 588–598, June 1996.
[24] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, "Multilevel hypergraph partitioning: Applications in VLSI design," in Proc. ACM/IEEE Design Automation Conf. (DAC), Anaheim, CA, June 1997, pp. 526–529.
[25] M. Sarrafzadeh and S. Raje, "Scheduling with multiple voltages under resource constraints," in Proc. Int. Symp. Circuits Syst. (ISCAS), Orlando, FL, May 1999, pp. 350–353.
[26] A. Nayak, P. Banerjee, C. Chen, and M. Sarrafzadeh, "Power optimization issues in dual voltage design," in Proc. Int. Conf. Chip Design Automation (ICDA), Beijing, China, Aug. 2000, pp. 99–105.
[27] A. Nayak, M. Haldar, P. Banerjee, C. Chen, and M. Sarrafzadeh, "Power optimization of delay constrained circuits," J. VLSI Design—Special Issue Low Power System Design, to be published.


Chunhong Chen (M’99) received the B.S. and M.S. degrees in electrical engineering from Tianjin University, Tianjin, China, and the Ph.D. degree in electrical engineering from Fudan University, Shanghai, China. From 1986 to 1996, he was with Zhejiang University of Technology, Hangzhou, China, as an Assistant/Associate Professor. From 1997 to 1998, he was a Research Associate at the Hong Kong University of Science and Technology, Hong Kong. From 1999 to 2001, he was a Postdoctoral Fellow, first at Northwestern University, Evanston, IL, and then at the University of California, Los Angeles. Since then, he has been an Assistant Professor at the University of Windsor, ON, Canada. His current research interests include physical layout, logic synthesis, timing analysis, and power optimization for integrated circuits.

Ankur Srivastava (S’00) received the Bachelor of Technology degree in electrical engineering from the Indian Institute of Technology, Delhi, India, in 1998 and the M.S. degree in computer engineering from Northwestern University, Evanston, IL, in 2000. He is currently working toward the Ph.D. degree at the Computer Science Department, University of California, Los Angeles, under the guidance of Prof. M. Sarrafzadeh. He has served as an Intern at Fujitsu Laboratories of America, Sunnyvale, CA, during the summer of 1999. His primary research interests include delay and power optimization in logic circuits, reconfigurable computing, and power issues in sensor networks.


Majid Sarrafzadeh (M’87–SM’92–F’96) received the B.S., M.S., and Ph.D. degrees in 1982, 1984, and 1987, respectively, from the University of Illinois at Urbana-Champaign in electrical and computer engineering. He joined Northwestern University, Evanston, IL, as an Assistant Professor in 1987. In 2000, he joined the Computer Science Department at University of California at Los Angeles (UCLA). He has collaborated with many industries in the past ten years, including IBM and Motorola and many CAD industries. He contributed to Theory and Practice of VLSI Design. He has published approximately 200 papers, is a Co-Editor of Algorithm Aspects of VLSI Layout (Singapore: World Scientific, 1994), coauthor of An Introduction to VLSI Physical Design (New York: McGraw Hill, 1996), and the author of an invited chapter in the Encyclopedia of Electrical and Electronics Engineering in the area of VLSI circuit layout. He is on the editorial board of the VLSI Design Journal and an Associate Editor of ACM Transaction on Design Automation (TODAES). His recent research interests include embedded and reconfigurable computing, VLSI CAD, and design and analysis of algorithms. Dr. Sarrafzadeh received the NSF Engineering Initiation award, two Distinguished Paper Awards in the International Conference on Computer-Aided Design (ICCAD), and the Best Paper Award in the Design Automation Conference (DAC) for his work in the area of physical design. He has served on the technical program committee of numerous conferences in the area of VLSI Design and CAD. He has served as committee chair of a number of these conferences, including International Conference on CAD and International Symposium on Physical Design. He was the General Chair of the 1998 International Symposium on Physical Design. He is an Associate Editor of IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN.