PowerPlanningDL: Reliability-Aware Framework for On-Chip Power Grid Design using Deep Learning

2020·arXiv

Abstract

I. INTRODUCTION

The primary objective of the power planning phase in the back-end design of a system-on-chip (SoC) is to design a power grid network that delivers power to all the components of the SoC within the allowed margins of IR drop and electromigration (EM), ensuring the durability of the chip. If these margins are not satisfied, IR drop and EM violations can occur, which reduce the reliability of the chip. Designing a reliable power grid is an iterative process that requires many phases of incremental design to verify the power grid, as shown in Fig. 1. As a result, the design cost and the power planning sign-off time increase. Therefore, to reduce the cost and the design cycle time, in this work we propose to use the historical data of the power planning design cycle to build a deep learning model that can generate a reliable power grid. Adopting our deep learning model in the power planning phase within the electronic design automation (EDA) industry can reduce cost and increase the efficiency of the overall chip design phase. The main contributions of this paper are as follows:

• We present a power planning methodology that uses deep learning in the VLSI physical design cycle.

• We establish a similarity between power grid design and deep neural network training, and we build a reliability-aware framework for power grid design using deep learning.

• We demonstrate a speedup in power grid design with the proposed framework compared to the conventional approach on power grid designs of IBM processors. At the VLSI physical design level, we answer the following questions: 1) How practically feasible is deep learning for the power planning phase? 2) How accurately can the deep learning approach predict the design parameters while still satisfying the allowed IR drop and EM margins? 3) How efficient is the deep learning approach compared to the standard power planning tools? These are the fundamental questions that must be addressed for the successful adoption of deep learning in the power planning phase.

The paper is organized as follows. Section II presents the preliminaries and the motivation of this work. Section III presents the nonlinear formulation of the power grid design problem and its equivalence with deep learning training, which is used to solve the power grid design problem. Section IV describes the proposed framework. The experimental results are presented in Section V, and Section VI concludes the paper.

II. PRELIMINARIES AND MOTIVATION

A. Fundamentals of Power Planning

Fig. 1. Conventional Power Planning Flow in VLSI Physical Design

Power planning is one of the most critical stages in VLSI physical design. The conventional power planning steps are shown in Fig. 1. Power planning starts with the placement of the power and ground pads. A power network is then generated to deliver power to the standard cells and macros within the acceptable IR drop margin. Steady-state IR drop occurs due to the resistance of the metal wires of the power grid network; it is the voltage difference between the supply and the grid nodes, and it is determined by power grid analysis. Early vectorless power grid analysis is performed to estimate the IR drop even before the placement and routing stage, using the power information from the front-end design. Once the IR drop margin is satisfied at this stage, placement and routing are performed. Subsequently, vectored power grid analysis is performed with the exact current traces of the underlying functional blocks to verify that the IR drop margin is met. To the best of our knowledge, this work is the first to use a deep learning approach for this problem, and it focuses on static IR drop and EM-aware power grid design. Therefore, this work does not consider the decoupling capacitor (decap) placement phase.
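To make steady-state (static) IR drop concrete, the sketch below computes node voltages along a single power rail fed from one pad, modeled as a chain of resistive segments with a load current drawn at each node. The function name `rail_ir_drop` and the single-ended 1-D rail model are illustrative assumptions; real power grids are 2-D meshes solved by nodal analysis.

```python
def rail_ir_drop(vdd, seg_res, tap_currents):
    """Node voltages along a 1-D power rail fed by a single pad at its left end.

    seg_res[k]      resistance (ohm) of the segment feeding node k
    tap_currents[k] current (A) drawn by the load at node k
    """
    if len(seg_res) != len(tap_currents):
        raise ValueError("need one segment resistance per tap")
    voltages = []
    v = vdd
    # Segment k carries every current drawn at node k or further down the rail.
    downstream = sum(tap_currents)
    for r, i_tap in zip(seg_res, tap_currents):
        v -= r * downstream      # Ohm's law: drop across segment k
        voltages.append(v)
        downstream -= i_tap      # the load at node k no longer flows onward
    return voltages


vdd = 1.0
v = rail_ir_drop(vdd, [0.1, 0.1, 0.1], [0.2, 0.2, 0.2])
worst_drop = vdd - min(v)   # static IR drop at the far end of the rail, about 0.12 V
```

Halving every segment resistance (for instance, by doubling the line width) halves the worst-case drop, which is the lever that power grid sizing adjusts.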

B. Related Work

1) Conventional Approaches in Power Grid Design: Many works over the last two decades have addressed power grid design, analysis, optimization, and verification using different heuristics. Some recent works are discussed here. Fawaz et al. [1] proposed a methodology for accurate verification of power grids. Wang et al. [2] proposed physics-based electromigration modeling and assessment for power grid networks. Dey et al. [3] performed power grid design considering IR drop and EM reliability constraints. Heo et al. [4] mitigated IR drop by power staple insertion. All the methods mentioned above suffer from long convergence times.

2) Learning Approaches in Power Grid Design: There have been few efforts to apply learning-based methods to power grid design. However, a few closely related works are discussed here. Cui et al. [5] proposed a machine learning technique for matrix reordering in power grid analysis. Fang et al. [6] proposed machine learning-based dynamic IR drop prediction for ECO. Liu et al. [7] proposed power supply noise (PSN)-aware circuit test timing prediction using machine learning. Chang et al. [8] proposed generating routability-driven power grid networks using machine learning techniques. Lin et al. [9] proposed IR drop prediction for ECO-revised circuits using machine learning. Ye et al. [10] proposed on-chip voltage droop prediction using support vector machines. Cao et al. [11] proposed a learning-based method to predict the quality of package power delivery networks. Little significant work in the literature addresses deep learning-based power planning methodology.

C. Motivation

Designing a power grid is similar to solving a nonlinear optimization problem, as shown in the next section. Training a deep neural network is likewise a nonlinear optimization problem. We therefore investigate the underlying similarity between the two problems and solve the power grid design problem using deep learning. Moreover, deep learning has been successful at complex prediction tasks in many areas of science and technology. We therefore use deep learning to predict the power grid design, which reduces the design cycle time and the dependence on human intervention for the initial design of the power grid.

D. Overview of the Proposed Methodology

Our objective here is to reduce the iterative flow of the power planning phase while still satisfying the allowed margin of IR drop and Electromigration with the help of the historical data generated in the design process of the power grid network. Therefore, initially, we perform the feature extraction and prepare the training data using these historical data of the design phase and specifications, as shown in Fig. 2. Subsequently, we train our deep learning model using these historical data and predict a power grid design for any new design specifications.

Fig. 2. Proposed Deep Learning-based Power Planning Flow

III. NONLINEAR OPTIMIZATION FORMULATION

In this section, we establish the equivalence between power grid design and deep learning. The objective of power grid design is to obtain the optimum widths of the power grid lines subject to different reliability constraints. If the IR drop across a power grid line ($V_{IR}$) is represented as $V_{IR} = I R$, where $R = \rho l / w$, $\rho$ is the sheet resistance, $w$ the width, and $l$ the length of the line, then

$$w = \frac{\rho\, l\, I}{V_{IR}}, \qquad (1)$$

which is a nonlinear function of the variables $I$ and $V_{IR}$ (considering $\rho$ and $l$ to be constant). It is also well known [12] that training a deep neural network is a nonlinear optimization problem. Therefore, power grid design and deep neural network training are similar in structure. Using this comparison, we build the neural network model for the power grid design problem shown in Fig. 3.

Fig. 3. Equivalence between deep neural network training and power grid design (a) Training a neural network for weights (b) Solving power grid design with neural networks for weights

The minimization of the power grid design objective function can then be represented as follows:

$$\min_{\theta} \sum_{i} \mathcal{L}\big(W_D(i;\theta),\, w_i\big) + \lambda\, C(\theta), \qquad (2)$$

where $i$ is each instance of a power grid interconnect, $W_D(\cdot)$ is the cost function of (1) as predicted by the neural network with weights $\theta$, and $\mathcal{L}$ is the error (loss) function that evaluates the error from the true value $w_i$. $C(\cdot)$ denotes the reliability and other constraints of the power grid design, which are described below and can be satisfied using the weight $\lambda$.
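As a hedged illustration of how an objective like (2) can combine a data-fitting term with reliability constraints, the sketch below assumes a mean-squared-error loss and a hinge penalty that activates only when a predicted width falls below a per-interconnect minimum width (for example, one derived from an EM limit). The names `penalized_loss`, `min_width`, and `lam` are hypothetical; the exact loss and penalty forms used by the authors are not specified here.

```python
def penalized_loss(pred, target, min_width, lam):
    """Mean-squared error plus a weighted hinge penalty on constraint violations.

    pred, target : predicted and golden widths, one per interconnect instance i
    min_width    : hypothetical per-instance lower bound (e.g. from an EM limit)
    lam          : weight trading data fit against constraint satisfaction
    """
    n = len(target)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    # The hinge term is zero when a width meets its bound and grows linearly below it.
    penalty = sum(max(0.0, m - p) for p, m in zip(pred, min_width)) / n
    return mse + lam * penalty


loss = penalized_loss([1.0, 2.0], [1.0, 2.5], [1.2, 1.5], lam=10.0)   # 0.125 + 10 * 0.1
```

A larger `lam` pushes predictions toward feasibility at the expense of fit; with `lam = 0` the objective reduces to plain regression.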

The relation between the width of the power grid lines and the spacing between two adjacent power grid lines can be represented as follows,

where represents the ring width. For a large number of power grid lines, designing power grids subject to the constraints in (1) and (3) becomes a difficult and tedious process. The EM reliability constraint on the maximum current density can be defined as,

These constraints, denoted C(·) in (2), must be satisfied when designing the power grid with the neural network, and their contribution can be adjusted with a weighting factor.
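The sketch below shows how the two reliability requirements bound a single line's width from below. It uses the IR drop relation behind (1), assuming $R = \rho l / w$ with sheet resistance $\rho$, together with an EM limit of the commonly used form $J = I/(w\,t) \le J_{max}$ for line thickness $t$. The EM form, the function name `required_width`, and the example values are assumptions for illustration, not the paper's constraint equations.

```python
def required_width(current, length, sheet_res, ir_budget, j_max, thickness):
    """Smallest line width meeting both an IR-drop budget and an EM current-density limit.

    IR drop: V = I * R with R = sheet_res * length / w  ->  w >= sheet_res * length * I / ir_budget
    EM (assumed form): J = I / (w * thickness) <= j_max  ->  w >= I / (j_max * thickness)
    Lengths in um, current in A, sheet_res in ohm/square, j_max in A/um^2.
    Returns (width, name of the binding constraint).
    """
    w_ir = sheet_res * length * current / ir_budget
    w_em = current / (j_max * thickness)
    return max(w_ir, w_em), ("IR" if w_ir >= w_em else "EM")


# 10 mA over a 1000 um line, 0.02 ohm/sq, 50 mV budget, 0.01 A/um^2 (1 MA/cm^2), 0.5 um thick
w, limiter = required_width(0.01, 1000.0, 0.02, 0.05, 0.01, 0.5)   # about 4.0 um, IR-limited
```

For short lines the IR term shrinks and the EM term becomes binding, which is why both constraints must be checked for every interconnect.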

IV. PROPOSED POWERPLANNINGDL FRAMEWORK

A. Problem Formulation

A floorplan of an SoC with the power grid lines and underlying functional blocks is shown in Fig. 4(a). While designing the power grid, it is very challenging to predict the optimum widths of the power grid lines. Over-designing the power grid by increasing the line widths increases the total metal routing area of the chip. If the grid is under-designed to reduce the metal routing area, it suffers from excessive IR drop and electromigration effects due to the increased resistance and current density of the metal lines. In both cases, the design rules must also be respected. Correct predictions of the power grid line widths can reduce the number of iterations of the power planning phase. Therefore, in our deep learning adaptation, we use a supervised learning approach to create a model. Our model learns the optimum widths of the metal lines from historical data of power grid designs that satisfy the IR drop and electromigration constraints within the allowed margin. Subsequently, we use this learned model to predict the widths of the power grid lines for a new design.

As shown in (1), the width depends on the current and the IR drop, which can only be found after power grid analysis. Because power grid analysis is time-consuming, we want to avoid it. We therefore use an alternate approach to predict the width: we use the X-coordinate and Y-coordinate (of the planned floorplan of the underlying functional blocks) and the switching current activity of the block (obtained from the front-end phase in a value change dump (VCD) file) as features. The reason for choosing these features is given in Section IV-B. Accordingly, we formulate the following two problems:

Problem 1. Given the X- and Y-coordinates of a location in the floorplan and the switching current at that point, predict the metal width required at that location to satisfy the IR drop and EM constraints.

Problem 2. Given the width and the switching activity of the PG interconnects, predict the IR drop of the PG interconnect.

Fig. 4. (a) A floorplan of an SoC with the power grid lines over the functional blocks. (b) Variation of scores for 1000 power grid interconnects of ibmpg1 benchmark circuit with different input features

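We use a multi-target regression technique for the deep learning model, in which multiple input features (independent variables, $\mathbf{X}$) are mapped to multiple output features (dependent variables, $\mathbf{Y}$). Mathematically, it can be represented as

$$\mathbf{Y} = f(\mathbf{X};\theta), \qquad \mathbf{X} \in \mathbb{R}^{p},\ \mathbf{Y} \in \mathbb{R}^{q},$$

where $f$ is learned from the training dataset $\mathcal{D} = \{(\mathbf{X}_i, \mathbf{Y}_i)\}_{i=1}^{N}$ over all $N$ power grid interconnects.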

B. Feature Selection & Training Data Preparation

Definition 1. The r² score, or coefficient of determination, is a metric that measures the goodness of the predictions of a regression method. A value close to 1 indicates that the model fits the data well.

To select the input features for our deep learning model, we evaluated the r² score of different combinations of input features with respect to the output feature. We observed that the combination of the X-coordinate, Y-coordinate (of the planned floorplan of the underlying functional blocks), and switching current activity gives the best fit for the neural network-based multi-regression technique, as it has the highest r² score (see Fig. 4(b) and Table I).
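The feature-selection step can be sketched as scoring every subset of candidate features by r². Purely to keep the example cheap and self-contained, the sketch below fits a linear least-squares model per subset as a proxy for the paper's neural network, and it runs on synthetic data rather than benchmark data; the helper names are hypothetical.

```python
import itertools

import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (undefined for a constant target)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def score_feature_subsets(features, target):
    """In-sample r^2 of a linear least-squares fit for every non-empty subset of features.

    features maps a feature name to a 1-D array; a linear model is only a cheap proxy
    for the neural network used in the paper.
    """
    target = np.asarray(target, dtype=float)
    names = list(features)
    scores = {}
    for k in range(1, len(names) + 1):
        for subset in itertools.combinations(names, k):
            # Design matrix: the selected feature columns plus an intercept column.
            A = np.column_stack([features[n] for n in subset] + [np.ones(len(target))])
            coef, *_ = np.linalg.lstsq(A, target, rcond=None)
            scores[subset] = r2_score(target, A @ coef)
    return scores


rng = np.random.default_rng(0)
x, y, i_sw = rng.uniform(size=(3, 200))          # synthetic features, not benchmark data
width = 2.0 * i_sw + 0.5 * x + 0.01 * rng.normal(size=200)
scores = score_feature_subsets({"X": x, "Y": y, "I": i_sw}, width)
best = max(scores, key=scores.get)
```

In-sample r² of nested least-squares models never decreases as features are added, so a fair comparison should score subsets on held-out data (or use adjusted r²).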

TABLE I r² SCORE OF DIFFERENT INPUT FEATURES WITH THE OUTPUT FEATURE OF A PG INTERCONNECT.

Here, the switching current is the current obtained from the switching activity of the functional block at coordinate (X, Y). The training dataset is therefore generated as quadruples (X coordinate, Y coordinate, switching current, interconnect width) from a set of real power grid designs.

C. Neural Network-based Deep Learning Model

The neural network has an input layer, an output layer, and hidden layers in between; an illustrative example with a single hidden layer is shown in Fig. 5. The number of hidden layers can be large: our model uses 10 hidden layers, a number obtained by hyperparameter optimization. The network is trained on the quadruples (X coordinate, Y coordinate, switching current, interconnect width): the forward propagation step computes the predictions for the current weights, as described in Section III, and the Adam optimizer [13] then minimizes the loss (error) function in the backpropagation step. Once trained, the model predicts the interconnect width for new test samples.
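The paper trains its network with TensorFlow; to keep the sketch self-contained, the block below instead implements a small fully connected regressor and the Adam update rule [13] directly in NumPy. Only the 10 hidden layers and the use of Adam come from the text; the layer width (16), ReLU activations, learning rate, epoch count, and the synthetic target are hypothetical choices.

```python
import numpy as np


def init_mlp(sizes, rng):
    """He-initialised weights and zero biases, stored flat as [W0, b0, W1, b1, ...]."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        params.append(rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)))
        params.append(np.zeros(fan_out))
    return params


def forward(params, X):
    """Activations of every layer: ReLU on hidden layers, linear output layer."""
    acts = [X]
    n_layers = len(params) // 2
    for k in range(n_layers):
        z = acts[-1] @ params[2 * k] + params[2 * k + 1]
        acts.append(z if k == n_layers - 1 else np.maximum(z, 0.0))
    return acts


def loss_and_grads(params, X, Y):
    """Mean-squared-error loss and its gradients computed by backpropagation."""
    acts = forward(params, X)
    diff = acts[-1] - Y
    loss = float(np.mean(diff ** 2))
    delta = 2.0 * diff / diff.size                   # dLoss/d(output)
    grads = [None] * len(params)
    for k in range(len(params) // 2 - 1, -1, -1):
        grads[2 * k] = acts[k].T @ delta             # dLoss/dW_k
        grads[2 * k + 1] = delta.sum(axis=0)         # dLoss/db_k
        if k > 0:
            # Back through W_k, then through the ReLU that produced acts[k].
            delta = (delta @ params[2 * k].T) * (acts[k] > 0)
    return loss, grads


def train_adam(params, X, Y, epochs=300, lr=3e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Full-batch Adam on the MSE loss; updates params in place, returns the loss history."""
    m = [np.zeros_like(p) for p in params]
    v = [np.zeros_like(p) for p in params]
    history = []
    for t in range(1, epochs + 1):
        loss, grads = loss_and_grads(params, X, Y)
        history.append(loss)
        for j, g in enumerate(grads):
            m[j] = b1 * m[j] + (1 - b1) * g
            v[j] = b2 * v[j] + (1 - b2) * g * g
            m_hat = m[j] / (1 - b1 ** t)             # bias-corrected first moment
            v_hat = v[j] / (1 - b2 ** t)             # bias-corrected second moment
            params[j] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return history


rng = np.random.default_rng(0)
X = rng.uniform(size=(256, 3))                       # synthetic (x, y, switching current)
Y = 0.5 + 2.0 * X[:, 2:3] + 0.3 * X[:, 0:1]          # synthetic width target, shape (256, 1)
params = init_mlp([3] + [16] * 10 + [1], rng)        # 10 hidden layers of 16 units
history = train_adam(params, X, Y)
```

Because the output layer is linear and `Y` is a 2-D array, the same code handles the multi-target case by widening the last layer.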

Fig. 5. Neural Network with one input, one output, and one hidden layer.

1) The Power Grid Interconnect Width Prediction: The power grid interconnect width prediction is given below in Algorithm 1.

2) IR Drop Prediction: The IR drop prediction algorithm is given below in Algorithm 2. After testing Algorithm 1 on the test dataset, we already have the predicted width, and hence the resistance, of each power grid interconnect (considering the sheet resistance and length to be constant). To find the IR drop across an interconnect, we also need the current through it, which is obtained as follows. The number of power grid lines required can be obtained using the following formula:

As shown in Fig. 4(a), suppose the power grid lines carry the current drawn by the blocks. Then the current requirement of each power grid line to the blocks can be represented as follows,

where represents the current provided by the power grid line to the block. From the above, we can obtain the current through each interconnect and subsequently its IR drop.
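The final step, going from predicted widths and a block current to per-line IR drops, can be sketched as follows. The assumption that parallel lines of equal length share the block current in proportion to their conductance ($w / (\rho l)$) is ours for illustration, as is the helper name `line_ir_drops`; it is not the paper's current-distribution equation.

```python
def line_ir_drops(total_current, widths, length, sheet_res):
    """Current and IR drop on each of several parallel power grid lines feeding one block.

    Assumption: the lines have the same length and connect the same two nodes, so the
    block current divides in proportion to each line's conductance w / (sheet_res * length).
    """
    resistances = [sheet_res * length / w for w in widths]        # R = rho * l / w
    conductances = [1.0 / r for r in resistances]
    g_total = sum(conductances)
    currents = [total_current * g / g_total for g in conductances]  # current divider
    drops = [i * r for i, r in zip(currents, resistances)]          # V = I * R per line
    return currents, drops


# 30 mA block fed by two 100 um lines of predicted widths 1 um and 2 um, 0.02 ohm/sq
currents, drops = line_ir_drops(0.03, [1.0, 2.0], 100.0, 0.02)
```

The wider line carries twice the current, yet both drops come out equal, as expected for lines in parallel between the same two nodes.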

D. Test Data Generation

The test dataset is generated by perturbing the same dataset that is used for training. The perturbation is applied by changing the branch currents, node voltages, and switching currents of the underlying functional blocks by an amount termed the perturbation size. In the next section, we vary the perturbation size to study its effect on prediction accuracy.
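A minimal sketch of this test-data generation, assuming that each perturbed quantity is scaled by an independent random factor drawn uniformly from [1 − δ, 1 + δ], where δ is the perturbation size. The record layout, field names, and the uniform relative model are hypothetical; the paper does not state the distribution it uses.

```python
import random


def perturb(dataset, delta, keys=("branch_current", "node_voltage", "switching_current"), seed=None):
    """Copy of `dataset` (a list of dicts) with each field in `keys` scaled by a random
    factor drawn uniformly from [1 - delta, 1 + delta]; other fields (e.g. coordinates) are kept."""
    rng = random.Random(seed)
    perturbed = []
    for record in dataset:
        new = dict(record)                         # never mutate the training records
        for key in keys:
            if key in new:
                new[key] = new[key] * (1.0 + rng.uniform(-delta, delta))
        perturbed.append(new)
    return perturbed


# Toy record: coordinates stay fixed, electrical quantities are perturbed.
train = [{"x": 10.0, "y": 20.0, "branch_current": 1e-3, "node_voltage": 1.78, "switching_current": 5e-4}]
test = perturb(train, delta=0.05, seed=42)        # 5 % perturbation size
```

Sweeping `delta` and re-evaluating the trained model on each perturbed set reproduces the kind of study reported in Section V-F.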

V. EXPERIMENTAL RESULTS

A. Simulation Setup

The framework is developed in C++ and Python. For deep learning operations, the TensorFlow library for Python is used on a Linux machine with an Intel Xeon E5-2650 processor and an Nvidia Tesla K20c GPU. The datasets are generated, and the proposed PowerPlanningDL is validated, using the IBM Power Grid benchmarks [14], which are standard power grid benchmarks extracted from IBM processors. The details of the IBM PG benchmarks are listed in Table II. The current loads of the IBM PG benchmarks are modified to obtain the desired effects. The simulation setup for the experiments follows Fig. 6. The hyperparameters of the neural network are fixed at the values that gave the best results.

Fig. 6. Flow of the simulation setup of the Deep Learning Flow

TABLE II IBM PG BENCHMARK DETAILS [14]

B. Study of Predicted Power Grid Interconnect Width

In this section, we evaluate the correlation between the power grid widths predicted by PowerPlanningDL and those obtained by the conventional approach. The correlation value shows how closely the predicted widths track the golden widths obtained from the conventional approach. The correlation plot is shown in Fig. 7(a). To study the error distribution of the predicted widths, the error histogram is shown in Fig. 7(b) (the horizontal axis represents the error). The error histogram shows that the errors are concentrated near 0, i.e., most of the predicted PG interconnect widths have near-zero error, and the number of power grid instances decreases as the error magnitude increases. From this result, we conclude that for most interconnects the widths predicted by PowerPlanningDL are very close to the golden results generated by the conventional approach.
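The two diagnostics in Fig. 7 can be computed as sketched below: the Pearson correlation between golden and predicted widths, and a histogram of signed errors with a chosen bin width. The helper names and the toy widths are hypothetical, and the binning convention (bins centred on multiples of the bin width) is one reasonable choice among several.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def error_histogram(pred, golden, bin_width):
    """Count errors (pred - golden) per bin; key k covers [(k - 0.5), (k + 0.5)) * bin_width."""
    return Counter(math.floor((p - g) / bin_width + 0.5) for p, g in zip(pred, golden))


golden = [1.0, 2.0, 3.0, 4.0]            # toy widths from the conventional flow
pred = [1.1, 1.9, 3.0, 4.2]              # toy widths from the predictor
r = pearson(golden, pred)
hist = error_histogram(pred, golden, bin_width=0.1)
```

A histogram peaked at key 0 with counts falling off on both sides is the pattern described for Fig. 7(b).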

Fig. 7. Power grid interconnect width prediction for the ibmpg2 benchmark circuit (a) Correlation scatter plot (b) Error histogram (the horizontal axis represents the error).

C. Study of Predicted IR Drop in Power Grid

The IR drop maps of the conventional approach and of the PowerPlanningDL approach are shown in Fig. 8 for the ibmpg2 and ibmpg6 circuits. The worst-case IR drop for all the benchmarks is listed in Table III. From the IR drop maps and the worst-case IR drop values, we infer that PowerPlanningDL predicts IR drops close to those of the conventional approach.

Fig. 8. IR drop map of (a) Conventional method ibmpg2 circuit (b) PowerPlanningDL methodology ibmpg2 circuit (c) Conventional method ibmpg6 circuit, and (d) PowerPlanningDL methodology ibmpg6 circuit.

TABLE III COMPARISON OF WORST-CASE IR DROP USING CONVENTIONAL POWER PLANNING APPROACH AND POWERPLANNINGDL FRAMEWORK

D. Main Result: Study of Convergence Time

The convergence times of both approaches are shown in Table IV. The convergence time of the conventional approach includes the IR drop analysis time, as this is the primary time-consuming task. For PowerPlanningDL, the convergence time comprises the width prediction time and the IR drop prediction time, as described in Section IV. From the table, our proposed PowerPlanningDL is 5.87× faster than the conventional approach for the ibmpg5 benchmark. The speedup is also larger for larger benchmarks, because larger grids require more time for power grid analysis in the conventional approach, a step that PowerPlanningDL does not use. This is one of the main reasons for the significant speedup of PowerPlanningDL over the conventional approach. The speedup comes at some cost in accuracy. Note that for the conventional approach, Table IV reports the best-case scenario, i.e., the convergence time of only one iteration of the design cycle. In the worst case, the design cycle may need multiple iterations, for which the conventional approach takes much more time, whereas the convergence time of PowerPlanningDL remains the same. This also shows the advantage of PowerPlanningDL in reducing the number of iterations in the design cycle.
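The iteration argument can be made explicit with a simple idealized model. Assuming, hypothetically, that each of $k$ conventional design iterations costs about the single-iteration time $T_{\mathrm{conv}}$ reported in Table IV, while PowerPlanningDL runs once in time $T_{\mathrm{DL}}$, the effective speedup grows linearly with $k$:

$$S(k) = \frac{k\,T_{\mathrm{conv}}}{T_{\mathrm{DL}}} = k\,S(1), \qquad S(1) = 5.87 \text{ (ibmpg5)} \;\Rightarrow\; S(3) \approx 17.6.$$

This is an upper-end estimate: later conventional iterations may be incremental and therefore cheaper than the first.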

TABLE IV COMPARISON OF CONVERGENCE TIME FOR CONVENTIONAL POWER PLANNING APPROACH AND POWERPLANNINGDL FRAMEWORK

E. Overhead: Study of Model Accuracy

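The mean square error (MSE) can be defined as,

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(w_i - \hat{w}_i\right)^2,$$

where $w_i$ and $\hat{w}_i$ are the golden and predicted widths of the $i$-th of $N$ power grid interconnects.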

The r² score and MSE obtained with the proposed framework are listed in Table V. The MSE quantifies the prediction error (the accuracy overhead of the deep learning approach) when predicting the interconnect width. From the MSE results, we conclude that the proposed PowerPlanningDL predicts a power grid design that is very close to the golden design generated by the conventional approach. The r² score indicates how well the model fits the data.
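The two accuracy metrics in Table V are two views of the same residuals: with the population variance $\mathrm{Var}(w)$ of the golden widths, $r^2 = 1 - \mathrm{MSE}/\mathrm{Var}(w)$. The short sketch below (helper names hypothetical, toy values) computes both, which shows why MSE depends on the width units while r² is scale-free.

```python
def mse(y_true, y_pred):
    """Mean squared error over N interconnects."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r2_from_mse(y_true, y_pred):
    """r^2 = 1 - MSE / Var(y), using the population variance of the golden values."""
    n = len(y_true)
    mean = sum(y_true) / n
    var = sum((t - mean) ** 2 for t in y_true) / n
    return 1.0 - mse(y_true, y_pred) / var


golden = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
err = mse(golden, pred)          # 0.025
fit = r2_from_mse(golden, pred)  # 1 - 0.025 / 1.25 = 0.98
```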

TABLE V PEAK MEMORY USING POWERPLANNINGDL FRAMEWORK FOR ALL THE IBM PG BENCHMARKS

F. Study of Variation of MSE with Perturbation Size

The variation of MSE with the perturbation size is shown in Fig. 9. We observe that the MSE increases as the perturbation size increases. From this observation, we infer that the proposed PowerPlanningDL is best suited for incremental power grid design, where the power grid must be regenerated for small changes (perturbations) in the design.

Fig. 9. Comparison of prediction accuracy on the test set in MSE with variations in perturbation size for (a) ibmpg2 and (b) ibmpg6 benchmark circuits.

G. Study of Peak Memory

For completeness, we have also profiled the memory usage of the proposed framework using the mprof tool. The memory profiles for the ibmpg2 and ibmpg6 benchmark circuits are shown in Fig. 10, and the peak memory usage for all the IBM PG benchmarks is listed in Table V.
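For readers who want a quick peak-memory check without installing mprof, the sketch below swaps in Python's standard `tracemalloc` module. Note the difference: mprof samples whole-process memory over time, whereas `tracemalloc` only tracks allocations made through Python's allocator, so it will miss native buffers such as TensorFlow or GPU memory. The helper name `peak_python_memory` is hypothetical.

```python
import tracemalloc


def peak_python_memory(fn, *args, **kwargs):
    """Run fn(*args, **kwargs); return (result, peak bytes traced by tracemalloc meanwhile)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()   # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak


GB_IN_MIB = 1e9 / 2**20          # 1 GB = 10^9 bytes, 1 MiB = 2^20 bytes -> about 953.674 MiB

result, peak = peak_python_memory(lambda n: [0.0] * n, 1_000_000)
print(f"peak: {peak / 2**20:.1f} MiB")
```

The `GB_IN_MIB` constant reproduces the unit conversion quoted under Fig. 10.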

Fig. 10. Memory used by PowerPlanningDL for (a) ibmpg2 benchmark circuit and (b) ibmpg6 benchmark circuit. 1 Gigabyte (GB) = 953.674 Mebibyte (MiB)

VI. CONCLUSION AND FUTURE WORK

In this paper, we have proposed PowerPlanningDL, a deep learning-based framework to predict the initial power grid design. To the best of our knowledge, this is the first work to show the equivalence between neural network training and power grid design. We predict the power grid interconnect widths, whose determination is a time-consuming and tedious part of the design process. Subsequently, we also predict the worst-case IR drop in the power grid. A neural network-based multi-regression technique is used in our model to accomplish the prediction tasks. Results on the IBM power grid benchmarks show a speedup over the conventional power grid design approach. We have also studied the model accuracy, the effect of the perturbation size, and the peak memory usage.

From the results of the experiments, we make the following recommendations for the adoption of deep learning in the power planning phase of VLSI physical design:

• The prediction quality of the deep learning approach is close to that of the conventional method, with much lower convergence time (speedup).

• Deep learning in power planning is useful for incremental power grid designs, where the perturbation size is small.

• The prediction error of the PowerPlanningDL framework increases for designs with large perturbations.

• Finally, this work suggests that the industry can adopt the deep learning approach for power grid design, which will reduce many iterative steps in obtaining an appropriate initial design. Further, a better learning approach can be introduced for efficient power grid design. Additionally, decap placement-aware power grid design using deep learning can also be explored.

REFERENCES

[1] M. Fawaz and F. N. Najm, “Accurate verification of RC power grids,” in Proceedings of the 2016 Conference on Design, Automation & Test in Europe (DATE). EDA Consortium, 2016, pp. 814–817.

[2] X. Wang, H. Wang, J. He, S. X.-D. Tan, Y. Cai, and S. Yang, “Physics-based electromigration modeling and assessment for multi-segment interconnects in power grid networks,” in 2017 Proceedings of the Conference on Design, Automation & Test in Europe (DATE). European Design and Automation Association, 2017, pp. 1731–1736.

[3] S. Dey, S. Dash, S. Nandi, and G. Trivedi, “PGIREM: Reliability-constrained IR drop minimization and electromigration assessment of VLSI power grid networks using cooperative coevolution,” in 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2018, pp. 40–45.

[4] S. ik Heo, A. B. Kahng, M. Kim, L. Wang, and C. Yang, “Detailed placement for IR drop mitigation by power staple insertion in sub-10nm VLSI,” in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2019, pp. 830–835.

[5] G. Cui, W. Yu, X. Li, Z. Zeng, and B. Gu, “Machine-learning-driven matrix ordering for power grid analysis,” in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2019, pp. 984–987.

[6] Y.-C. Fang, H.-Y. Lin, M.-Y. Sui, C.-M. Li, and E. J.-W. Fang, “Machine-learning-based dynamic IR drop prediction for ECO,” in 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2018, pp. 1–7.

[7] Y.-C. Liu, C.-Y. Han, S.-Y. Lin, and J. C.-M. Li, “PSN-aware circuit test timing prediction using machine learning,” IET Computers & Digital Techniques, vol. 11, no. 2, pp. 60–67, 2016.

[8] W.-H. Chang, C.-H. Lin, S.-P. Mu, L.-D. Chen, C.-H. Tsai, Y.-C. Chiu, and M. C.-T. Chao, “Generating routing-driven power distribution networks with machine-learning technique,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 8, pp. 1237–1250, 2017.

[9] S.-Y. Lin, Y.-C. Fang, Y.-C. Li, Y.-C. Liu, T.-S. Yang, S.-C. Lin, C.-M. Li, and E. J.-W. Fang, “IR drop prediction of ECO-revised circuits using machine learning,” in 2018 IEEE 36th VLSI Test Symposium (VTS). IEEE, 2018, pp. 1–6.

[10] F. Ye, F. Firouzi, Y. Yang, K. Chakrabarty, and M. B. Tahoori, “On-chip voltage-droop prediction using support-vector machines,” in 2014 IEEE 32nd VLSI Test Symposium (VTS). IEEE, 2014, pp. 1–6.

[11] Y. Cao, A. B. Kahng, J. Li, A. Roy, V. Srinivas, and B. Xu, “Learning-based prediction of package power delivery network quality,” in Proceedings of the 24th Asia and South Pacific Design Automation Conference. ACM, 2019, pp. 160–166.

[12] A. Ng, “Cs229 lecture notes, deep learning.” [Online]. Available: http://cs229.stanford.edu/notes/cs229-notes-deep learning.pdf

[13] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[14] S. R. Nassif, “Power grid analysis benchmarks,” in 2008 Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2008, pp. 376–381.
