COPING WITH UNCERTAINTY IN FPGA ARCHITECTURE DESIGN

Boris Ratchev, Mike Hutton and David Mendel
Altera Corporation
101 Innovation Dr., San Jose, CA, USA 95134
{bratchev,mhutton,dmendel}@altera.com

ABSTRACT

The design of FPGA architectures involves optimization of area, delay, power and routability across hundreds of architectural choices (e.g. LUT size, wire length, flexibility and circuit sizing). Since the difficulty of defining and predicting the design space only grows as we approach 65nm and 45nm processes, it is necessary to have a better understanding of uncertainty in the architecture definition. In this paper we look at the sources of uncertainty, describe current unpublished methods for encapsulating error and uncertainty in experiments, and propose new methodologies involving ad-hoc, analytic and Monte Carlo simulation techniques to manage these risks in the future.

Arch file snippet (from Figure 1):

switch 0 buffered: yes R: 1000.0 Cin: 1.0e-15 Cout: 1.0e-15
segment frequency: 1.0 length: 1 wire_switch: 0 opin_switch: 0 Frac_cb: 1 Frac_sb: 1 Rmetal: 100.0 Cmetal: 1.0e-14
# Process parameters
R_minW_nmos 5000
...

1. INTRODUCTION

In 2002, Yan, Cheng and Wilton [13] presented a very interesting study that challenged us to understand the effect of assumptions in the design of FPGA architectures. They showed that optimal values for LUT size, switch block topology, cluster size and memory architecture varied with the choice of synthesis tool, test designs and underlying architecture fabric. This effect can be expanded to include preliminary process and timing models, measurement error, and bias due to immature prototype tools, in addition to tool assumptions.

As digital design becomes further linked to physical, yield and timing effects at deep-submicron geometries, the questions of uncertainty become even more pervasive and call into question the use of point studies, wherein a single variable is analyzed while all others are kept constant. Uncertainty can be combinatorial (interaction of hundreds of architectural and design decisions), structural (e.g. bias due to existing benchmark designs), measurement-based (pre-layout estimates) or based on assumptions (tool flows and methodology, as described in [13]). Failure to consider this uncertainty will lead to architecture decisions becoming increasingly unreliable.

Our baseline architecture process involves a rigorous experimental method based on the VPR methodology proposed by Betz [1], [2] in 1997, illustrated in Figure 1. This process involves an empirical architecture generation system with a data-driven architecture specification file, and primarily addresses combinatorial uncertainty by allowing


Figure 1: FPGA Modeling Toolkit and Arch File Snippet

the designer to "sweep" across multiple parameter values. This methodology has been used academically, with extensions for power [6], and commercially for Altera Stratix [4] and Stratix II [3,5] (where the architecture definition is more than 300 pages of code). In this paper we describe some of the growing issues with uncertainty and some of the early methodologies we have used, or propose to use, to address it.

2. CURRENT TECHNIQUES AND LIMITATIONS

There are some straightforward ways to account for uncertainty and bias in architectural experiments. For example, we can use ad-hoc techniques such as weighting recent or targeted designs over older benchmarks. Commercial devices remove uncertainty from the design process with multiple forms of guard-banding, e.g. adding 5% more wire than is needed for the current design set to compensate for potentially difficult designs that are not currently available, or using Rent-like theory to understand the upper envelope of routability [7]. Though a solid experimental methodology is necessary, it is by no means sufficient to deal with uncertainty. We have used VPR/FMT to successfully manage the issues of combinatorial blowup, but no methodology alone can tell


you that the CAD flow needs to be changed. For example, the design of a fracturable LE [4] requires tech-mapping to balance 6-LUT and 5-LUT functions, something that had not been required previously.

We can incorporate error estimation graphically. Figure 2 shows cost-efficiency plots for three different fracturable logic-element proposals (see the next section for a discussion of efficiency), plotted against packing density (error-prone due to less-mature prototype tools) and for varying initial area estimates. The Y-axis is the percentage area win. We could also express greater variance in our area estimates for more radically different proposals by showing error bars on these curves. The benefit of biasing, sensitivity analysis and similar techniques is that they allow for the incorporation of quantifiable risk in the evaluation process rather than just the mean predicted benefit. While they improve our understanding, they are still not enough to intrinsically incorporate variance.

Figure 2. Sensitivity Analysis: % area reduction versus packing density for different LE ideas.

2.1. Variance and the Flaw of Averages

Failure to account for variance, statistical correlations and nonlinearity can result in incorrect conclusions [9]. For example, two 5% benefits from a synthesis algorithm rarely result in a 10% gain when used together. The sensitivity analysis shown in Figure 2 is one way to deal with variance. Nonlinearity is an even more subtle problem: E[f(x)] is not equal to f(E[x]) unless f(x) is a linear function. This effect has been coined the flaw of averages [9]. One example of a nonlinear function is the efficiency of a logic element (LE). A more powerful logic element can absorb more logic, but typically requires more silicon area. We then define LE "efficiency" as density/area. Since this function is nonlinear, we cannot plug in average values and expect to get the correct average for efficiency (more on this topic in Section 3.2). Yet "point studies", which dominate architecture exploration, generally assume independence and linearity by keeping the majority of parameters fixed and using averages as their approximation.

A new design complexity at 65nm, and particularly 45nm, is on-chip variation (OCV) due to dopant fluctuation and lithographic inaccuracy. Visweswariah [12], among others, has argued that the solution to variation is statistical modeling, particularly of delay and power, using PDFs rather than mean guard-banded timing values for an analytic solution, or otherwise using Monte Carlo simulation. Such issues are very intriguing for FPGA architecture design and CAD, as OCV could change the standard methodology of comparing next-generation architecture ideas on a "same process" basis to isolate pure architecture gains.
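To make the flaw of averages described in Section 2.1 concrete, the toy sketch below is our own illustration (not from the paper); the distributions for density ratio D and area ratio A are invented and deliberately wide so the effect is visible. It compares plugging averages into E = D/A against the true average of E.

import numpy as np

rng = np.random.default_rng(0)

# Invented illustrative distributions for density ratio D and area ratio A.
D = rng.normal(loc=2.0, scale=0.4, size=100_000)       # density ratio samples
A = rng.triangular(1.0, 1.2, 1.8, size=100_000)        # area ratio samples

plug_in_averages = D.mean() / A.mean()   # f(E[x]): the "flaw of averages" shortcut
true_average = (D / A).mean()            # E[f(x)]: average of the nonlinear function

print(f"f(E[x]) = {plug_in_averages:.4f}")
print(f"E[f(x)] = {true_average:.4f}")   # differs, because D/A is nonlinear in A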

3. STATISTICAL MODELING


One solution to architectural uncertainty is to incorporate variance analytically by adding PDFs to model parameter variance. Though this has clear theoretical merit, we have yet to find a direct application: unlike in block-based statistical timing analysis, which uses similar techniques, the practical issues here currently appear to overwhelm the benefits. For example, measurement uncertainty from layout has a discrete PDF.

A more promising solution to truly encapsulating many forms of uncertainty in the architectural flow is to use simulation methods such as Monte Carlo. This has been applied in a number of different domains, including statistical timing analysis [11], and is available as an add-in for analysis packages. Simulation allows us to deal effectively with nonlinearity and variance, and is relatively easy to implement. In the remainder of this section we explore the use of simulation for architecture design using a realistic case study of a logic cell architecture decision.

Consider the evaluation of a proposed new logic element (LE). Define the density ratio of the new LE as the ratio of the number of base LEs to the number of new LEs after tech mapping (if a new LE has more expressive power, it is more dense). Define the area ratio of the new LE as sizeof(new)/sizeof(base), where size is an absolute silicon area. Then the efficiency (E) of the proposed LE can be stated as:

E = density ratio / area ratio = D / A    (1)

Simply put, more efficient LEs are better, since they absorb more logic and take up less silicon area. This study assumes that area is the only factor we are concerned with.

Figure 3 shows a proposed evaluation methodology. We compute the area PDFs for each LE using input from the circuit designer. Prototype CAD gives us a distribution of density across many designs. The two are sampled and combined via simulation into the efficiency PDF, which can then be overlaid and analyzed not just for the mean expected value of the competing LE proposals, but also for uncertainty and risk.
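As a small worked illustration of equation (1), the hypothetical helper below computes E from tech-mapping results; the function name, arguments and numbers are ours, not the paper's.

def efficiency(base_le_count: int, new_le_count: int, area_ratio: float) -> float:
    """E = density ratio / area ratio (equation 1).

    density ratio = (# base LEs) / (# new LEs) needed for the same design
    area_ratio    = sizeof(new LE) / sizeof(base LE), in absolute silicon area
    """
    density_ratio = base_le_count / new_le_count
    return density_ratio / area_ratio

# Hypothetical numbers: a design that needs 10,000 base LEs maps into 8,000
# of the new LEs, each 1.2x larger. E > 1 means the new LE is a net area win.
print(efficiency(10_000, 8_000, 1.2))   # ~1.04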



Figure 3. Efficiency comparison using Monte Carlo Simulation. [Block diagram: for each of LE 1 and LE 2, an Area PDF and a Density PDF are sampled to produce a Simulated Efficiency distribution; the two efficiency distributions are then overlaid and compared ("Overlay outputs: Proposal 2 Wins" in the illustration).]

Figure 4. Area-Ratio Distribution for LE1

3.1.1. Area Ratio Estimate

Rather than a single area, we use an area distribution for the new LE, based on our confidence in the area estimate. For example, "most likely 1.2, but no less than 1.15 and no more than 1.25" would yield a triangular PDF peaking at 1.2. The distribution or expression of uncertainty can be as complicated as we like, e.g. the triangular distribution as explained above, a Normal distribution, a discrete distribution of any shape, or a Bayesian decision tree with arbitrary probabilities.

Since the estimate of logic element area actually includes the routing area, we have an interesting phenomenon: as the logic element becomes denser, it affects the amount of routing area required to support it. Thus the true metric of area is A = (A_LE + A_R). Since A_R depends on the density D, efficiency becomes a nonlinear function of D:

E(D) = D / (A_LE + A_R(D))    (1a)

where A_R is a function of D. We know that if we plug average inputs into this nonlinear function we will not get the correct average output E. Figure 4 shows the distribution of A_LE + A_R in our simulation for LE1 (using the triangular distribution for A_LE and the effect of A_R as described above).
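A minimal sketch, under assumptions of our own, of how the area input to the simulation could be encoded: the designer's "most likely 1.2, between 1.15 and 1.25" estimate becomes a triangular PDF for A_LE, and a placeholder A_R(D) models routing area growing with density. The coefficients are invented for illustration, not the paper's.

import numpy as np

rng = np.random.default_rng(1)
N = 100_000   # Monte Carlo sample size

# Designer's estimate for the LE area ratio: most likely 1.2,
# no less than 1.15, no more than 1.25 -> triangular PDF peaking at 1.2.
a_le = rng.triangular(1.15, 1.20, 1.25, size=N)

def routing_area(density: np.ndarray) -> np.ndarray:
    """Hypothetical A_R(D): a denser LE needs more routing area to feed it."""
    return 0.40 + 0.15 * (density - 1.0)   # placeholder coefficients

# Total area per equation (1a) is A = A_LE + A_R(D); because A_R depends on D,
# efficiency E(D) = D / (A_LE + A_R(D)) is nonlinear and must be simulated.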

3.1.2. Density Estimate

We now turn to the estimate of density. This estimate requires CAD work: we need to map to the new LE in synthesis and form a reasonable probabilistic estimate of the expressive power of the new LE. We run this prototype flow on as many benchmark designs as we can, thus producing a distribution of densities that we will use as the other input to our simulation model. It is important that we do not calculate an average density and plug it into equation (1a), to avoid the flaw of averages.

To compensate for the fact that the synthesis flow does not implement all LE features at this early stage of the evaluation, we use synchronized resampling [11]. This works as follows: we resample historical data, instead of an idealized distribution, to generate a distribution for a parameter. For example, using the actual densities we get from our prototype flow (historical data) in the simulation is resampling; resampling replaces a single number (most likely the average number of LEs used to map a "typical" design) with a distribution. So where does the synchronized part come in? Sometimes when we draw a number from the historical set it is related to another number, so we in effect draw a pair. As an example, we model a feature such as the shared LUT mask (SLM) [3,5] that reduces the number of 6-input LEs by a certain percentage as they are packed together. Since SLM is not in our early prototype flow, we estimate its effect as a function of the number of 6-LUTs produced. So, when we draw a density from the distribution, we use the corresponding number of 6-LUTs to modify the density. The density distribution for LE 1 with the assumptions described is shown in Figure 5.

Figure 5. Re-sampled Density Distribution for LE1
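The sketch below shows one way the synchronized resampling step could be coded: (density, 6-LUT fraction) pairs from the prototype runs are drawn together, and the SLM effect is applied as a function of the paired 6-LUT value. The historical data and the adjustment factor are invented placeholders, since the real values come from the prototype flow.

import numpy as np

rng = np.random.default_rng(2)

# Historical results from the prototype flow: one (density ratio, fraction of
# 6-input LEs) pair per benchmark design. Numbers invented for illustration.
history = np.array([
    [1.55, 0.30],
    [1.62, 0.35],
    [1.48, 0.22],
    [1.70, 0.41],
    [1.58, 0.28],
])

def sample_density(n: int) -> np.ndarray:
    """Synchronized resampling: draw whole rows so each density stays paired
    with its corresponding 6-LUT fraction."""
    rows = history[rng.integers(0, len(history), size=n)]
    density, frac_6lut = rows[:, 0], rows[:, 1]
    # Hypothetical SLM correction: packing shared-LUT-mask pairs removes some
    # 6-input LEs, so effective density rises with the 6-LUT fraction.
    return density * (1.0 + 0.05 * frac_6lut)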

3.1.3. Efficiency Estimate

An error-prone method for estimating the efficiency of an LE is to plug the average values for the density and area ratios into equation (1). As explained above, that approach gives the wrong answer because the function on the right-hand side is nonlinear. Instead we simulate: rather than collapsing the two inputs to our model into single average numbers, we allow them to vary as shown by the histograms in Figures 4 and 5. Simulation was performed using the general algorithm shown in Figure 6. At the end of this process we have a probabilistic histogram for efficiency. Once we have the histograms for all candidate LEs, we are in a position to put them side by side and intelligently decide whether one LE is better than another. Figure 7 shows those results.


For n = 1 to N (N large for convergence):
{
    Pick a Density from the Density distribution as encoded by its histogram
    Pick an Area Ratio from the second input histogram
    Divide the picked Density by the picked Area Ratio; this is an Efficiency data point
    Put the Efficiency for this run in the appropriate bin of the output histogram
}

Figure 6. Outline of the MC Simulation algorithm.

Figure 7. Efficiency Overlay for LE1 (dark) and LE2

It is easy to see that LE 1 is better most of the time and on average: the dark histogram is to the right of the lighter one. This information can also be shown using cumulative distributions. For comparison, we also estimated the two efficiencies using average values for the area and density ratios; the answers were very different from the simulated averages. The average-based estimate for LE1 was off by 7%, and that for LE2 by 25%, from the correct (simulated) value.

For completeness we should note that simulation can be used for meaningful conclusions only if the simulation converges; in other words, the average of the output should not vary significantly with each new trial. A convergence graph, smoothed over 400 trials, is shown in Figure 8. Finally, we point out that sensitivity analysis is inherent in MC simulation: it is a way for the architect to shake up their assumptions (in this case, the area estimate and the density estimate) a bit and see if the conclusions still hold.
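Putting the pieces together, a runnable sketch of the Figure 6 loop with a Figure 8 style convergence check might look like the following. It is vectorized rather than an explicit loop, and the input distributions are stand-ins for the real Figure 4 and Figure 5 histograms.

import numpy as np

rng = np.random.default_rng(3)
N = 10_000   # number of Monte Carlo trials; large enough for convergence

# Stand-in inputs: area samples (A_LE + A_R) and resampled density values.
area = rng.triangular(1.55, 1.60, 1.70, size=N)
density = rng.choice([1.48, 1.55, 1.58, 1.62, 1.70], size=N)

# One efficiency data point per trial, binned into the output histogram.
efficiency = density / area
hist, bin_edges = np.histogram(efficiency, bins=50)

# Convergence check (Figure 8): the running mean should settle as trials grow.
running_mean = np.cumsum(efficiency) / np.arange(1, N + 1)
print(f"mean efficiency after {N} trials: {running_mean[-1]:.4f}")
print(f"drift over the last 1,000 trials: {abs(running_mean[-1] - running_mean[-1001]):.5f}")

Overlaying the resulting histograms for two candidate LEs, as in Figure 7, then compares not just the means but the full risk profiles.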

Figure 8. Efficiency Convergence Graph

4. CONCLUSIONS AND FUTURE WORK

Uncertainty is an inherent factor in all activities that involve predicting the behavior of a system before the system has been built. This has always been a part of FPGA architecture design, due simply to the combinatorial explosion of architectural parameters. However, the increasing design space for FPGA architecture and new issues such as physical effects indicate that the encapsulation of uncertainty in architecture development may allow for significantly better optimized devices.

This paper outlined some of the many sources of uncertainty in FPGA design and proposed some methodologies for mitigating its effects. Techniques like design weighting, sensitivity analysis, and parameter encapsulation are already quite accessible tools that can be applied to decision making. We further feel that both Monte Carlo simulation and analytical stochastic models are tools which may become commonplace in the future.

5. REFERENCES

[1] V. Betz and J. Rose, "VPR: A New Packing, Placement and Routing Tool for FPGA Research", in Proc. FPL, 1997, pp. 213-222.
[2] V. Betz and J. Rose, "Automatic Generation of FPGA Architectures", in Proc. FPGA, 2000, pp. 175-186.
[3] M. Hutton et al., "Improving FPGA Performance and Area Using an Adaptive Logic Module", in Proc. FPL, 2004, pp. 135-144.
[4] D. Lewis et al., "The Stratix Routing and Logic Architecture", in Proc. FPGA, 2003, pp. 12-20.
[5] D. Lewis et al., "The Stratix II Logic and Routing Architecture", in Proc. FPGA, 2005, pp. 14-20.
[6] F. Li, D. Chen, L. He and J. Cong, "Architecture Evaluation for Power-Efficient FPGAs", in Proc. FPGA, 2003, pp. 175-184.
[7] J. Pistorius and M. Hutton, "Placement Rent Exponent Calculation Methods, Temporal Behavior and FPGA Architecture Evaluation", in Proc. SLIP, 2003, pp. 31-38.
[8] J. Rose, R.J. Francis, D. Lewis and P. Chow, "Architecture of Field-Programmable Gate Arrays: The Effect of Logic Functionality on Area Efficiency", IEEE J. Solid-State Circuits, 1990.
[9] S. Savage, Online Tutorial on Understanding Uncertainty: http://analycorp.com/uncertainty/
[10] J. Simon, "Resampling: The New Statistics", Resampling Stats Inc., 1974-1995.
[11] A. Srivastava, D. Sylvester and D. Blaauw, Statistical Analysis and Optimization for VLSI: Timing and Power, Springer, 2005.
[12] C. Visweswariah, "Death, Taxes and Failing Chips", in Proc. 40th ACM/IEEE DAC, 2003, pp. 343-347.
[13] A. Yan, R. Cheng and S. Wilton, "On the Sensitivity of FPGA Architectural Conclusions to Experimental Assumptions, Tools and Techniques", in Proc. FPGA, 2002, pp. 147-156.
