APPLYING THE SMALL-WORLD NETWORK TO ROUTING STRUCTURE OF FPGAS

Hisashi TSUKIASHI², Masahiro IIDA¹,³ and Toshinori SUEYOSHI¹

¹ Faculty of Engineering, Kumamoto University, 2-39-1 Kurokami, Kumamoto, 860-8555, Japan
² Graduate School of Science and Technology, Kumamoto University
³ PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho Kawaguchi, Saitama, Japan
email: [email protected], {iida, sueyoshi}@cs.kumamoto-u.ac.jp

0-7803-9362-7/05/$20.00 ©2005 IEEE

ABSTRACT

The degree of integration and the operating frequency of programmable logic have improved dramatically with the development of new process technologies. However, for deep sub-micron processes, the delay, reliability, cost, and power tend to be determined by the interconnections. In conventional programmable logic, reducing the number of switches on a critical path is important because the wiring delay is considerably smaller than the switch delay. However, to decrease the critical path delay in deep sub-micron processes, it is also necessary to consider the wiring delay. This paper proposes a novel routing structure that uses a Small-World Network structure for the interconnection of programmable logic, and demonstrates that the critical path delay can be reduced. Based on the results of an evaluation, the authors show that the critical path delay can be reduced by a maximum of 15% and the amount of routing resources can be reduced by a maximum of 23% when using the Small-World Network structure.

1. INTRODUCTION

The logic density and operating frequency of VLSI have, so far, improved dramatically with the development of new process technology. In classical transistor scaling, device performance improves as gate length and gate dielectric thickness are scaled. In contrast, the wiring delay increases due to increased resistance and current density as process technology scales down. For this scaling approach to be effective in improving RC delay, a metal with lower resistivity and higher reliability is needed. For example, copper metallization is a promising technique for the manufacture of lower-resistivity wire. As a result, in the short term, interconnection latency rather than gate latency starts to dominate chip performance. This is called the interconnection crisis, which is a major problem for deep sub-micron processes[1][2].

In conventional programmable logic, reducing the number of switches on a signal path is important because wiring delay is considerably smaller than switch delay. Increased speedup, however, cannot be expected if wiring delay within the deep sub-micron process is not considered. At present, the semiconductor industry is taking measures to meet the interconnect crisis that center on the study of process technology, for instance, using lower-resistivity wire material and low-k dielectric material[3][4]. For programmable logic devices, however, relying on process technology alone is not enough (though this research field is very important). Careful consideration also needs to be given to the problems with the routing structure.

This is a preliminary study focusing on connectivity in programmable logic, and we propose a novel routing structure. The idea of the Small-World Network[5] is applied to the routing structure in order to reduce wiring delay.

The rest of the paper is organised as follows. Section 2 introduces the issues with the routing structure in deep sub-micron processes and proposes a novel routing structure. Section 3 describes and evaluates this routing structure. Section 4 gives a discussion of the results. Finally, the conclusions are presented in Section 5.

2. PROPOSAL FOR A ROUTING STRUCTURE APPLYING THE SMALL-WORLD NETWORK

2.1. Issues with the Routing Structure

The typical routing structure of a programmable logic device is the “island-style” structure[6], as shown in Fig. 1. It is found in the XC4000 series[7] and the Virtex series[8] of Xilinx, Inc. The logic blocks connect to a switch block network (omitted in the figure). The switch blocks connect to the horizontal and vertical channels. A channel generally contains a few kinds of wire segment, in this instance single-length wires, double-length wires, quad-length wires and long lines. The combination of segment lengths is important for a routing structure of this type. The study of mixed wire lengths in [9] shows that length-4 and length-8 wires give the best tradeoff between speed and area.

Fig. 1. General routing structure. (Labels: Logic Block, Switch Block, Channel, Track; wire segments: Single, Double, Quad, Long.)

On the other hand, Altera’s devices[10][11] have a hierarchical routing resource. The lowest level of the routing hierarchy is within a logic element (LE). The first level is connectivity between the local routing wires within a logic array block (LAB). The second level is interconnectivity between LABs and I/O. The hierarchical segmentation scheme might generally reduce the number of switch-box switches.

Another major study of routing architecture is the Mesh-of-Trees (MoT) network[12]. The MoT topology can achieve better scalability than a flat, Manhattan topology. Given sufficient wiring layers, the MoT network layout can maintain a constant area per logic block as the design scales up. Moreover, the number of switches in any path in the MoT needs to grow only as O(log(N)).

As shown above, previous studies of routing structures expected to improve wiring delay mainly by reducing the number of switches on a signal path, the delay model for which is shown in Fig. 2. In conventional process technology, the on-resistance of a pass transistor is considerably larger than the wire resistance. Therefore, reducing the number of switches is effective enough.

Fig. 2. Delay model of a signal path. (Labels: LUT, SRAM, Pass-transistor, RC delay, Switching delay, Total wiring delay of signal path.)

Fig. 3. Small-World graph. (Panels in order of increasing randomness: Regular (p=0), Small World, Random (p=1).)

However, a chief concern in deep sub-micron processes is the increasing RC delay of global wires, which present a more serious problem to designers[1]. Table 1 shows the required RC delay of a 1 mm line at minimum pitch from the ITRS 2002 Update[13]. Since local and intermediate interconnections tend to scale in length, latency is dominated by global wires. Accordingly, the optimization of wire length is also important.

Table 1. Interconnect technology requirements[13].

Process technology (nm)¹   1 mm line delay (ps)²   Gate delay (ps)³
130                        21                      2.55
90                         37                      1.84
65                         79                      1.14
45                         131                     0.85
32                         248                     0.56
22                         452                     0.35

¹ DRAM 1/2 pitch (nm).
² Interconnect RC delay of a 1 mm line at minimum pitch (ps).
³ LOP (Low Operating Power) NMOS device τ (Cgate × Vdd / Id-NMOS) (ps).
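To make the trend in Table 1 concrete, the following Python sketch computes the ratio of the 1 mm line RC delay to the gate delay at each technology node. The numbers are copied from Table 1; the names `TABLE_1` and `wire_to_gate_ratio` are illustrative only and do not come from the original study.

```python
# Values copied from Table 1 (ITRS 2002 Update figures as quoted in the paper).
# Each row: (process technology in nm, 1 mm line RC delay in ps, gate delay in ps).
TABLE_1 = [
    (130, 21, 2.55),
    (90, 37, 1.84),
    (65, 79, 1.14),
    (45, 131, 0.85),
    (32, 248, 0.56),
    (22, 452, 0.35),
]


def wire_to_gate_ratio(rows):
    """Return {node_nm: line_delay / gate_delay} for every row of the table."""
    return {node: line / gate for node, line, gate in rows}


ratios = wire_to_gate_ratio(TABLE_1)
for node, ratio in ratios.items():
    print(f"{node:>4} nm: 1 mm line delay is {ratio:7.1f}x the gate delay")
```

With these figures the ratio rises at every step from 130 nm down to 22 nm, which is the sense in which global wires come to dominate latency.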


2.2. Small-World Graph

Almost everyone has had the experience of striking up a conversation with a complete stranger and discovering unexpectedly that they know somebody in common. Invariably, they say “It’s a small world!”. The small-world phenomenon, popularly known as six degrees of separation, was studied experimentally by Stanley Milgram (1967) in the field of social psychology[14]. It was later formalized mathematically by Duncan Watts et al. (1998) in a study of the topological properties of networks[5]. The proposed definition of the small-world phenomenon is based on two quantities:

• L (characteristic path length): the number of edges in the shortest path between two vertices, averaged over all pairs of vertices.
• C (clustering coefficient): suppose that a vertex $v$ has $k_v$ neighbors; then at most $k_v(k_v - 1)/2$ edges can exist between them. Let $C_v$ denote the fraction of these allowable edges that actually exist. C is defined as the average of $C_v$ over all $v$.

Fig. 3 illustrates an example of a small-world graph. Starting from a ring graph with n vertices and k edges per vertex, we re-wire each edge at random with probability p. The graphs show the transition from regularity to disorder as p changes. For p = 0, the original ring graph is unchanged; as p increases, the graph becomes increasingly disordered until, for p = 1, all edges are re-wired randomly. For intermediate values of p, the graph is a small-world network (or SWN for short): highly clustered ($C_{regular} \approx C \gg C_{random}$) like a regular graph, yet with a small characteristic path length ($L_{regular} \gg L \approx L_{random}$) like a random graph. These small-world networks result from the introduction of a few long-range edges, known as “shortcuts”, that connect distant vertices.
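The two quantities L and C, and the re-wiring step, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the re-wiring follows the usual ring-lattice construction, and, as an assumption made here for simplicity, L is averaged over reachable pairs only in case re-wiring disconnects the graph.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n vertices, each joined to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def rewire(adj, k, p, rng):
    """Re-wire each lattice edge with probability p, avoiding loops and duplicates."""
    n = len(adj)
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    for step in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + step) % n
            if u in new[v] and rng.random() < p:
                candidates = [w for w in range(n) if w != v and w not in new[v]]
                if candidates:
                    w = rng.choice(candidates)
                    new[v].discard(u)
                    new[u].discard(v)
                    new[v].add(w)
                    new[w].add(v)
    return new


def characteristic_path_length(adj):
    """Average shortest-path length (in edges) over all reachable vertex pairs."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(adj):
    """Average over v of (edges among v's neighbours) / (k_v (k_v - 1) / 2)."""
    values = []
    for v, nbrs in adj.items():
        kv = len(nbrs)
        if kv < 2:
            values.append(0.0)
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        values.append(links / (kv * (kv - 1) / 2))
    return sum(values) / len(values)


rng = random.Random(0)
base = ring_lattice(200, 8)
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(base, 8, p, rng)
    print(f"p={p:<5} L={characteristic_path_length(g):6.2f} "
          f"C={clustering_coefficient(g):5.3f}")
```

Running it for a few values of p lets the reader compare L and C against the regular (p = 0) and random (p = 1) extremes described above.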

2.3. Adopting the SWN in Routing Structure

We focused on decreasing L with a few shortcuts, and propose a routing structure that applies the SWN to FPGAs. In the routing structure of an FPGA, a switch block and a wire can be regarded as a vertex and an edge, respectively. The authors applied the small-world network, which does not depend on the original routing structure, to a mesh routing structure with the segmentation scheme[6]. The device size is N × N logic blocks. Each channel has w wires of several different lengths: Single-length lines, Double-length lines, Quad-length lines, and Long lines. The SWNize procedure involves the following:

(1) Choose two arbitrary switch blocks at random.
(2) Append a direct line (called an SW line) to the routing structure between the chosen switch blocks.
(3) Repeat the above operations until the total SW line length reaches p × (Total Regular Line length).

We changed from “re-wiring” to “appending” edges because we did not want to change the original Regular routing structure. To adopt the SWN into the routing structure, we redefine the rewiring probability p as follows:

$$ p(\%) = \frac{\text{Total SW Line length}}{\text{Total Regular Line length}} \times 100. $$

Moreover, this routing structure may show better performance when combined with the “X architecture”[15][16]. We think a diagonal interconnect is suitable for the SW line. An SW line length is therefore calculated as a routing length with 45-degree diagonals instead of the Manhattan distance. For example, in Fig. 4, the SW line length is

$$ \text{SW Line length} = (2\sqrt{2} + 2) \times \text{Single Line length}. $$

Fig. 4 depicts an example of a routing structure to which the SWN is applied. To clarify the possible advantages of the SWN, consider switch blocks X and Y. Using standard connecting methods, switch block X could be connected to switch block Y by a combination of one quad line, one double line, and one single line. In contrast, based on SWN concepts, switch block X can be connected to switch block Y by one single line and one SW line. As a result, the SWN method can reduce not only the wiring delay but also the switch delay.

Fig. 4. Example of routing structure applying SWN. (Labels: Logic Block, Switch Block, switch blocks X and Y, Small-World Line (SW Line), X length, Y length, 45°.)

3. EVALUATION OF THE PROPOSED ROUTING STRUCTURE

3.1. Evaluation Environment

This section presents a brief description of the evaluation environment and the evaluation model. First, we modified VPR (Versatile Placement and Routing) ver. 4.30[17][18] to handle our SWN routing structure. VPR is a placement and routing tool for array-based FPGAs, which was developed at the University of Toronto. Hereafter we refer to the modified VPR as “VPR for SWN”. More specifically, as mentioned before, VPR for SWN chooses two switch blocks at random and connects them to the wiring resources on both sides, as the start and end points of an SW line. Second, the effectiveness of the SWN routing structure was evaluated using 20 circuits of the MCNC benchmark[19]. Table 2 shows the circuit names, the number of logic blocks (LBs) and the number of I/Os used for the evaluation. The VPR flags “-pres_fac_mult 1.3” and “-max_router_iterations 200” are used for a high level of routing effort.

3.2. Evaluation Model
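The SW-line insertion procedure of Section 2.3, which the evaluation relies on, can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of VPR for SWN: the function names are hypothetical, an N × N device is assumed to have (N+1) × (N+1) switch blocks, and the 45-degree routing length is taken to be a diagonal run followed by a straight run, which reproduces the $(2\sqrt{2} + 2)$ example of Fig. 4 for an offset of 4 by 2.

```python
import math
import random


def sw_line_length(dx, dy):
    """Length of a line routed with 45-degree diagonals, in Single-line units."""
    dx, dy = abs(dx), abs(dy)
    diagonal = min(dx, dy)            # steps taken diagonally
    straight = max(dx, dy) - diagonal  # remaining horizontal or vertical steps
    return diagonal * math.sqrt(2) + straight


def swnize(n, total_regular_length, p_percent, rng):
    """Append SW lines between random switch-block pairs until the total
    SW line length reaches p% of the total regular line length."""
    target = p_percent / 100.0 * total_regular_length
    side = n + 1  # assumption: (N+1) x (N+1) switch blocks for N x N logic blocks
    lines, total = [], 0.0
    while total < target:
        a = (rng.randrange(side), rng.randrange(side))  # step (1)
        b = (rng.randrange(side), rng.randrange(side))
        if a == b:
            continue  # a line needs two distinct end points
        length = sw_line_length(b[0] - a[0], b[1] - a[1])
        lines.append((a, b, length))                    # step (2)
        total += length                                 # step (3) loop condition
    return lines, total


# Toy example: 8 x 8 device, hypothetical regular line length of 1000 units.
lines, total = swnize(8, 1000.0, 5.0, random.Random(0))
print(f"{len(lines)} SW lines appended, total SW length {total:.1f}")
print(f"achieved p = {100.0 * total / 1000.0:.2f}%")
```

Because lines are appended whole, the achieved p slightly overshoots the requested value; a stricter variant could reject the last line instead.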

Table 3 describes the device models used in this evaluation. The small circuits are implemented on Device Model A, while Device Models B and C are used for the large circuits that cannot be implemented on Device Model A. Each logic block contains a single LUT, because we excluded the influence of LUT clustering in order to compare the routing structures accurately. Table 4 shows the combination of wires and the share of each line length. This segment combination is almost the same as that of the Xilinx XC4000 series, except for the number of long lines, which differs depending on the type of product and the channel direction. Moreover, our routing structure uses SW lines instead of Long lines.

Our target is the routing structure of deep sub-micron devices, so reliable parameter values are required to estimate the critical path delay of the benchmark circuits using VPR for SWN. The parameters of our evaluation are derived from the BPTM[20] 350nm, 180nm and 70nm technologies, which were predicted by UC Berkeley and are widely used as parameters for future-generation devices. In the evaluation, we set the probability p to 0.1, 0.5, 1.0, 2.0 and 5.0 (%).

Table 2. MCNC Benchmark Circuits.

Circuit Name   # of LBs   # of IOs   Device Model
alu4           1,522      22         A
apex2          1,878      41         A
apex4          1,262      28         A
bigkey         1,707      426        C
clma           8,383      144        B
des            1,591      501        C
diffeq         1,497      103        A
dsip           1,370      426        C
elliptic       3,604      245        A
ex1010         4,598      20         B
ex5p           1,064      71         A
frisc          3,556      136        A
misex3         1,397      28         A
pdc            4,575      56         B
s298           1,931      10         A
s38417         6,406      135        B
s38584.1       6,447      342        B
seq            1,750      76         A
spla           3,690      62         A
tseng          1,047      174        A

Table 3. Device Models.

Device Model        A       B       C
Device size (LBs)   64×64   96×96   128×128
# of 4-LUTs/LB      1       1       1
# of Tracks         32      40      64

Table 4. Wire length mix (every device model uses this ratio).

Segment length   Single   Double   Quad   Long
(%)              25       12.5     37.5   25

4. EXPERIMENTAL RESULTS AND CONSIDERATIONS

4.1. Critical Path Delay: SW Line vs. Long Line

Fig. 5 depicts the reduction ratios of critical path delay for each value of p and each process technology. The X-axis shows the names of the benchmark circuits, and the Y-axis shows the reduction ratio of the critical path delay. The graph compares our SWN routing structure without Long lines to a Regular routing structure with Long lines. For (a) the 350nm process technology, the maximum reduction in delay was 2.9%, and the figure shows that the critical path delay decreases for most of the benchmark circuits. For (b) the 180nm process technology, some circuits showed an increase in delay compared with the previous results; the maximum reduction in delay was 11.0%. For (c) the 70nm process technology, the maximum reduction in delay was 15.3%, the best reduction ratio among all process technologies. As a consequence, SW lines are more effective for delay reduction than Long lines. The structure that we propose may therefore be highly effective at future feature sizes. However, the critical path delay of some circuits was not reduced. We consider that the SW lines had no effect in these cases either because there are many critical paths or because the SW lines were used on non-critical paths.

4.2. Routing Resources

We investigated the impact on routing resources of changing from a traditional structure to our proposed structure. For example, for Device Model A with p = 1%, we calculated the total length of each type of wire segment using the following formulas:

$$
\begin{aligned}
\text{Total Single line length} &= 64 \times 8 \times 65 \times 2 \times 1 = 66{,}560\ [L_{SL}],\\
\text{Total Double line length} &= 32 \times 4 \times 65 \times 2 \times 2 = 33{,}280\ [L_{SL}],\\
\text{Total Quad line length} &= 16 \times 12 \times 65 \times 2 \times 4 = 99{,}840\ [L_{SL}],\\
\text{Total Long line length} &= 1 \times 8 \times 65 \times 2 \times 64 = 66{,}560\ [L_{SL}],\\
\text{Total SW line length} &= N_{SWL} \times A_{SWL} = 1{,}095 \times 36 = 39{,}420\ [L_{SL}],
\end{aligned}
$$

where $L_{SL}$ is the length of a Single line, $N_{SWL}$ is the number of SW lines, and $A_{SWL}$ is the average length of an SW line.

From these results, the total line lengths of the Regular routing structure and the SWN routing structure are as follows:

$$
\begin{aligned}
\text{Total line length of Regular structure} &= 266{,}240\ [L_{SL}],\\
\text{Total line length of SWN structure} &= 239{,}100\ [L_{SL}].
\end{aligned}
$$

Hence the SWN routing structure offers a reduction of approximately 10.2% in routing resources compared with the Regular routing structure. Table 5 shows the reduction ratio of routing resources of the SWN routing structure compared with the Regular routing structure for each Device Model and each value of p. Naturally, a lower value of p gives a higher reduction ratio of routing resources because there are fewer SW lines. Fig. 5 shows only a slight difference in critical path delay among the values of p. As a result, a lower value of p can reduce both the critical path delay and the routing resources.
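The arithmetic for Device Model A can be checked with a short Python sketch. The reading of the five factors (segments per track, tracks of that type, N + 1 channels, two channel directions, segment length) is our interpretation of the formulas above, and $N_{SWL}$ = 1,095 and $A_{SWL}$ = 36 are simply taken from the text; the function and variable names are illustrative.

```python
def total_segment_length(n, tracks, seg_len):
    """(segments per track) x tracks x (N + 1) channels x 2 directions x length."""
    segments_per_track = n // seg_len
    return segments_per_track * tracks * (n + 1) * 2 * seg_len


N = 64        # Device Model A: 64 x 64 logic blocks (Table 3)
TRACKS = 32   # tracks per channel (Table 3)
# Wire length mix from Table 4: name -> (share of tracks, segment length).
MIX = {
    "Single": (0.25, 1),
    "Double": (0.125, 2),
    "Quad": (0.375, 4),
    "Long": (0.25, N),  # a Long line spans the whole device
}

regular = {
    name: total_segment_length(N, int(TRACKS * share), seg_len)
    for name, (share, seg_len) in MIX.items()
}
regular_total = sum(regular.values())

# SW line figures quoted in the text for p = 1% (not derived here).
N_SWL, A_SWL = 1095, 36
sw_total = N_SWL * A_SWL

# The SWN structure replaces the Long lines with SW lines.
swn_total = regular_total - regular["Long"] + sw_total
reduction = 1.0 - swn_total / regular_total

print(regular)
print(f"Regular: {regular_total}, SWN: {swn_total}, "
      f"reduction: {100 * reduction:.1f}%")
```

The sketch reproduces the totals of 266,240 and 239,100 and the reduction of about 10.2% stated above.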

(Fig. 5 plot. Panels: (a) 350nm CMOS Process, (b) 180nm CMOS Process, (c) 70nm CMOS Process. X-axis: benchmark circuits alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng. Y-axis: reduction ratio of critical path delay, from -14% to 16%. Series: p=0.1%, p=0.5%, p=1%, p=2%, p=5%.)

Fig. 5. Reduction ratios of critical path delay for each p and each process technology.

Table 5. Reduction ratio of routing resources.

Device Model   A         B         C
p=0.1%         23.5%     23.0%     23.0%
p=0.5%         17.6%     15.0%     11.7%
p=1%           10.2%     5.4%      -1.5%
p=2%           -4.5%     -15.1%    -28.1%
p=5%           -48.8%    -75.2%    -107.7%

4.3. Usage Ratio of Routing Wire Segments

Fig. 6 shows a comparison of the usage of each type of routing wire segment. The top graph shows the results for the Regular routing structure, and the bottom graph shows the results for the SWN routing structure (p = 1%). In both graphs, the X-axis represents the wire segment length and the Y-axis the wire usage ratio. The usage ratio of both Long lines and SW lines decreases as the process is miniaturised. However, the results in Fig. 5 confirm a higher delay reduction ratio, because SW lines are used on the critical path. Therefore, the SWN routing structure is effective for deep sub-micron processes.

Fig. 6. Usage ratio of routing wires. (Top panel: Traditional routing structure, wire segments Single, Double, Quad, Long. Bottom panel: SWN routing structure (p=1%), wire segments Single, Double, Quad, SW. X-axis: wire segments. Y-axis: usage ratio (%), from 0% to 35%. Series: 350nm, 180nm, 70nm.)

5. CONCLUSIONS AND FUTURE WORKS

5.1. Conclusions

This paper has proposed a novel routing structure in which the SWN is applied for deep sub-micron processes. The authors have demonstrated that this structure can reduce the total delay on a signal path. Based on the results of an evaluation, the authors have shown that the critical path delay can be reduced by a maximum of 15% and the amount of wire resources can be reduced by a maximum of 23%. The more process technology advances, the more the reduction ratio of critical path delay increases. Routing structures to which the SWN is applied have the potential to achieve reduced wire delay. Consequently, the SWN structure is a suitable routing structure for deep sub-micron FPGAs.

5.2. Future Works

In the evaluation of critical path delay, there are some circuits whose delay is not reduced by the SWN routing structure. We must continue to analyze the reason and to refine the proposed method. The current VPR for SWN does not consider where the SW lines are located in the device when placing circuits. In future work, we will develop a placement and routing algorithm that takes the locations of SW lines into account. Benchmark circuits are also very important, and we plan to evaluate our routing structure using practical application circuits. Moreover, the strategy using the SWN does not depend on the original routing structure, which means that the SWN should be applied to several routing topologies. Finally, in order to realize the next generation of programmable logic, the architecture of the whole device, including the routing structure and the logic block structure, will be unified in the future.

6. REFERENCES

[1] R. Ho, K. W. Mai, and M. A. Horowitz, “The future of wires,” in Proc. IEEE, vol. 89, no. 4, Apr. 2001, pp. 490–504.
[2] T. Sakurai, “Issues of current LSI technology and an expectation for new system-level integration,” in Proc. International Symposium on Advanced CMOS Devices, Oct. 2001, pp. 17–22.
[3] “Asuka Project.” [Online]. Available: http://www.selete.co.jp/SeleteHPJ1/e_html/asuka/outline.html
[4] “Mirai Project.” [Online]. Available: http://www.miraipj.jp/en/
[5] D. J. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton, 1999.
[6] S. D. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic, Field-Programmable Gate Arrays. Kluwer Academic Publishers, 1992.
[7] The Programmable Logic Data Book 1998, Xilinx, Inc., 1998.
[8] Virtex-II Platform FPGA Handbook, Xilinx, Inc., 2000.
[9] V. Betz and J. Rose, “FPGA routing architecture: Segmentation and buffering to optimize speed and density,” in Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 1999, pp. 59–68.
[10] APEX20K Programmable Logic Device Family Data Sheet, Altera Corporation, 1999.
[11] D. Lewis, V. Betz, D. Jefferson, C. Lane, P. Leventis, S. Marquardt, C. McClintock, B. Pedersen, G. Powell, S. Reddy, C. Wysocki, and J. Rose, “The Stratix routing and logic architecture,” in Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2003, pp. 12–20.
[12] A. DeHon, “Design of FPGA interconnect for multilevel metalization,” in Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2003, pp. 154–163.
[13] “International Technology Roadmap for Semiconductors 2002 Update,” International Sematech Inc., 2002.
[14] S. Milgram, “The small-world problem,” Psychology Today, vol. 2, pp. 60–67, May 1967.
[15] M. I. et al., “A diagonal interconnect architecture and its application to RISC core design,” Digest of Papers of ISSCC, pp. 210–211, 2003.
[16] “X Initiative.” [Online]. Available: http://www.xinitiative.org/
[17] V. Betz and J. Rose, “VPR: A new packing, placement and routing tool for FPGA research,” in Proc. International Workshop on Field Programmable Logic and Applications, Sept. 1997, pp. 213–222.
[18] V. Betz, VPR and T-VPack User’s Manual (Version 4.30), 2000.
[19] S. Yang, “Logic synthesis and optimization benchmarks, version 3.0,” Microelectronics Center of North Carolina, Tech. Rep., 1991.
[20] “Berkeley Predictive Technology Model (BPTM).” [Online]. Available: http://www-device.eecs.berkeley.edu/~ptm/interconnect.html
