Paper formatting guidelines for FPL 2005 proceedings - Xun ZHANG

GALS SYSTEMS PROTOTYPING USING MULTICLOCK FPGAS AND ASYNCHRONOUS NETWORK-ON-CHIPS

Jérôme Quartana*, Salim Renane, Arnaud Baixas, Laurent Fesquet, Marc Renaudin
TIMA Laboratory, 46 av. Felix Viallet 38031 Grenoble Cedex, France [email protected]

item can be lost or duplicated. Moreover, regular distributed network topologies (any topology based on point-to-point links, such as meshes, tori or crossbars), built of independent routing nodes, fully exploit the modularity and locality design properties of asynchronous circuits. Several publications make major research contributions that illustrate the benefits of using asynchronous NoCs for GALS systems. However, these works focused on demonstrating the competitiveness of self-timed interconnect networks [4], [5] and on comparing topology performance using ad-hoc synchronous peripherals adapted to their asynchronous networks [6]. In the domain of fast prototyping of asynchronous circuits, many publications describe new FPGA architectures devoted to asynchronous logic [7], few focus on prototyping asynchronous circuits with standard synchronous FPGAs [8], and very few deal with prototyping mixed synchronous/asynchronous circuits such as GALS systems on standard FPGAs [9]. In [9], the stoppable-clock methodology of [6], based on asynchronous wrappers around synchronous blocks, is used to implement a GALS Reed-Solomon decoder on a commercial FPGA, but without any consideration of the interconnect network. Moreover, such stoppable-clock techniques require training sessions and are sensitive to PVT (process, voltage, temperature) variations [10]. We take the contribution two steps further: we give a network-centric methodology to build complex GALS systems by 1) prototyping asynchronous NoCs on multiclock FPGAs and 2) interfacing standard synchronous IPs. The paper is organized as follows. Section 2 presents the methodology to build ANoC structures using the modularity property of asynchronous circuits.
We decompose the construction of ANoCs into topology-free basic building blocks. The aim of this decomposition is to help the automatic synthesis of arbiters and of asynchronous interconnect networks. Section 3 focuses on how clock-less circuits are implemented in Quasi-Delay Insensitive (QDI) logic on LUT-based FPGAs using a dedicated library. In particular, the proposed methodology highlights the arbitration and synchronization problems between concurrent elements of the system. For synchronous/asynchronous interfaces we implement standard double flip-flop synchronizers in the FPGA. But to sample and arbitrate concurrent requests in routing nodes of the interconnect network, we

ABSTRACT
This paper presents an innovative methodology for network-centric Globally-Asynchronous Locally-Synchronous (GALS) system prototyping. High-performance multi-clock FPGAs are exploited for easy and fast prototyping of GALS systems based on an Asynchronous Network-on-Chip (ANoC) interfacing standard synchronous IP cores. The modularity property of asynchronous circuits is fully exploited to design regular distributed interconnect topologies by means of basic topology-free building blocks, with a special design effort devoted to solving arbitration and synchronization problems. A case study is implemented on an up-to-date FPGA; it includes two independently clocked processors, memory banks, serial and parallel communication links and an asynchronous DES (Data Encryption Standard) module, connected through an asynchronous 5x5 crossbar. The clock-less modules are implemented in quasi-delay-insensitive logic on the FPGA by means of a dedicated library. Performance figures are reported on the FPGA platform, especially for the communication costs, speed and latency of the ANoC.

1. INTRODUCTION
The GALS paradigm partitions a system design into decoupled, clock-independent modules [1]. The design of each module becomes independent of the others with respect to performance, power consumption or clock-tree management, to name but a few. Another benefit is the separation of the design of communication from that of functionality by using handshake protocol synchronization (amongst other techniques). Asynchronous NoCs (ANoCs) are highly beneficial to such a globally asynchronous design methodology. Clock-free interconnect networks improve reliability by removing clock-domain crossing synchronizations and by using delay-insensitive arbiters for solving routing conflicts [2] [3]. They also offer robust communications thanks to automatic data transfer regulation (elastic pipeline): no data

* Now working in the Secured Architecture & System Lab., Ecole Superieure des Mines de St Etienne, Centre Microelectronique de Provence Georges-Charpak (CMP-GC), 13541 Gardanne, France. [email protected]

0-7803-9362-7/05/$20.00 ©2005 IEEE


use self-timed arbiters with delay-insensitive synchronizers. LUT cells are programmed in a specific manner to properly model these very particular synchronizer cells. In section 4 we demonstrate the network-centric GALS building methodology with a communication system case study implemented on the Stratix Altera FPGA. First results on the communication cost and performance of the asynchronous NoC are reported.

resources are allocated to different user or client processes. Packet routers are such a case. In an asynchronous NoC, delay-insensitive arbiters have the main advantage of being one hundred percent reliable (enough time is given to resolve metastability). The reliability of on-chip communication systems is becoming a major issue, since the increase in transaction rates is drastically reducing the so-called Mean Time Before Failure that characterizes clocked synchronizers. In [2] we present a class of delay-insensitive arbiters which decouple the sampling of incoming requests from the arbitration process in a strongly modular and reliable structure. Such arbiters are called Parallel-Request-Sampling Priority-Arbiters and are used in [3] and in this paper’s ANoC structure.

2. DESIGN OF AN ASYNCHRONOUS NOC FOR GALS ARCHITECTURES
The ANoC design methodology presented in this section resembles the approach presented by Bainbridge and Lovett. In [4] and [5], the design methodology is strongly modular, using simple one-to-two and two-to-one switches to build regular topology networks. In such structures, arbiters are very simple and therefore efficient for packet routers with few channels to drive. However, assembling these switches heavily increases latency for large multi-input/multi-output routers. The cost in terms of FPGA cells is also high if circuit prototyping is envisaged. We present another modular ANoC design, with more building blocks, offering a better tradeoff between a wide spectrum of configurable topologies and efficient performance.

2.3. Modular Asynchronous NoC Structure
Our interconnect networks fully exploit the modularity property of asynchronous circuits. We decompose the construction of our asynchronous NoCs into five basic components, as illustrated in Fig. 1 a). Fig. 1 b) shows that these components can be seen as layers.
1. Wrapper Adaptor (WA). This resource is required to translate between the communication protocols used by a synchronous or asynchronous peripheral and the interconnect network. The WA component adapts both the flit and packet levels of the communication protocols. The details of these protocols are beyond the scope of this paper [3]. In the case of an Asynchronous peripheral Block (AB), the WA is optional, depending on the native communication protocol used by the AB (Figure 1).
2. Synchronization & Performance Interface (SPI). This component binds the clock frequency domain of the Synchronous peripheral Block (SB) to the ANoC using a FIFO decoupling method. The SPI is built of a standard double flip-flop (DFF) synchronizer and of an asynchronous FIFO (section 2.4). The asynchronous FIFO translates the synchronous protocol into the corresponding asynchronous one. The DFF resynchronizes asynchronous signals with the SB clock. The DFF actually offers sufficient reliability at a cost of two clock cycles of latency per sampled input signal [16], compared with the many proposed improvements to clocked synchronizers. Ginosar shows in [17] that latency-improved (respectively robustness-improved) synchronizers strongly degrade their robustness (respectively their latency). The asynchronous FIFO adapts the relative speeds of the SB and the ANoC. For an AB, such a FIFO is optional and can be used for pipeline performance optimization. In this case we call it a Performance Interface (PI) (Figure 1). Details of the asynchronous FIFO architecture are given in [18]. This architecture is based on an existing asynchronous FIFO [19]. The level of parallelism between data and control flows is improved and two versions are delivered: a low-latency version or a low-power version, according to design requirements.

2.1. Design Methodology
We specify and model asynchronous circuits in the CHP language [11] at the DTL level [12]. The processes are synthesized using TAST, a tool suite [13] dedicated to asynchronous circuit synthesis. TAST can map the CHP specification onto a standard-cell library and/or a specific cell library [14] when targeting ASICs, or onto an FPGA for rapid system prototyping [8].

2.2. Focus on Synchronization Challenges
Our methodology for designing ANoCs is focused on solving synchronization problems. The two major synchronization challenges for a GALS system are synchronization at clock domain boundaries and arbitration between concurrent requests [15]. Such circuits have a nondeterministic behavior. We invest special design effort in improving the reliability/performance tradeoffs of these synchronizer circuits.
Clocked Synchronization. As discussed in the introduction, using an ANoC is in itself a reliability improvement, since it removes clock-domain crossing synchronizations throughout the interconnect network. However, clocked synchronizers are still required between Synchronous peripheral Blocks (SB) and the ANoC. This synchronous/asynchronous interface is discussed in section 2.3.
Delay-Insensitive Arbiters. Arbitration circuits, or simply arbiters, are required where a restricted number of


Fig. 1. Asynchronous NoC-centric GALS architecture: a) abstract structure b) layer structure.

3. Packet Transport (PT). This resource adapts the physical (or signal) level of the communication protocol. The PT component provides successive protocol conversions from the SPI component to the delay-insensitive NoC core for best power consumption and robustness. Between the SPI and PR layers, bundled-data protocols are converted into QDI protocols for better robustness. Between the packet routers (PR layer), the four-phase protocols can be converted into two-phase protocols on long interconnect links for better power consumption [20].
4. Parallel-Request-Sampling Priority-Arbiter (PRS-PA). This resource provides a self-timed arbiter with a decoupled arbitration process and a 100% reliable request sampling structure based on delay-insensitive parallel synchronizers [2][21] (section 2.2).
5. Packet Routing (PR). This resource offers modular routing of data items for transaction services (packet-level services such as burst mode or split transactions).
In Fig. 1 the PRS-PA and PR resources are put together because they are parts of the ANoC routing nodes, as detailed in section 2.4.


Fig. 2. Switch components: a) n-to-1 b) 1-to-m.

The PR resource is decomposed into three modules: Packet Analyzer (PA), Data Path Controller (DPC) and MUX module. The Emitter component delivers two major classes of packet-level services: arbitration service and transaction service. The PA block decodes the Channeli_ctrl message in order to extract the arbitration and transaction information and to drive them to the PRS-PA and DPC modules respectively. Arbitration information is composed of the Request and (optional) Priority_level channels, used by the PRS-PA module to arbitrate between incoming requests. Once a Channeli_data is elected, PRS-PA informs the data path controller module (DPC) through Selected_Channel. DPC uses it, together with the Transfer_mode channel, to control the data flow on the elected Channeli_data and to drive the switch output (MUX module). Through the Transfer_mode channel, transaction information delivers the packet status: single-flit packet or, in burst mode, start-of-packet flit, body flit or end-of-packet flit. Once the packet transfer is completed, the DPC module informs the sleeping arbiter module PRS-PA through the Sampling channel that a new transaction can start.
Receiver module. The 1-to-m switch, or Receiver, is a PR component which performs the dual operation by driving the input (Packet_Ctrl and Packet_Data channels) to the selected Target_Address. No arbitration is needed here. By composing these switches we can build fast and efficient routing nodes in a short design time (sections 4.2 & 5.1).
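The transaction handling of the Emitter described above can be illustrated with a small behavioural model. The sketch below is only an assumption-laden illustration in Python, not the CHP specification of the DPC: the names `Mode` and `forward_packet` are hypothetical, and a flit is modelled as a `(mode, payload)` pair read from the elected channel. It shows that the elected channel is held until a single-flit packet or an end-of-packet flit is seen, which is the point at which the arbiter may sample requests again.

```python
from enum import Enum


class Mode(Enum):
    """Packet status carried by Transfer_mode (names are hypothetical)."""
    SINGLE = "single-flit packet"
    START = "start-of-packet flit"
    BODY = "body flit"
    END = "end-of-packet flit"


def forward_packet(flits):
    """Forward flits of the elected channel until the packet is complete.

    `flits` is an iterable of (Mode, payload) pairs. Returns the list of
    forwarded payloads and the number of flits consumed. Reaching the
    return statement models the DPC notifying the arbiter (Sampling
    channel) that a new transaction can start.
    """
    forwarded = []
    for count, (mode, payload) in enumerate(flits, start=1):
        forwarded.append(payload)  # drive the payload to the switch output
        if mode in (Mode.SINGLE, Mode.END):
            return forwarded, count
    raise ValueError("packet was not terminated by an END or SINGLE flit")


burst = [(Mode.START, "a"), (Mode.BODY, "b"), (Mode.END, "c"), (Mode.SINGLE, "d")]
payloads, consumed = forward_packet(burst)
```

In this example the first call consumes the three flits of the burst and leaves the following single-flit packet for a later arbitration round.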

2.4. Switch Architecture for ANoC Routing Nodes
The packet router is the core component of an interconnect network. The packet routers are assembled from modular elementary blocks, as shown in Figure 2, with the same objectives of low complexity, easy “plug-and-play” and scalability as for the complete ANoC.
Emitter module. Figure 2 illustrates two switch instances. The n-to-1 switch, or Emitter, is built around the PR (Packet Router) and PRS-PA (Priority Arbiter) components, as previously presented in section 2.2.


Fig. 3. Hazard-free implementation of MULLER2 with AND and OR gates.

3. SYNTHESIS OF DELAY-INSENSITIVE CIRCUITS ONTO FPGAS

Fig. 4. Synchronizer architecture for FPGA cell mapping.

Section 2 described the structural methodology to quickly design modular ANoCs. This section recalls the methodology for synthesizing asynchronous circuits onto FPGAs [8] and presents a special mapping circuit for the delay-insensitive synchronizers used in asynchronous arbiters.

which can produce the forbidden transitions. Fig. 4 presents a novel synchronizer structure dedicated to FPGA cell implementation. Five LCELL elements are needed to implement the synchronizer equation. This structure resolves metastability in an unbounded amount of time and avoids glitches. The two asymmetric Muller gates (or C-elements) transform a single-rail input into a dual-rail output, providing a standard three-state data encoding. The D signal can be seen as an incoming request value sampled by the reference signal Clk (which is not a periodic clock but an asynchronous sampling signal). The transition (Clk,D)=01→10 produces a glitch at the output of the C-elements, as previously explained, so we use adapted RS flip-flops as glitch killers. The glitch viewer element serves only as a test-bench aid to demonstrate that the glitch killers work correctly, as shown in Fig. 5. This waveform shows a post place-and-route simulation of the complete synchronizer structure (Fig. 4). The same result is obtained after synthesis on the Stratix Altera FPGA. The characterization methodology is given in [8]. The D and Clk signals are driven at close frequencies: 50 MHz for D and 50.2 MHz for Clk. In the first part of the waveform, value 0 is sampled: the dual-rail output signal Q gives the periodic sequence (Q_1,Q_0)=00 (RZ state)→10, and the glitch viewer signal \Q_0 (qb_0 in Figure 5) alternates between the RZ state and the complementary value of Q_0. In the second part, value 1 is sampled, corresponding to the (Clk,D)=01→10 transition. The output signal Q gives the periodic sequence (Q_1,Q_0)=00 (RZ state)→01, and the glitch viewer signal \Q_0 shows glitches.

3.1. Methodology
In [8] we give a generic methodology to properly place and route mixed synchronous/asynchronous circuits onto an FPGA, respecting the specific timing assumptions of either Quasi Delay Insensitive (QDI) or micropipeline (μP) asynchronous design techniques. To avoid hazards in FPGAs, the appearance of hazards in configurable logic cells is analyzed. The technique is based on the design and use of a Muller gate library, which prevents glitches in response to single-bit changes in the inputs. The technique does not apply when more than one input changes at the same time. Close examination of the K-map in Figure 3 suggests what causes a glitch. When the initial and final inputs are covered by the same prime implicant, no glitch is possible. But when the input change spans prime implicants, a glitch can happen. The hazard-free implementation of MULLER2 is given in Figure 3. The fed-back output signal of the Muller gate is considered as an input. Covering the canonical minterms with the three overlapping terms AB, AS-1 and BS-1 makes it possible to avoid static hazards, which otherwise depend on the delay distribution, when an input changes. Thus, this implementation is hazard-free when one input changes. Multiple input changes are allowed, except for the transitions AB=01→10 and AB=10→01, which produce a glitch. TAST produces circuits that do not exhibit such behaviors. Hence the simultaneous transitions AB=01→10 and AB=10→01 need not be considered in the hazard analysis. This ensures that the circuits are hazard-free.
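The three-term cover AB + A·S-1 + B·S-1 can be checked with a short functional model. The Python sketch below is an illustration only: the function names `muller2` and `run` are hypothetical, `s_prev` stands for the fed-back output S-1, and the model is purely logical, so it does not reproduce gate delays or glitches. It confirms the expected C-element behaviour: the output rises when both inputs are 1, falls when both are 0, and holds its value otherwise.

```python
def muller2(a: int, b: int, s_prev: int) -> int:
    """Next output of a two-input Muller gate: S = AB + A*S_prev + B*S_prev."""
    return (a & b) | (a & s_prev) | (b & s_prev)


def run(inputs, s0=0):
    """Apply a sequence of (A, B) input pairs and record the output trace."""
    s = s0
    trace = []
    for a, b in inputs:
        s = muller2(a, b, s)
        trace.append(s)
    return trace


# Only one input changes at each step, as assumed by the hazard analysis.
sequence = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
trace = run(sequence)  # [0, 0, 1, 1, 0]
```

The output follows the inputs only when they agree, which is the state-holding behaviour that the fed-back terms A·S-1 and B·S-1 provide.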

4. A GALS ARCHITECTURE IMPLEMENTED ONTO A STRATIX ALTERA FPGA
This section presents an ANoC-centric GALS architecture implemented onto a multiclock FPGA. The synthesis methodology presented in section 3 is applied to the clockless modules of the architecture (ANoC and DES). The ANoC is designed according to the modular building method of section 2.

3.2. Synchronizer implementation
TAST cannot produce arbiter circuits, because their synchronizer elements have a nondeterministic behavior (this is true for both clocked and delay-insensitive arbiters/synchronizers). Consequently the previous methodology does not work for the synchronizer element,


Fig. 5. Trace of synthesized synch with the glitch viewer.

Fig. 6. Structure of PACMAN case-study version.

4.1. PACMAN platform

assembled into more complex regular interconnect topologies, such as meshes. The aim is to deliver a dedicated synthesis tool for the automated generation of asynchronous regular interconnect topologies, thanks to the modularity and locality properties of asynchronous circuits (no global design considerations). So far, the adjustable parameters are:
1. Crossbar size. It depends on the number of components.
2. Point-to-point (p2p) interconnect width. This parameter defines the width of each interconnect path according to the required bandwidth of each p2p-linked SB or AB.
3. Programmable priority algorithm. Available policies are round-robin, FIFG and non-interruptible two-level priority policies. The FIFG policy can be programmed independently for each routing node.
4. Transaction services. The DPC module can be programmed to support data transfer mode services. For the time being, only the burst mode is available. All routing nodes must support the same transaction services.
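The four adjustable parameters listed above could be captured in a small configuration record. The Python sketch below is purely illustrative and does not describe the interface of the actual generation tool: the names `Policy`, `CrossbarConfig`, `SUPPORTED_SERVICES` and `problems`, the per-port width dictionary and the example width of 32 bits are all assumptions made here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    ROUND_ROBIN = "round-robin"
    FIFG = "first-in first-granted"
    TWO_LEVEL = "non-interruptible two-level priority"


# Per the text, only the burst mode is available for the time being.
SUPPORTED_SERVICES = {"burst"}


@dataclass
class CrossbarConfig:
    size: int                                            # 1. crossbar size
    link_width_bits: dict = field(default_factory=dict)  # 2. port -> p2p width
    node_policy: dict = field(default_factory=dict)      # 3. node -> Policy
    services: frozenset = frozenset({"burst"})           # 4. shared by all nodes

    def problems(self):
        """Return a list of human-readable configuration problems."""
        issues = []
        if self.size < 2:
            issues.append("crossbar needs at least two ports")
        for port, width in self.link_width_bits.items():
            if not 0 <= port < self.size:
                issues.append(f"port {port} is outside the crossbar")
            if width <= 0:
                issues.append(f"port {port} has a non-positive width")
        for node in self.node_policy:
            if not 0 <= node < self.size:
                issues.append(f"node {node} is outside the crossbar")
        unsupported = set(self.services) - SUPPORTED_SERVICES
        if unsupported:
            issues.append(f"unsupported services: {sorted(unsupported)}")
        return issues


config = CrossbarConfig(size=5, link_width_bits={0: 32},
                        node_policy={0: Policy.TWO_LEVEL})
```

Keeping the transaction services as a single field of the whole crossbar, rather than per node, reflects the constraint that all routing nodes must support the same services.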

Only recent high-performance FPGAs offer the ability to prototype multiclock GALS systems. We demonstrate our network-centric GALS building methodology with a case study implemented on a Stratix Altera FPGA. This system is a first prototype version of a generic GALS platform called PACMAN, for Programmable And Configurable Multiprocessor Asynchronous Network. The first-version PACMAN architecture (Fig. 6) includes an ANoC interconnecting four processing elements. The asynchronous NoC is a 5x5 crossbar with a direct output parallel communication link. The ANoC delivers both arbitration and transaction services (section 2.4). The arbitration policy is a non-interruptible two-level priority policy. When concurrent incoming requests need arbitration, a request with the high priority level is selected and low-priority requests are suspended. For concurrent requests of equal priority level, a First-In First-Granted (FIFG) policy is used. A previously selected channel cannot be interrupted by an incoming request of higher priority during a burst mode data transfer. The high priority level is assigned to the MIPS processors. The transaction service delivers burst mode or simple one-flit packet transfer modes, plus a special service called Indirect-Response (IR). In IR mode, a peripheral A that initiates a communication notifies the receiver B to answer not to A but to a third peripheral. The four processing elements are:
- Two independently clocked MIPS processors with local RAM banks and serial communication links. One MIPS is running at 10 MHz for interfacing purposes whereas the other MIPS is running at 50 MHz for number-crunching applications.
- A self-timed DES (Data Encryption Standard) module.
- A shared RAM bank.
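The non-interruptible two-level priority policy described above can be summarised in a few lines of Python. This is a behavioural sketch under stated assumptions, not the PRS-PA circuit: the names `Request`, `elect` and `next_grant` are hypothetical, and the integer `arrival` stamp is an assumption standing in for the order in which the request sampler observed the requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    channel: int
    high_priority: bool
    arrival: int  # hypothetical stamp: smaller means sampled earlier


def elect(pending):
    """High-priority requests win; ties are broken First-In First-Granted."""
    if not pending:
        return None
    return min(pending, key=lambda r: (not r.high_priority, r.arrival))


def next_grant(pending, current=None):
    """Keep the current channel during a burst, otherwise elect a new one."""
    if current is not None:
        return current  # non-interruptible: pending requests stay suspended
    return elect(pending)


pending = [
    Request(channel=1, high_priority=False, arrival=0),
    Request(channel=2, high_priority=True, arrival=5),
    Request(channel=3, high_priority=True, arrival=3),
]
winner = elect(pending)  # channel 3: high priority and earliest arrival
```

Passing the channel that currently owns a burst as `current` returns it unchanged, even when a high-priority request is pending, which models the non-interruptible property.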

4.3. First results
The FPGA platform successfully supports the implementation of the PACMAN architecture. In first post place-and-route simulations, the ANoC delivers a packet transport latency of 5 ns from packet router to packet router. This is a fully asynchronous performance figure. Including the two WA modules for the MIPS1 and MIPS2 components, the latency increases to 78 ns for a MIPS-to-MIPS communication, giving a throughput of about 13 Mpackets/second (there is no pipeline in the implemented version of the ANoC).

5. CONCLUSION

4.2. Automatic crossbar generation

In this paper, it is shown that a GALS system with mixed synchronous and asynchronous circuits can be rapidly prototyped onto a multi-clock FPGA thanks to an asynchronous NoC which relaxes global design constraints. This ANoC properly interfaces communication protocols with standard synchronous peripherals and offers a reliable and efficient arbitration solution. The strongly modular construction of ANoCs ensures short design time and

We use an automatic crossbar topology generation tool to implement the 5x5 crossbar ANoC. The tool controls adjustable design parameters for some of the five ANoC modular blocks/layers. It supports the generation of fully-interconnected or Octagon [22] topologies and of modular routing node cores, which can be hand-adapted and


consequently fast GALS system prototyping. The interconnect topology generator delivers several configurable interconnect topologies, which facilitates system architecture exploration. Synchronizer and arbitration circuits are properly mapped onto the FPGA. A first analysis of the communication performance shows that the asynchronous NoC is promising for delivering fast and robust communication quality of service. The results obtained on the FPGA platform must be analyzed in more detail in order to improve fast ANoC design and mapping, especially for the Wrapper Adaptors, which are the bottleneck of the communication performance. Future work will focus on delivering a dedicated synthesis tool for the fully automated generation of asynchronous regular interconnect topologies.

6. REFERENCES

[1] F. K. Gürkaynak, S. Oetiker, N. Felber, H. Kaeslin and W. Fichtner, "Is there hope for GALS in the future ?", Proc. of the 4th Asynchronous Circuit Design Workshop (ACID 2004), Turku, Finland, June 28-29, 2004.

[2] J. B. Rigaud, J. Quartana, L. Fesquet and M. Renaudin, "Modeling and design of asynchronous priority arbiters for on-chip communication systems", Proc. of the VLSI-SOC'01, 11th IFIP Int. Conf. on Very Large Scale Integration, Montpellier, France, 3-5 Dec. 2001.

[3] E. Beigné, F. Clermidy, P. Vivet, A. Clouard, M. Renaudin, "An Asynchronous NOC Architecture Providing Low Latency Service and its Multi-Level Design Flow", 11th IEEE Int. Symp. on Asynchronous Circuits and Systems (ASYNC'05), March 14-16, NY, USA, pp. 54-63, 2005.

[4] W. J. Bainbridge and S. Furber, "CHAIN: A Delay Insensitive CHip Area INterconnect", IEEE Micro, vol. 22, no. 5, pp. 16-23, September/October 2002.

[5] W. O. Lovett, "CHip Area Network Simulation", Master of Science, University of Manchester, 2002.

[6] T. Villiger, H. Kaeslin, F. Gurkaynak, S. Oetiker and W. Fichtner, "Self-timed Ring for Globally-Asynchronous Locally-Synchronous Systems", Proc. of the Ninth Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, ASYNC'03, Vancouver, Canada, May 12-16, 2003.

[7] N. Huot, H. Dubreuil, L. Fesquet and M. Renaudin, "A FPGA Architecture for Multi-style Asynchronous Logic", Design Automation and Test in Europe (DATE), Germany, March 2005.

[8] T. Q. Ho, J. B. Rigaud, M. Renaudin, L. Fesquet and R. Rolland, "Implementing Asynchronous Circuits on LUT Based FPGAs", Proc. of the Field-Programmable Logic and Applications, Reconfigurable Computing Is Going Mainstream, 12th Int. Conference, FPL 2002, Montpellier, France, September 2-4, 2002.

[9] M. Najibi, K. Saleh, M. Naderi, H. Pedram, M. Sedighi, "Prototyping globally asynchronous locally synchronous circuits on commercial synchronous FPGAs", Proc. of the 2005 ACM/SIGDA 13th Int. Symp. on Field-programmable gate arrays, Monterey, California, USA, 2005.

[10] C. Piguet, M. Renaudin, T. Omnés, "Low-power systems on chips (SOCs)", Proc. of the Conf. on Design, Automation, and Test in Europe, Munich, Germany, 2001.

[11] Y. Semiat and R. Ginosar, "Timing Measurements of Synchronization Circuits", Proc. of the Ninth Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, ASYNC'03, Vancouver, Canada, May 12-16, 2003.

[12] A.J. Martin, "Programming in VLSI: from communicating processes to delay-insensitive circuits", in C.A.R. Hoare, Developments in Concurrency and Communication, UT Year of Programming Series, 1990, Addison-Wesley, p. 1-64.

[13] Anh Vu Dinh Duc, Laurent Fesquet, Marc Renaudin, "Synthesis of QDI Asynchronous Circuits from DTL-style Petri-Net", IWLS-02, 11th IEEE/ACM Int. Workshop on Logic & Synthesis, New Orleans, Louisiana, June 4-7, 2002.

[14] A.V. Dinh Duc, J.B. Rigaud, A. Rezzag, A. Sirianni, J. Fragoso, L. Fesquet, M. Renaudin, "TAST CAD Tools: Tutorial", tutorial given at the Int. Symp. on Advanced Research in Asynchronous Circuits and Systems ASYNC'02, Manchester, UK, April 8-11, 2002, and at the ACiD Summer School on "Asynchronous circuits design", Grenoble, France, July 15-19, 2002. TIMA internal report ISRN:TIMA-RR-02/07/01—FR, http://tima.imag.fr/cis.

[15] Ph. Maurine, J.B. Rigaud, F. Bouesse, G. Sicard, M. Renaudin, "Static Implementation of QDI asynchronous primitives", PATMOS'03 - 13th Int. Workshop on Power and Timing Modeling, Optimization and Simulation, Torino, Italy, September 10-12, 2003.

[16] R. Ginosar, "Synchronization and Arbitration", Proc. of the ACiD Summer School on Asynchronous Circuit Design, Grenoble, France, July 15-19, 2002.

[17] R. Ginosar, "Fourteen ways to fool your Synchronizer", Proc. of the Ninth Int. Symp. on Research in Asynchronous Circuits and Systems, ASYNC'03, Vancouver, Canada, 2003.

[18] J. Quartana, K. Slimani, M. Renaudin, "Asynchronous FIFO for Efficient and Reliable Synch/Asynch Interfaces in GALS Architectures", VLSI-SOC'05, submitted.

[19] T. Chelcea and S. M. Nowick, "Robust Interfaces for Mixed-Timing Systems", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 8, pp. 2004.

[20] R. Ho, J. Gainsley and R. Drost, "Long wires and asynchronous control", Proc. of the Asynch'04, 2004.

[21] A. Bystrov, D. J. Kinniment, A. Yakovlev, "Priority Arbiters", in Int. Symp. on Advanced Research in Asynchronous Circuits and Systems (ASYNC'00), Eilat, Israel, April 2000, pp. 128-137.

[22] F. Karim, A. Nguyen and S. Dey, "An Interconnect Architecture for Networking Systems on Chips", IEEE Micro, vol. 22, no. 5, pp. 36-45, September/October 2002.
