
Runtime Reconfigurable Interfaces - The RTR-IFB Approach

Stefan Ihmor
University of Paderborn, Heinz Nixdorf Institute
Fuerstenallee 11, D-33102 Paderborn, Germany

Wolfram Hardt
Chemnitz University of Technology, Faculty of Computer Science
Strasse der Nationen 62, D-09107 Chemnitz, Germany

Abstract

Reconfigurable architectures have become more and more popular due to technology improvements. Besides the implementation of complex applications, inter-module communication becomes a critical aspect in reconfigurable architectures. In particular, dependencies between reconfigured computation modules in real-time environments lead to massive reconfiguration efforts. This paper presents runtime reconfigurable interface blocks (RTR-IFBs), which can be used to resolve those dependencies. They handle the switching of modules during runtime, synchronization and inter-module communication. As one effect, reconfigured modules can share the same execution resources without a reconfiguration-induced communication gap. The RTR-IFB methodology extends current concepts for inter-module communication. A design flow specifies how RTR-IFBs are integrated into the partial reconfiguration design flow. An example shows how an RTR-IFB can be integrated into a realistic design.

1. Introduction

In the past, configurable architectures have become very popular. Rapid technology improvements have been an important trigger for this process. Field Programmable Gate Arrays (FPGAs) are one representative of this technology. Since FPGAs became reprogrammable, designs can be changed not only in the initial configuration phase but even during runtime (RTR architectures). The (re)configuration of FPGAs is currently technically restricted to a column-wise programming style. A partitioned design has to be distributed over the design space in the dimensions of space and time. At the least, this means a mapping to FPGA columns according to a given time slot. Spatial and partial reprogramming methodologies cope with this problem. One part of a partitioned task, which can cover a number of adjacent columns, is called a module.

Figure 1. Bus macro [diagram labels: Modul A and Modul B, each with Task / Medium and Computation; Fixed Comm. Point; Adaptation (Interface Block)]

There are two different ways of reprogramming an FPGA. The first approach stops the entire device during execution and updates the portion of the configuration memory that has to be changed. To interrupt the execution, all clocks are frozen. No computation or communication is performed until the clocks are reactivated. The second approach is called runtime reconfiguration and does not influence the functionality of unchanged sections of the FPGA in any way. It reconfigures only the affected frames by updating the dedicated configuration memory.

In reconfigurable architectures it is necessary to have a fixed point at the intermediate layer between two modules to allow inter-module communication, as figure 1 shows. Partial configuration demands that resources used by a reconfigurable module do not extend beyond the dedicated frame boundary. Moreover, the routing used to connect signals crossing reconfigurable module boundaries cannot change when a module is reconfigured. The TBUF bus [14, 12], which was presented by Xilinx, is one mechanism to realize a predictable control of the routing resources of signals that cross such boundaries. As long as the complete routing information is not known during the generation of a new module, we have to use these fixed points to interconnect the various modules. Otherwise we would have

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04)

0-7695-2132-0/04/$17.00 (C) 2004 IEEE

to evaluate the routing data of all connected modules to find their communication points.

Nevertheless, these fixed points bring some restrictions along. In a real-time environment it is necessary to have predictable behavior at runtime for all tasks of the design. This includes the computational as well as the communication aspect. During the reconfiguration of one module the other modules remain active, but the communication gets stuck because of the missing signal source or destination, respectively. The insertion of fixed points allows us to set the remaining communication lines to a predefined state and thus obtain predictable behavior, but a complex behavior cannot be specified, and this violates the demands of some real-time systems.

To overcome this gap we can use Interface Blocks (IFBs) [11, 6, 7]. IFBs are adapter modules which can be generated interactively by evaluating the Interface Descriptions (IFDs) of the connected tasks (see figure 2). The IFB generation is integrated into a system level design flow¹ [8, 11, 10, 9]. To generate an IFB automatically we need at least the IFDs of all connected tasks (or media), a target-platform description (TPD) and a mapping of the exchanged data (IFD-Mapping). The descriptions are represented in an XML-based style [15]. Figure 2 illustrates this design flow. First of all, a set of computing tasks (each of which has one IFD) has to be selected. These tasks can be gathered from IPs, VHDL designs, etc. Then the communication between these tasks has to be specified. This can be done by a mapping of the exchanged data packages, which are automatically extracted from the IFDs. The completed IFB, which is explained in detail in chapter three, is a target independent description that can be compiled to various targets, e.g. VHDL. Thereafter we include the generated VHDL code in the existing design and run logic synthesis down to the configuration bit stream.

Figure 2. IFB design flow [diagram labels: Task / Medium with Interface and IFD; Mapping; IFB with Control Unit, Protocol Handler PHin (modes), Sequence Handler SH (modes) and Protocol Handler PHout (modes); VHDL; Synthesis; Configuration Bit-Stream]

¹ Developed at Paderborn University in cooperation with Chemnitz University of Technology.

The following chapter highlights relevant related work. Chapter three presents the inner functionality of the IFB in detail and explains the advantages of an IFB for runtime reconfiguration (RTR). Chapter four shows an example where an RTR-IFB is embedded into a controller design. A conclusion and future work are given in the closing chapter five.

2. Related work

Various approaches that cope with inter-module communication can be found in the literature. The TBUF bus [12, 13], which was presented by Xilinx, is an example of how this can be realized.

2.1 The Xilinx bus macro

The TBUF bus is also known as the Xilinx bus macro, shown in figure 3. Using these macros, the necessary fixed points can be included automatically into the partial design flow.

Figure 3. Xilinx bus macros [12]

Xilinx provides these bus macros for the placement and exact routing of inter-module communication signals, as figure 3 shows. Up to four bits of data can be shared per macro, and the user has to instantiate as many of these macros as are needed to cover all the cross-module signals in the design. The implementation of the bus macros is based on tri-state drivers. Because of this, a reconfiguration always leads to a high-impedance state on the dedicated signals. The final size of a bus macro is determined by the number of connected signals. In figure 3, bus macros of different sizes are placed between two reprogrammable (PR) modules and two fixed modules on the borders of the design.

The concept of RTR-IFBs does not remove the need for such fixed points to interconnect two modules when one of them is reconfigured dynamically. But the user's view of the interfaces changes completely, because the fixed points move into the RTR-IFB.
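As a small worked example of the sizing rule above (up to four bits per macro), the number of bus macros for a given number of cross-module signals can be estimated as follows. This is only an illustrative Python sketch; the function name and the constant are assumptions introduced here and are not part of any Xilinx tool.

```python
BITS_PER_BUS_MACRO = 4  # from the text: up to four bits of data per macro


def bus_macros_needed(cross_module_signals: int) -> int:
    """Return the number of bus macros needed to carry the given signal count.

    Hypothetical helper: rounds up, because a partially used macro
    still has to be instantiated.
    """
    if cross_module_signals < 0:
        raise ValueError("signal count must not be negative")
    # Integer ceiling division avoids floating-point rounding.
    return -(-cross_module_signals // BITS_PER_BUS_MACRO)


# A 32-bit data bus plus 3 handshake lines crossing a module boundary.
macros = bus_macros_needed(32 + 3)
```

For 35 crossing signals this yields nine macros, one of which is only partially used.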

2.2 System design with SpecC

Another scientific approach that deals with communication synthesis in system design was presented by Gajski in the SpecC context [5, 4]. The SpecC methodology offers a complete design flow, from an abstract model to a concrete implementation on Register Transfer (RT) level. In SpecC, incompatible tasks, e.g. IPs, can be connected to the remaining design by wrapper or adapter modules. The creation of these wrapper or adapter modules is comparable to the generation of an IFB. Although the SpecC approach presents a complete design methodology, the generation of the wrapper or adapter modules is not explained in detail in the literature. Furthermore, the published work in the SpecC domain has not addressed runtime reconfiguration so far.

3. Runtime reconfigurable IFB

3.1 The interface block macro structure

Much effort is spent on interconnecting incompatible designs in hardware as well as in software. Two ways can be clearly distinguished. The first one expects the designer to create an adapter module between two or more communicating interfaces. The second approach consists of a set of standards and forces the designer to adapt the task or medium to these standards. The IFB approach can be understood as an automation of the first way. This includes the adaptation to standard interfaces as well, because interfaces which have been described once in the form of an interface description (IFD) can be reused in future designs. The key point of the IFB approach is the division of the interface into three parts: the incoming protocol handler (PHin), the outgoing protocol handler (PHout) and, in between, the conversion processing called sequence handler (SH). Each handler consists of a set of operation modes.
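The three-part division can be pictured as a small data structure. The following Python sketch is for illustration only; the class names (Handler, InterfaceBlock) and the mode names (P1, S1, ...) are assumptions made here and do not reflect the generated VHDL or the IFB tooling.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Handler:
    """One IFB part (PHin, SH or PHout) with its set of operation modes."""
    name: str
    modes: Set[str] = field(default_factory=set)
    active_mode: Optional[str] = None

    def activate(self, mode: str) -> None:
        # Only modes that belong to this handler can be activated.
        if mode not in self.modes:
            raise ValueError(f"{self.name} has no mode {mode!r}")
        self.active_mode = mode


@dataclass
class InterfaceBlock:
    """The three-part division of the interface: PHin, SH, PHout."""
    ph_in: Handler
    sh: Handler
    ph_out: Handler


# Illustrative instance with made-up mode names.
ifb = InterfaceBlock(
    ph_in=Handler("PHin", {"P1", "P2"}),
    sh=Handler("SH", {"S1", "S2", "S3"}),
    ph_out=Handler("PHout", {"P1", "P2"}),
)
ifb.sh.activate("S2")
```

Keeping the modes per handler, rather than per interface, mirrors the statement that each handler consists of its own set of operation modes.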

Figure 4. The IFB macro structure [diagram labels: TASK; PHin FSM; SH FSM; PHout FSM; MEDIUM; CU FSM; control and status signals between the CU and the handlers; data and handshake signals between the handlers; modes P1 and P2 for each protocol handler, modes S1 to S3 for the sequence handler]

A control unit (CU) realizes the internal control of the IFB. To this end, the CU offers control signals to the PHs and the SH and evaluates their status signals as feedback. Control and status signals thus flow vertically. In contrast, the data and handshake signals are communicated in the horizontal direction. This partitioning of signals is hereinafter referred to as the orthogonal communication structure. The overall structure of communicating components inside the IFB is known as the IFB macro structure, which is implemented by a set of hierarchical FSMs. The general structure is given in figure 4.

The incoming data stream is fed into PHin, which extracts the information parts from the protocol. This net data is then re-sequenced by the dedicated SH mode according to the outgoing protocol. Finally, the modified data is merged into the outgoing protocol by the dedicated PHout mode. The incoming data stream (FSMin) as well as the outgoing data stream (FSMout) is also described by an FSM as one part of the IFD. PHin can be constructed from FSMin by building the complementary protocol FSM. The same holds for the construction of PHout from FSMout. The IFD-Mapping is used to solve the mapping problem between the data exchanged by PHin and PHout. It determines the functionality of the re-sequencing process. In some cases additional memory has to be introduced to store intermediate data. A conversion of the data bit width, e.g. from bit serial to bit parallel, can be performed very easily at this point.

The main advantage of our approach is that the complete interface functionality is achieved without building the product automaton of FSMin and FSMout. In general, the mapping between FSMin and FSMout is not even feasible without a global semantics of the exchanged data. But such an overall semantics cannot be expected, because FSMin and FSMout could derive from semantically different domains.
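The data path described above (extract, re-sequence, merge) can be illustrated with a bit-serial to bit-parallel conversion. The Python sketch below is a software analogy only. The serial frame format (start bit 0, eight data bits LSB first, stop bit 1) and the outgoing (valid, data) pair are assumptions chosen for the example, not protocols taken from the paper.

```python
from typing import Iterable, Iterator, List, Tuple

FRAME_LEN = 10  # assumed frame: start bit 0, 8 data bits (LSB first), stop bit 1


def ph_in(bits: Iterable[int]) -> Iterator[int]:
    """Incoming protocol handler: strip the framing, pass on the net data bits."""
    frame: List[int] = []
    for bit in bits:
        frame.append(bit)
        if len(frame) == FRAME_LEN:
            if frame[0] != 0 or frame[-1] != 1:
                raise ValueError("framing error")
            yield from frame[1:-1]
            frame = []


def sh_serial_to_parallel(payload: Iterable[int], width: int = 8) -> Iterator[int]:
    """Sequence handler mode: collect `width` bits and emit them as one word."""
    buffer: List[int] = []  # intermediate memory for the re-sequencing
    for bit in payload:
        buffer.append(bit)
        if len(buffer) == width:
            yield sum(b << i for i, b in enumerate(buffer))
            buffer = []


def ph_out(words: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Outgoing protocol handler: merge each word into a (valid, data) pair."""
    for word in words:
        yield (1, word)


def interface_block(bits: Iterable[int]) -> List[Tuple[int, int]]:
    """Chain PHin, SH and PHout in the horizontal (data) direction."""
    return list(ph_out(sh_serial_to_parallel(ph_in(bits))))


# One serial frame carrying the byte 0xA5, LSB first.
frame_a5 = [0, 1, 0, 1, 0, 0, 1, 0, 1, 1]
result = interface_block(frame_a5)
```

Neither protocol handler needs to know the other protocol. Only the sequence handler mode encodes the mapping between them, which is the point of avoiding the product automaton.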
The designer is therefore included interactively in the design process to complete the mapping function. The coordination of all FSMs is performed by the control unit, which is also implemented as an FSM. This control unit allows the introduction of runtime reconfiguration. Time-specific aspects and behavioral strategies are implemented inside the CU.

The IFB generation is integrated into a system level design flow, as mentioned in the introduction (see figure 2). To generate an IFB automatically we need some descriptions, which are explained here in a little more detail. The descriptions are coded in an XML-based way, because XML is currently the most popular basis for transfer formats [15]. All tasks and media have an Interface Description (IFD), which holds information on the physical structure, the electrical properties and the protocols used. A target platform description (TPD) is restricted to the physical structure and the electrical properties. To build an application which automatically transfers the data extracted by the protocol handlers, a mapping of the exchanged information (IFD-Mapping) is required. In a high level synthesis all these descriptions are evaluated and a target platform independent description of the IFB is generated. Later, a code generation step compiles this description into hardware or software target code, e.g. VHDL. The exact procedure of generating an IFB is not the topic of this paper. The IFB implementation is automated for FPGAs.

3.2 RTR-features of an IFB

The IFB itself was developed to operate in dynamic systems. Therefore multiple operation modes are possible for each handler. When a task (or medium) wants to change its behavior, a switch to the dedicated mode is necessary. This approach is restricted to behaviors which can be specified at design time, before the implementation. To support runtime reconfiguration, the IFB methodology had to be extended to the RTR-IFB approach. Here the functionality of reconfigured tasks changes at runtime. This implies that the interfaces of reconfigured tasks change as well. Bus macros were integrated as extensions into the IFB to allow the reconfiguration of single operation modes. A special partitioning of the RTR-IFB ensures the exchangeability of the macro structure. The placement of these partitions is described in the following section.

Another feature of the RTR-IFB is the additional user-defined SH mode, which is activated during reconfiguration. The designer can specify a behavior as an FSM which is executed while the reconfigured task is offline. In this way a deterministic behavior can be assured for real-time environments. The algorithm implemented by the reconfiguration mode should be a static sequence or a simple workaround based on the last transmitted values, such as repetition or increment operations.

The procedure of one reconfiguration process is shown in detail in figure 5. The Control Unit of the RTR-IFB has to receive a status signal announcing that a reconfiguration will take place in order to activate the internal reconfiguration procedure.

Figure 5. The Reconfiguration Procedure [flowchart steps: External Control informs the CU; the CU starts the reconfiguration procedure; switch SH to Reconfig-Mode; stop PH mode in execution; disable all reconfigured modes (PHout, SH, PHin); reconfigure modes (PHout, SH, PHin); enable all reconfigured modes (PHout, SH, PHin); switch SH to Working-Mode; start PH mode in execution. Annotation: reconfiguration of the device changes only the affected modes]

First of all, the control unit switches the sequence handler mode from the activated operation mode to the reconfiguration mode which has been specified by the system designer beforehand. In parallel, the task-dependent protocol handler mode is stopped. Afterwards, all modes which are affected by the reconfiguration are disabled by setting the bus macros to high impedance. Now the reconfiguration of the task takes place, together with the new protocol handler mode, which has been generated externally along with the new task as described in figure 2. Depending on the kind of modification of the protocol handler mode, the sequence handler mode has to be reconfigured as well. If necessary, a new sequence handler mode is inserted or an obsolete one is replaced. Then the new modes are connected and enabled by the bus macros. After this, the protocol handler mode is activated. Synchronously, the sequence handler is switched back to the operation mode.

3.3 The RTR-IFB FPGA-placement

There are different possibilities to place an RTR-IFB onto an FPGA. The first option is a more densely packed variant which saves space, compared with an alternative that offers a higher degree of freedom for reconfiguration. Figure 6 presents the placement which offers the maximum degree of freedom, because only the skeleton of the IFB macro structure is fixed inside one slot. The skeleton of the IFB consists of the CU, the PHin, the PHout and the SH. All implied modes are implemented in separate slots that can each be reconfigured on their own, managed by the control unit and separated from the skeleton by bus macros. This allows the high degree of freedom, because each mode has one related fixed point to be connected to the corresponding (protocol or sequence) handler inside the RTR-IFB skeleton. It is possible to route all necessary communication data across this fixed point, which implies that this fixed point is variable in its size (e.g. bus width). The protocol handler mode is placed directly in the module where the communicating task is located. This mode can then be reconfigured immediately together with the task.

Figure 6. Placement of an RTR-IFB on an FPGA [diagram labels: Task 1 and Task 2; skeleton with CU, PHin, SH and PHout; modes PHin M1, SH M1, SH M2 and PHout M1; TBUF bus macros between the skeleton and the modes]
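To recap the reconfiguration procedure of figure 5 before turning to the example, its ordering can be sketched as a plain sequence of steps. The Python model below is a hedged illustration: the class ControlUnit, its attributes and the load_bitstream callback are hypothetical names introduced here, and the real control unit is an FSM in hardware.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ControlUnit:
    """Toy software model of the RTR-IFB control unit (names are assumptions)."""
    sh_mode: str = "operation"
    ph_mode_running: bool = True
    bus_macros_enabled: bool = True
    log: List[str] = field(default_factory=list)

    def reconfigure(self, load_bitstream: Callable[[str], None],
                    sh_mode_changed: bool = False) -> None:
        # 1. SH executes the designer-specified reconfiguration mode.
        self.sh_mode = "reconfiguration"
        self.log.append("SH switched to reconfiguration mode")
        # 2. In parallel, the task-dependent PH mode is stopped.
        self.ph_mode_running = False
        self.log.append("PH mode stopped")
        # 3. Affected modes are disabled: bus macros go to high impedance.
        self.bus_macros_enabled = False
        self.log.append("bus macros set to high impedance")
        # 4. Task and new PH mode are reconfigured; the SH mode only if needed.
        targets = ["task", "PH mode"] + (["SH mode"] if sh_mode_changed else [])
        for target in targets:
            load_bitstream(target)
            self.log.append(f"{target} reconfigured")
        # 5. New modes are connected and enabled by the bus macros.
        self.bus_macros_enabled = True
        self.log.append("bus macros enabled")
        # 6. PH mode is activated and SH returns to the operation mode.
        self.ph_mode_running = True
        self.sh_mode = "operation"
        self.log.append("PH mode started, SH back in operation mode")


loaded: List[str] = []
cu = ControlUnit()
cu.reconfigure(loaded.append, sh_mode_changed=True)
```

The sketch runs the steps strictly one after another, whereas the text states that some of them (for example steps 1 and 2) happen in parallel in hardware.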

4. Example: A multi-controller design

4.1 The multi-controller

To give an example where our RTR-IFB can be useful, we assume an RTR multi-controller design. This design contains a controller with multiple instances of the control algorithm which can be exchanged by reconfiguration. Such a controller has been implemented and evaluated in our working group [3, 2, 1] using the Xilinx design flow for partial reconfiguration, as a one-slot and as a two-slot solution. In the current design a set of instances of the controller algorithm is available. A high-level control exchanges these instances via reconfiguration at runtime, using a multiplexer which offers the necessary fixed points for the connections that realize the inter-module communication to the connected modules.

If the sampling period of the controller is longer than the reconfiguration time plus the computation time, the reconfiguration can be performed in a one-slot solution. Otherwise, the current controller has to stay active during the exchange until the new one is fully configured. Afterwards the module switching is performed by the multiplexer. In this case it is not possible for both controllers to share the same resources, because otherwise we would get a communication and computation gap. In its upper half, figure 7 visualizes the scheduling of the two controller tasks T1 and T2 for the current design. T1 is configured into slot S1 and T2 into slot S2. The upper middle shows the process of switching between T1 and T2. This happens immediately, but two slots have to be reserved all the time to be able to reconfigure at any time.

4.2 Time estimations

Some measured values of the implemented controller show that it would make sense to use RTR-IFBs in this design. The computation time of one controller is about 500 ns or less. Our estimated reconfiguration time is about 1-5 ms. This means a configuration time of 100 µs to 500 µs per slot for a reconfigurable device divided into 10 logical slots. The fact that the reconfiguration takes much longer than the computation itself (here: 200 to 1000 times) leads to the assumption that the active time of one controller has to be much longer than the configuration delay. Depending on the sample rate (here we assume 10 kHz, i.e. a period of 100 µs), we have to cope with 1 to 5 overlapping samples caused by the reconfiguration process alone. This shows that the task-switching time cannot be neglected, unlike on processors.

Figure 7. Scheduling of multi controller [timing diagram labels: upper half with slots S1 and S2 and the multiplexer: Running T1, Config T2, Switch & Synch., Running T2; lower half with slot S1 only: Running T1, Reconfig (default), Config T2, Running T2]

4.3 The embedded RTR-IFB

First of all, the IFB implements all functions necessary to communicate with the controller tasks T1 and T2. Then the designer can specify a behavior (sequence handler mode) which is executed during the reconfiguration gap, as shown in the lower middle of figure 7. For example, the mode could repeat the last set-value of the reconfigured controller task T1. Because the IFB decouples both controllers from each other and from their connected modules, e.g. an actuator, T1 can be removed immediately and T2 configured into the same slot.

With reference to figure 5, the reconfiguration procedure can be described precisely as follows. The Control Unit of the RTR-IFB has to receive a status signal announcing that a reconfiguration will take place in order to activate the internal reconfiguration procedure. First of all, the control unit switches the sequence handler mode from the activated operation mode to the reconfiguration mode which has been specified by the system designer beforehand. This reconfiguration mode implements the behavior that has to be executed during the reconfiguration process. In parallel, the task-dependent protocol handler mode is stopped. Afterwards, all modes which are affected by the reconfiguration are disabled. Now the reconfiguration of the new controller takes place, together with the new protocol handler mode, which has been generated externally along with the new controller as described in figure 2. Depending on the kind of modification of the protocol handler mode, the sequence handler mode has to be reconfigured as well. If necessary, a new sequence handler mode is inserted or an obsolete one is replaced. Then the new modes are connected and enabled by the bus macros. After this, the protocol handler mode is activated. Synchronously, the sequence handler is switched back to the operation mode.

4.4 Effects on the multi controller RTR-IFB

We expect the integration of RTR-IFBs to have the following effects on the multi-controller design. The total design space will exceed that of the single-slot implementation, but will probably be smaller than that of the two-slot solution. Nevertheless, the use of an RTR-IFB is beneficial for several reasons. First of all, the approach will speed up the design process because of the integrated design flow of the RTR-IFB. Complicated integration procedures for new modules become obsolete. Furthermore, the use of RTR-IFBs increases the utilization of the reconfigurable device, because one-slot solutions become easier to implement without being restricted by demands like sampling rates. The behavior which has to be performed during the reconfiguration process can easily be implemented as a sequence handler mode instead of adding custom hardware that depends on the current functionality. Of course the reconfiguration time will increase a little because of the additional RTR-IFB components in the design. In return, the synchronization between the reconfigured modules is handled inside the IFB.
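The figures quoted in this chapter (500 ns computation time, 1-5 ms device reconfiguration time, 10 logical slots, 10 kHz sample rate) can be checked with a few lines of integer arithmetic. The Python sketch below also shows the "repeat the last set-value" behavior as one possible reconfiguration mode. All function names are assumptions made for illustration, and times are kept in nanoseconds to avoid floating-point rounding.

```python
from typing import List

COMPUTE_NS = 500            # computation time of one controller (about 500 ns)
SAMPLE_PERIOD_NS = 100_000  # 10 kHz sample rate, i.e. a period of 100 us
LOGICAL_SLOTS = 10


def slot_config_ns(device_reconfig_ns: int, slots: int = LOGICAL_SLOTS) -> int:
    """Configuration time per slot, assuming equally sized logical slots."""
    return device_reconfig_ns // slots


def overlapped_samples(config_ns: int, period_ns: int = SAMPLE_PERIOD_NS) -> int:
    """Number of sample periods touched by one slot reconfiguration (rounded up)."""
    return -(-config_ns // period_ns)


def repeat_last_value(last_set_value: int, samples: int) -> List[int]:
    """One possible reconfiguration mode: hold the last set-value during the gap."""
    return [last_set_value] * samples


best_case = slot_config_ns(1_000_000)   # 1 ms for the whole device
worst_case = slot_config_ns(5_000_000)  # 5 ms for the whole device
gap_output = repeat_last_value(42, overlapped_samples(worst_case))
```

With these inputs the per-slot configuration time is 100 µs to 500 µs, which is 200 to 1000 times the computation time and covers one to five sample periods.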

5. Conclusion

In this paper we presented a new approach for interface design. An abstract runtime reconfigurable interface block (RTR-IFB) has been introduced. This RTR-IFB is based on the IFB macro structure, which is implemented as hierarchical FSMs with respect to the requirements of the incoming and outgoing data streams. This approach is very efficient because it avoids any product automata. The RTR-IFB concept allows the reconfiguration of single interface parts during runtime. An example of a correct column-wise FPGA placement has been presented. The general IFB concept is not restricted to an FPGA implementation. All FSMs can be implemented in software as well; in that case the reconfiguration abilities of the target software system have to be taken into account. Up to now the IFB approach has proven very flexible and well suited to an efficient implementation of the interface problem.

Future work will address the different possibilities to place an RTR-IFB onto an FPGA, as mentioned in chapter 3. Another trade-off is offered by the orthogonal communication structure, which can be varied between a purely serial and a maximally parallel design. Design space exploration in combination with measurements of the allocated resources will help us to make precise estimations.

References

[1] M. Bednara, K. Danne, M. Deppe, O. Oberschelp, F. Slomka, and J. Teich. Design and implementation of digital linear control systems on reconfigurable hardware. EURASIP Journal on Applied Signal Processing, to appear, 2003.
[2] K. Danne, C. Bobda, and H. Kalte. Increasing efficiency by partial hardware reconfiguration: Case study of a multicontroller system. In Proc. of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, Nevada, June 2003.
[3] K. Danne, C. Bobda, and H. Kalte. Run-time exchange of mechatronic controllers using partial hardware reconfiguration. In Proc. of the International Conference on Field Programmable Logic and Applications (FPL2003), Lisbon, Portugal, Sept. 2003.
[4] D. D. Gajski, J. Zhu, R. Dömer, A. Gerstlauer, and S. Zhao. SpecC: Specification Language and Methodology. Kluwer Academic Publishers, University of California, Irvine, 2000.
[5] A. Gerstlauer, R. Dömer, J. Peng, and D. D. Gajski. System Design: A Practical Guide with SpecC. Kluwer Academic Publishers, University of California, Irvine, 2001.
[6] W. Hardt, M. Visarius, and S. Ihmor. Rapid Prototyping of Real-Time Interfaces. In Field Programmable Logic (FPL) - Poster Session, Belfast, Northern Ireland, UK, October 2001.
[7] S. Ihmor. Entwurf von Echtzeitschnittstellen am Beispiel interagierender Roboter. Master's thesis, Universität Paderborn, Warburger Str. 100, 33098 Paderborn, November 2001.
[8] S. Ihmor, N. Bastos Jr., R. C. Klein, M. Visarius, and W. Hardt. Rapid Prototyping of Realtime Communication - A Case Study: Interacting Robots. In Proceedings of the 14th IEEE International Workshop on Rapid System Prototyping (RSP'03), June 2003.
[9] S. Ihmor, M. Visarius, and W. Hardt. A Consistent Design Methodology for Configurable HW/SW-Interfaces in Embedded Systems. In Proc. of the IFIP 17th World Computer Congress - TC10 Stream on Distributed and Parallel Embedded Systems: Design and Analysis of Distributed Embedded Systems, Montreal, Canada, Aug. 2002.
[10] S. Ihmor, M. Visarius, and W. Hardt. A Design Methodology for Application-specific Real-Time Interfaces. In Proceedings of 2002 IEEE International Conference on Computer Design (ICCD): VLSI in Computers & Processors, Freiburg, Germany, Sept. 2002.
[11] S. Ihmor, M. Visarius, and W. Hardt. Modeling of Configurable HW/SW-Interfaces. pages 51-60, Bremen, Germany, Feb. 2003. Shaker Verlag.
[12] D. Lim and M. Peattie. Xilinx Application Note XAPP290: Two Flows for Partial Reconfiguration: Module Based or Small Bit Manipulations, May 2002.
[13] XILINX Virtex Data Sheet, DS003(1-4), 2002.
[14] Xilinx Application Note XAPP138: Virtex FPGA Series Configuration and Readback, July 2002.
[15] M. Visarius, J. Lessmann, W. Hardt, F. Kelso, and W. Thronicke. An XML Format based Integration Infrastructure for IP based Design. pages 119-124, São Paulo, Brazil, 08. 10. September 2003. IEEE Computer Society.
