EFFICIENT EXECUTION ON RECONFIGURABLE DEVICES USING CONCEPTS OF PIPELINING

Florian Dittmann
Heinz Nixdorf Institute, University of Paderborn
Fuerstenallee 11, 33102 Paderborn, Germany
email: [email protected]

ABSTRACT

The efficiency of algorithms on reconfigurable devices can be increased significantly using concepts of pipelining. However, pipelining need not be limited to introducing temporal parallelism into the execution paths on the reconfigurable device. It can also help to hide the often long reconfiguration time in the case of dynamic run time reconfiguration, i.e., while one area is processing, another one can be reconfigured. This work formalizes the stages on run time reconfigurable systems and shows how to derive an optimal partitioning of the reconfiguration area with respect to the characteristics of the algorithms to be mapped and the characteristics of the execution platform. The goal of the thesis is to develop a comprehensive model for the efficient execution of algorithms on run time reconfigurable systems based on pipelined run time reconfiguration.

1. INTRODUCTION

Compared to sequential execution, parallel processing brings significant speed-up for several applications (e.g., in the area of video or audio processing). Using dynamically reconfigurable devices as the execution platform for such scenarios enables hardware re-use, so that less hardware is needed for the same number of algorithms. This flexibility can only be achieved through a reconfiguration phase, which precedes the calculation phase. The reconfiguration phase consumes time and exclusively occupies a reconfiguration port of the device. The duration of the reconfiguration and the exclusive use of the reconfiguration port significantly influence the behavior of the system. Figure 1 shows how the two phases can be modeled depending on the partitioning of the available reconfigurable hardware area.
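The interplay of the reconfiguration phase and the calculation phase can be illustrated with a small simulation. The following Python sketch is not taken from the original work. It assumes a chain of tasks that must execute in the given order, a round-robin assignment of tasks to partitions, and a single reconfiguration port that serves one reconfiguration at a time. The function name `pipeline_makespan` and the task tuples `(rtr, ex)` are hypothetical and chosen for illustration only.

```python
def pipeline_makespan(tasks, num_partitions):
    """Return the finish time of a task chain on a partitioned device.

    tasks: list of (rtr, ex) tuples, i.e. reconfiguration and execution time.
    num_partitions: number of execution areas (assumed interchangeable).
    Assumptions (illustrative, not from the paper): tasks run in list order,
    task i uses partition i % num_partitions, and there is one port.
    """
    port_free = 0                           # time at which the port is idle again
    partition_free = [0] * num_partitions   # time at which each area is idle again
    prev_ex_end = 0                         # end of the predecessor's execution

    for i, (rtr, ex) in enumerate(tasks):
        p = i % num_partitions
        # Reconfiguration needs both the port and the target area to be free.
        rtr_start = max(port_free, partition_free[p])
        rtr_end = rtr_start + rtr
        port_free = rtr_end
        # Execution needs the configured area and the predecessor's result.
        ex_start = max(rtr_end, prev_ex_end)
        ex_end = ex_start + ex
        partition_free[p] = ex_end
        prev_ex_end = ex_end

    return prev_ex_end


if __name__ == "__main__":
    balanced = [(2, 2)] * 4        # reconfiguration as long as execution
    rtr_dominated = [(3, 1)] * 4   # reconfiguration dominates

    print(pipeline_makespan(balanced, 1))       # 16: no overlap possible
    print(pipeline_makespan(balanced, 2))       # 10: RTR hidden behind EX
    print(pipeline_makespan(rtr_dominated, 2))  # 13: the port is the bottleneck
```

With one partition the two phases strictly alternate. With two partitions the reconfiguration of the next task overlaps the execution of the current one, and once reconfiguration dominates, the single port bounds the achievable finish time.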
In both parts of Figure 1, we display an FPGA comprising two and three execution areas, respectively. A single reconfiguration port leads to the displayed maximally interlocked execution, i.e., the reconfiguration (RTR) and execution (EX) phases overlap and form a pipelined system, thereby hiding the reconfiguration time. Increasing the number of partitions on the reconfigurable device further changes the behavior of the system. Additionally, systems with multiple reconfiguration ports would lead to more flexible pipelines.

Fig. 1. Execution pipeline using two partitions (left), and execution pipeline using three partitions (right).

The goal of this project is to derive an efficient partitioning of input graphs (i.e., data flow or task graphs) to optimize the execution of the system. Partitions should consist of nodes that enable an efficient execution of the system. The nodes represent functions that can be placed on the reconfigurable area. Each node is annotated with the area and time needed for execution and the time needed for reconfiguration. The objective is to fill the partitions with nodes that have similar execution times and close start times. Further, the reconfiguration time influences the assignment of a node to a partition. Often, the reconfiguration time dominates the overall execution time. Thus, the goal is to find partitions with homogeneous reconfiguration times. In doing so, the overall structure (precedence constraints, etc.) must not be violated.

This work was partly funded by the German Research Foundation, Deutsche Forschungsgemeinschaft (DFG SPP 1148).

0-7803-9362-7/05/$20.00 ©2005 IEEE

2. DETAILS OF THE PROJECT

A main focus of the work is the partitioning of graphs to match the specific needs of reconfigurable systems. Hall proposed in [1] an approach to place graphs in multiple dimensions using the spectral method. The proximity of nodes depends on their closeness parameter. We extend the closeness
to a combination of connectivity, execution time, or even reconfiguration time. The advantage of the method lies in the preservation of the graph's original structure. Bobda [2] investigated the spectral method for dynamically reconfigurable devices. His work serves as the basis for the project.

Targeting reconfigurable devices further makes it possible to define the execution area for each partition dynamically. The time needed to reconfigure a calculation area is often linear in the size of that area. Using this approximation, we can formalize conditions for the execution of graphs relying on the two-phase pipeline (RTR and EX). Thus, if permanent reconfiguration is unavoidable, we partition graphs so as to fully utilize the reconfiguration port, achieving minimal latency. Defining the size of partitions with the reconfiguration time in mind can help to react quickly to new requirements, i.e., the time of a context switch, which completely blocks its area when the area is reconfigured as a whole, can best be reduced when the area is partially reconfigured with respect to the reconfiguration and execution times of its partitions. Again, the spectral method helps to rearrange the input graph so that its partitioning meets the aforementioned requirements.

For practical validation, a design environment targeting partial bitstream generation for Xilinx FPGAs was developed [3]. The environment automates the generation of partial bitstreams in a structured manner. To this end, the environment maps the characteristics of dynamically reconfigurable systems onto the structure of the Y-chart [4], see Figure 2. A new level (the reconfiguration level) was introduced to the Y-chart. For clarity only, the two lowest levels of the original Y-chart are omitted. The developed design and modeling methodology also serves other kinds of reconfigurable systems.

Fig. 2. Extended Y-chart based model for reconfigurable system design.

3. RELATED WORK

Related work includes PipeRench [5], Pipeline Morphing [6], and a concept for reducing latency [7]. These and other works explore in detail the benefits of pipelining for hiding reconfiguration time. The authors derive concepts and develop architectures that show how to overcome the long reconfiguration phases from which most reconfigurable systems suffer. However, they rarely develop a comprehensive model that also considers the pipeline capabilities of run time reconfigurable systems throughout the whole development process. Further, the flexible assignment of dynamically sized areas to partitions is often under-represented in their work.

4. CONCLUSION

To summarize, the project focuses on an effective way to partition and execute graphs on reconfigurable devices. By formalizing the reconfiguration process as a phase of a pipelined execution, a framework can be developed. The framework explores scheduling solutions that take into account constraints such as execution area, graph partitioning, and reconfiguration port and duration. This framework will allow the development of a comprehensive model for the efficient execution of algorithms on run time reconfigurable systems.

5. REFERENCES

[1] K. M. Hall, “An r-dimensional Quadratic Placement Algorithm,” Management Science, vol. 17, no. 3, pp. 219–229, November 1970.

[2] C. Bobda, “Synthesis of dataflow graphs for reconfigurable systems using temporal partitioning and temporal placement,” Ph.D. dissertation, University of Paderborn, Heinz Nixdorf Institute, 2003.

[3] F. Dittmann, A. Rettberg, and F. Schulte, “A Y-Chart Based Tool for Reconfigurable System Design,” in Workshop on Dynamically Reconfigurable Systems (DRS) 2005, Innsbruck, Austria, 17 Mar. 2005.

[4] D. D. Gajski and R. H. Kuhn, “Guest Editor’s Introduction: New VLSI Tools,” IEEE Computer, Dec. 1983.

[5] S. C. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, and R. R. Taylor, “PipeRench: A Reconfigurable Architecture and Compiler,” IEEE Computer, vol. 33, pp. 70–77, Apr. 2000.

[6] W. Luk, N. Shirazi, S. R. Guo, and P. Y. K. Cheung, “Pipeline Morphing and Virtual Pipelines,” in Field-Programmable Logic and Applications. 7th International Workshop, W. Luk, P. Y. K. Cheung, and M. Glesner, Eds., vol. 1304. London, U.K.: Springer-Verlag, 1997, pp. 111–120.

[7] S. Ganesan and R. Vemuri, “An Integrated Temporal Partitioning and Partial Reconfiguration Technique for Design Latency Improvement,” in Proceedings of the IEEE Design, Automation and Test in Europe (DATE ’00), Paris, France, 2000.
