Cluster-Based Hybrid Reconfigurable Architecture for Auto-adaptive SoC

Xun ZHANG, Hassan RABAH, Serge WEBER
Laboratoire d’Instrumentation Electronique de Nancy, Nancy University
Vandoeuvre-les-Nancy, Nancy, 54500
email: {xun.zhang, hassan.rabah, serge.weber}@lien.uhp-nancy.fr

Abstract— This paper presents a cluster-based hybrid reconfigurable architecture for auto-adaptive SoCs, designed to achieve high performance and flexibility with low design effort across a variety of multimedia applications. Efficient adaptivity is enabled by heterogeneous, exchangeable cores and by a hierarchical organization. This organization is realized through global and local hardware reconfiguration, using partial and dynamic reconfiguration. A case study of a discrete wavelet transform with different types of filters demonstrates feasibility at the task adaptive level. A platform based on a Xilinx Virtex-4 FPGA is used for the experimental implementation.

I. INTRODUCTION

In multimedia applications, customers demand more functionality and better audio-visual quality. At the same time, competitive pressures make a faster time-to-market essential. Moreover, the diversity of communication networks, bandwidth availability, energy constraints and the evolution of encoding standards require different types of encoding and decoding systems. All these requirements, among others, give the concept of adaptivity, which is not new, crucial importance in present and future electronic devices. Making a system auto-adaptive, by adapting its hardware to the requirements of a given application or of an application developed in the future, is an efficient way to meet the computation needs of the multimedia domain. This auto-adaptability can be achieved efficiently with a heterogeneous, reprogrammable and reconfigurable structure.

However, it is well known that reconfiguration overhead drastically affects both system performance and energy consumption [1]. Different approaches have been proposed to cope with this problem. Among them, scheduling algorithms minimize the reconfiguration overhead in partially reconfigurable hardware by hiding reconfiguration latency [2] [3]. In this case, particular effort must go into the design of the scheduler and the reconfiguration manager. Reconfiguration overhead can also be reduced with multi-context technology, as used in coarse-grained reconfigurable circuits, at the expense of flexibility and a large memory requirement [4]. A concept of hyper-configurable architecture has been introduced as an alternative [5]. In this concept, a resource that enables reconfiguration is itself reconfigurable, by defining different levels of reconfiguration. The drawbacks of this method are its reconfiguration memory requirement, its complex control circuitry and its reliance on a specific target architecture.

The rapid evolution of reconfigurable architectures, particularly modern FPGAs that can integrate a complete system on chip, requires new architectural designs and methods to exploit their potential. These architectures must take into account the needs of an application, or of a set of applications in a domain, in terms of efficiency and adaptability. They must also be able to exploit the available heterogeneous resources and the partial reconfiguration capability of the target technology.

To meet these requirements, we propose a cluster-based hybrid reconfigurable and programmable architecture. Each cluster is composed of interchangeable reconfigurable cores and programmable processors. In our approach, the architecture is a hierarchical structure with two levels of reconfiguration. The first level allows application swapping, and the second allows an application to adapt to its environment. To demonstrate the feasibility of our architecture, we choose a video decoder as the application and focus on the task adaptive level, using the wavelet transform [6] as the adaptable task. The inherent scalability of the wavelet transform and its use in new compression standards make it a good candidate and motivate our choice. Moreover, the wavelet transform can be implemented with different types of algorithms and different types of filters.

The remainder of the paper is organized as follows: Section II explains the layered adaptivity approach, Section III details the proposed layered reconfigurable architecture, Section V validates our approach through the case study, and Section VI gives concluding remarks and future work.

II. LEVELS OF AUTO-ADAPTATION

Adaptation is the ability of an SoC to adapt to external requirements at run time by adjusting its structure. In our approach, adaptation is considered at two levels: the application adaptive level and the task adaptive level. The application adaptive level corresponds to switching between different applications; for example, a multimedia terminal switches from playing a movie to answering a video call. The task adaptive level consists of switching between different versions of a task within an application; this situation can occur, for instance, when down-scaling or up-scaling in video decoding according to the available bandwidth.
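As an illustration of the task adaptive level, the sketch below shows two interchangeable versions of the same task: a single-level 1-D discrete wavelet transform computed by integer lifting. It is a minimal Python sketch under stated assumptions: the Haar and LeGall 5/3 filters, the lifting formulation and the names `haar_lifting`, `legall53_lifting`, `DWT_VERSIONS` and `run_dwt` are illustrative choices, not the filters or interfaces of the paper's case study.

```python
"""Two interchangeable versions of one task: a single-level 1-D DWT.

Hypothetical sketch: both versions share the same lifting skeleton
(split, predict, update); only the filter steps differ, which is the kind
of change a task-level version switch makes.
"""


def haar_lifting(x):
    """Integer Haar (S-transform) lifting."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]            # predict step
    approx = [e + (d >> 1) for e, d in zip(even, detail)]  # update step
    return approx, detail


def legall53_lifting(x):
    """Reversible LeGall 5/3 lifting with symmetric boundary extension."""
    even, odd = x[0::2], x[1::2]
    last = len(even) - 1
    # Predict: at the right border the missing even neighbour mirrors to even[last].
    detail = [odd[i] - ((even[i] + even[min(i + 1, last)]) >> 1)
              for i in range(len(odd))]
    # Update: at the left border the missing detail mirrors to detail[0].
    approx = [even[i] + ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
              for i in range(len(even))]
    return approx, detail


# Hypothetical registry of task versions; switching versions = picking another key.
DWT_VERSIONS = {"haar": haar_lifting, "legall53": legall53_lifting}


def run_dwt(samples, version):
    """Run one decomposition level of the DWT task with the selected version."""
    x = list(samples)
    if len(x) == 0 or len(x) % 2:
        raise ValueError("input length must be even and non-zero")
    return DWT_VERSIONS[version](x)


if __name__ == "__main__":
    signal = [10, 12, 14, 16]
    print(run_dwt(signal, "haar"))      # ([11, 15], [2, 2])
    print(run_dwt(signal, "legall53"))  # ([10, 15], [0, 2])
```

Because both versions return the same (approximation, detail) layout, the caller is unchanged when the version is swapped, which mirrors in software the idea of exchanging a task version without touching the rest of the application.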

A. Application adaptive level

For a given domain, applications can be described by a set of processing tasks and subtasks. Two applications of the same domain can thus be described in terms of common processing tasks and specific processing tasks. Figure ?? shows an example of two applications A1 and A2 featuring common tasks (continuous lines) and specific tasks (dashed lines). Switching from application A1 to application A2 requires replacing the specific tasks and establishing communication between the newly loaded tasks and the common tasks. In some cases, the two applications must run simultaneously; to achieve this, different versions of the specific tasks must be available.
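The switch from A1 to A2 can be expressed as set operations on the two task graphs. The following minimal sketch assumes each application is given as a set of task names plus a set of producer/consumer links; the task names (T1, T2, T3, T5) and the `App`, `SwitchPlan` and `plan_switch` names are hypothetical and only illustrate which tasks are kept, unloaded and loaded, and which links must be built.

```python
from typing import FrozenSet, NamedTuple, Tuple

Link = Tuple[str, str]  # (producer task, consumer task)


class App(NamedTuple):
    tasks: FrozenSet[str]
    links: FrozenSet[Link]


class SwitchPlan(NamedTuple):
    keep: FrozenSet[str]        # common tasks: stay configured
    unload: FrozenSet[str]      # tasks specific to the current application
    load: FrozenSet[str]        # tasks specific to the target application
    new_links: FrozenSet[Link]  # communication to build for the loaded tasks


def plan_switch(current: App, target: App) -> SwitchPlan:
    """Derive the reconfiguration work needed to switch applications."""
    load = target.tasks - current.tasks
    # Every target link that touches a newly loaded task must be established.
    new_links = frozenset(
        link for link in target.links if link[0] in load or link[1] in load
    )
    return SwitchPlan(
        keep=current.tasks & target.tasks,
        unload=current.tasks - target.tasks,
        load=load,
        new_links=new_links,
    )


if __name__ == "__main__":
    # Hypothetical applications sharing T1 and T3 (common tasks).
    a1 = App(frozenset({"T1", "T2", "T3"}), frozenset({("T1", "T2"), ("T2", "T3")}))
    a2 = App(frozenset({"T1", "T5", "T3"}), frozenset({("T1", "T5"), ("T5", "T3")}))
    print(plan_switch(a1, a2))
```

Only the `load` set and `new_links` translate into reconfiguration work; the common tasks in `keep` remain configured, which is what makes sharing tasks between applications of a domain worthwhile.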

[Fig. 1. Layered architecture (figure elements: RPMs, memories, RISC processor with RTOS, reconfigurable communication)]

B. Task adaptive level

Each task of an application usually consists of a set of subtasks or a set of operators, depending on the complexity of the task, as shown in figure ??, where a new version of task T2 is used to adapt application A2 to a given environment. To enable task adaptivity, different versions of a task for a given algorithm must be defined and characterized in terms of power, area, throughput, efficiency and other objectives. For the same task, it must also be possible to change the type of algorithm in order to adapt the application to future standards.

III. CLUSTER-BASED HYBRID RECONFIGURABLE ARCHITECTURE

Different design strategies for embedding a program in a reconfigurable system on chip have been reported in the literature. These strategies range from a pure software implementation to a pure hardware implementation, with various intermediate solutions mixing hardware and software with tight or loose coupling. Each model exploits a different part of the cost/performance spectrum of implementations and is well suited to a specific application or to a specific task of an application. Maximum flexibility is obtained with a pure software implementation and maximum performance with a pure hardware implementation. The performance of a processor can be enhanced by modifying its data path, and the data path can be made reconfigurable to increase its flexibility. However, this type of model remains limited to specific applications. When the number of processing elements is very large, communication becomes a real problem. This problem is taken into account in our analysis, which is not detailed in this paper; our choice is a heterogeneous combination of local parallel buses and global serial or semi-parallel packetized links. These choices lead us to organize the system into clusters, where a cluster is composed of a set of modules communicating via a bus. Communication between clusters uses a serial or semi-parallel packetized link. Against this background, the proposed architecture, whose hierarchical reconfiguration structure is based on the dual-level adaptation defined above, is shown in figure 1. This hierarchical structure is configurable in two ways: at a global reconfiguration level and at a local reconfiguration level.
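The two-tier interconnect can be illustrated with a small cost model. The sketch below is hypothetical: the 16-bit packet header, the default bus and link widths, the one-beat-or-flit-per-cycle timing, and the names `Module`, `Transfer` and `plan_transfer` are assumptions for illustration, not parameters of the actual architecture. It captures only the routing decision implied by the text: a transfer inside a cluster uses the local parallel bus, while a transfer between clusters is packetized and serialized onto a serial or semi-parallel link.

```python
import math
from dataclasses import dataclass
from typing import NamedTuple

HEADER_BITS = 16  # hypothetical header carrying destination cluster and module ids


@dataclass(frozen=True)
class Module:
    cluster: int  # cluster the module belongs to
    index: int    # position of the module on its cluster bus


class Transfer(NamedTuple):
    medium: str  # "local_bus" or "packet_link"
    cycles: int  # bus beats or link flits, one per cycle in this simple model


def plan_transfer(src: Module, dst: Module, payload_bits: int,
                  bus_width_bits: int = 32, link_width_bits: int = 4) -> Transfer:
    """Choose the interconnect for a transfer and estimate its length in cycles."""
    if src.cluster == dst.cluster:
        # Intra-cluster: raw data on the parallel bus, no header needed.
        return Transfer("local_bus", math.ceil(payload_bits / bus_width_bits))
    # Inter-cluster: header + payload split into flits of link_width_bits
    # (link_width_bits=1 models a serial link, >1 a semi-parallel one).
    total_bits = HEADER_BITS + payload_bits
    return Transfer("packet_link", math.ceil(total_bits / link_width_bits))


if __name__ == "__main__":
    a, b, c = Module(0, 1), Module(0, 2), Module(1, 0)
    print(plan_transfer(a, b, 64))                     # Transfer(medium='local_bus', cycles=2)
    print(plan_transfer(a, c, 64))                     # Transfer(medium='packet_link', cycles=20)
    print(plan_transfer(a, c, 64, link_width_bits=1))  # Transfer(medium='packet_link', cycles=80)
```

The model ignores arbitration and link latency; it only shows that intra-cluster traffic stays on the wide bus while inter-cluster traffic pays the header and serialization cost of the narrower packetized link.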

A. Global reconfiguration level

At the global reconfiguration level, the communication between clusters and between the elements of a cluster can be reconfigured to meet a particular need. The proposed organization is depicted in figure 1. It is composed of heterogeneous multiprocessor cores that allow software reuse, one or several Reconfigurable Processing Modules (RPM), a reconfigurable communication interface, and an on-chip memory. The reconfigurable processing modules provide hardware acceleration and can be reconfigured to support different versions of a task. The reconfigurable communication interface builds the interconnection between the RPMs and the other components. Each RPM can be reconfigured at run time. A reconfiguration manager controls the sequences of reconfigurations (see Section IV). When a new application is required, the RPM configurations corresponding to that application are loaded, together with those of the appropriate communication.

B. Local reconfiguration level

The task adaptive level is enabled by reconfiguration at the processing element level, where versions of a task can be mapped to software or hardware. A software version can be executed on a general-purpose embedded processor core or on a specific embedded processor core. Hardware versions can be mapped onto a Reconfigurable Processing Module. Three types of RPM reconfiguration are defined:
• Small reconfiguration: different parts of the RPM can be reconfigured individually. The structure of such an RPM is depicted in figure 2. When a task is mapped onto this type of RPM, intra-task adaptation is possible by performing small changes.
• Medium reconfiguration: in this type of RPM, a tiny processor is associated with a medium-sized reconfigurable area (figure 3), bringing flexibility and performance at the same time.
• Overall reconfiguration: this type of RPM can be reconfigured to hold a more advanced soft CPU or a specific hardware IP core.
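Once task versions are characterized as required at the task adaptive level, choosing which one to map can be sketched as a constrained selection. In this minimal sketch, the `TaskVersion` fields, the target labels, the selection policy (lowest power among versions meeting the throughput, power and free-area constraints, ties broken by area) and the numbers in the usage example are all assumptions made for illustration; they are neither measurements nor the policy of the paper's reconfiguration manager.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical mapping targets: software on a processor core, or one of
# the three RPM reconfiguration types described in the text.
TARGETS = {"software", "rpm_small", "rpm_medium", "rpm_overall"}


@dataclass(frozen=True)
class TaskVersion:
    name: str
    target: str          # one of TARGETS
    power_mw: float      # characterized power
    area: int            # reconfigurable area needed (0 for software)
    throughput: float    # e.g. samples per second

    def __post_init__(self):
        if self.target not in TARGETS:
            raise ValueError(f"unknown target {self.target!r}")


def select_version(versions: Iterable[TaskVersion], min_throughput: float,
                   max_power_mw: float, free_area: int) -> Optional[TaskVersion]:
    """Return the lowest-power feasible version (ties: smallest area), or None."""
    feasible = [
        v for v in versions
        if v.throughput >= min_throughput
        and v.power_mw <= max_power_mw
        and v.area <= free_area
    ]
    return min(feasible, key=lambda v: (v.power_mw, v.area), default=None)


if __name__ == "__main__":
    # Illustrative, made-up characterization values (not measurements).
    catalog = [
        TaskVersion("dwt_sw", "software", 50.0, 0, 1e5),
        TaskVersion("dwt_medium", "rpm_medium", 120.0, 800, 2e6),
        TaskVersion("dwt_overall", "rpm_overall", 200.0, 2000, 8e6),
    ]
    print(select_version(catalog, 1e6, 250.0, 1500).name)  # dwt_medium
    print(select_version(catalog, 1e4, 250.0, 1500).name)  # dwt_sw
    print(select_version(catalog, 1e7, 250.0, 3000))       # None
```

Returning `None` when no version fits leaves the decision to the caller, for example keeping the current version or triggering a global reconfiguration; other objectives such as energy per sample could replace the sort key.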
The different types of RPM are designed to allow the best flexibility/performance trade-offs at run time when