OPTIMIZATION OF START-UP TIME AND QUIESCENT POWER CONSUMPTION OF FPGAS

Artur Schiefer, Udo Kebschull
Department of Computer Science, Leipzig University
Augustusplatz 10–11, 04109 Leipzig
email: [email protected], [email protected]

ABSTRACT

The mass use of FPGAs instead of ASICs is often not feasible in embedded applications because of their slow start-up time, which is composed of the configuration time, the time to initialise the peripherals and the boot time of the operating system. Another obstacle is their high quiescent power consumption. In this paper we propose a strategy to minimize both problems, especially for more complex applications which require a sophisticated operating system and complex peripherals. Our approach uses recent technologies to freeze the FPGA in a defined state, save this state and write it back at start-up. With this approach the classical boot process of the OS can be omitted. For many applications there is no need to supply the system with power when it is not actually in use, since it can be restarted within milliseconds. Our first results show that a reduction of start-up time by up to two orders of magnitude is possible.

1. INTRODUCTION

FPGAs (Field Programmable Gate Arrays) have come a long way from simple glue logic to highly sophisticated and complex devices, which have permanently changed the way new hardware is designed. They now consist of up to hundreds of thousands of programmable units with up to megabytes of internal RAM and other specialized functions such as adders and multipliers. They have thus become advanced enough to host a modern system on a chip (SoC) composed of a soft processor (e.g. MicroBlaze, Nios II, and various OpenCores¹ designs), several busses for different purposes (memory bus, busses for low- or high-speed peripherals, etc.) and the peripherals (communication links, interfaces, etc.). The necessary peripherals of such systems can be either on- or off-chip. On-chip peripherals are becoming more and more common because they are easy to adopt: the available libraries of predefined peripherals have grown rapidly. The high number of physically supported protocols (e.g. PCI, HSTL, LVDS) at the I/O pins of recent FPGAs also makes this approach attractive. The available space on FPGAs allows other devices which were usually implemented as dedicated controllers, such as Ethernet, to be integrated into the FPGA, reducing the necessary external controllers to pure "dumb" transceivers. Our work mainly targets such SoCs based on one or more FPGAs (sometimes also called System on a Programmable Chip (SoPC)). Figure 1 shows a schematic of such a system.

¹ See www.opencores.org for further details.

0-7803-9362-7/05/$20.00 ©2005 IEEE


As mentioned above, FPGAs are nowadays used for design purposes and low-volume applications, but have not made it into mass production. This is due to several reasons; our work addresses two of them. The first is the long start-up time of FPGAs compared to ASICs (Application-Specific Integrated Circuits). We define start-up time here as the sum of the time needed to configure the FPGA, the time needed to boot the operating system, the time to initialise the peripherals and, if necessary, the time to start the user application(s). Classic ASIC applications also suffer from this start-up time problem, but they can avoid it by putting the ASIC into special power-saving states that keep the system ready to run after a short wake-up time. The high quiescent power consumed by FPGAs rules out this approach, especially in mobile and/or wireless applications. The second reason prohibiting the mass use of FPGAs is therefore that they require too much quiescent power.

We propose a new approach to solve these problems by effectively combining the advanced abilities of recent FPGAs, such as (partial) reconfiguration and high-speed configuration and debug interfaces. In principle our method works as follows. First, the FPGA is configured, the OS is booted and the peripherals are initialised in the conventional, time-consuming way. Second, the FPGA is stopped in a predefined state or before system shutdown. Third, the configuration the FPGA is in is saved to a non-volatile memory (Flash-ROM) using its debug interface. This image includes the configuration of the programmable units, including the content of the internal memory and the state of the flip-flops. The system can then be cut off from every power source. At the next start-up the only thing necessary is to write this image back, and the whole system is available again, which reduces the start-up time to the time necessary to write the image. Since the system is cut off from the power supply during power-down, no quiescent power is needed at all.

Fig. 1. Schematic of an SoPC. [Block diagram; labels: FPGA, Soft Processor, Internal RAM, DMA, Bus, Peripheral Bus, Peripherals A, B and C, MAC with DMA, LVDS, (Re)-Configuration Interface, Non-Volatile Configuration Memory, Memory, Ethernet Transceiver.]
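The save-and-restore cycle outlined in the introduction can be illustrated with a small software model. This is only a sketch under stated assumptions: `FpgaModel` and all of its methods are hypothetical names invented for illustration, the "state" is a plain dictionary, and no real vendor readback or configuration API is modelled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FpgaModel:
    """Toy stand-in for an SRAM-based FPGA (hypothetical, for illustration only)."""
    configured: bool = False
    halted: bool = False
    # Volatile state: logic configuration, internal RAM contents, flip-flop values.
    state: Dict[str, bytes] = field(default_factory=dict)

    def classic_boot(self) -> None:
        """Slow path: configure the device, boot the OS, initialise peripherals."""
        self.state = {
            "logic": b"initial bitstream",
            "ram": b"booted OS image",
            "flip_flops": b"\x01\x00\x01",
        }
        self.configured = True
        self.halted = False

    def halt(self) -> None:
        """Freeze the device so that a consistent snapshot can be taken."""
        self.halted = True

    def readback(self) -> Dict[str, bytes]:
        """Return a copy of the complete state; only valid while halted."""
        if not (self.configured and self.halted):
            raise RuntimeError("readback requires a configured, halted device")
        return dict(self.state)

    def power_off(self) -> None:
        """Cutting the supply erases all SRAM-based state."""
        self.state = {}
        self.configured = False
        self.halted = False

    def restore(self, snapshot: Dict[str, bytes]) -> None:
        """Fast path: write the saved image back instead of booting again."""
        self.state = dict(snapshot)
        self.configured = True
        self.halted = False


if __name__ == "__main__":
    fpga = FpgaModel()
    fpga.classic_boot()          # step 1: conventional start-up
    fpga.halt()                  # step 2: stop in a defined state
    flash = fpga.readback()      # step 3: save the image to "non-volatile memory"
    fpga.power_off()             # no quiescent power needed from here on
    fpga.restore(flash)          # next start-up: write the image back
    print("restored keys:", sorted(fpga.state))
```

The model makes one point of the method explicit: the snapshot is only taken from a halted device, which mirrors the consistency requirement, and the restored state is identical to the state at the moment of the halt.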

2. PRELIMINARIES

In this section we give a brief explanation of the terms and technologies that provide the foundation of our work.

2.1. Configuration of FPGAs

Almost all available FPGAs are based on an SRAM structure implemented in CMOS technology. As a consequence, the information stored in the SRAM cells becomes invalid when the supply voltage falls below a certain level, and the FPGA loses its functionality. This is the reason why each FPGA has to be configured at power-on before it can fulfil the task it is intended for. The configuration is accomplished by writing a bitstream to a configuration or multi-purpose interface (e.g. JTAG boundary scan [1], SelectMAP [2], SignalTap II [3]). The bitstream is generated by a specific tool chain that depends on vendor and application. All bitstreams have in common that they contain the complete information needed to configure and initialise all of the FPGA's functions. After a so-called configuration reset the FPGA is ready to run. In many modern FPGAs this process does not necessarily have to affect the whole FPGA, i.e. only a defined part (e.g. a slice) can be configured (partial reconfiguration). Some FPGAs, like the Xilinx Virtex II/IV device families, even allow this process to be performed during full operation. This implies that the partial reconfiguration has to be completed glitch-free (i.e. without affecting other areas of the FPGA). There are a number of possible sources for a bitstream, such as host systems connected by cable or special configuration ROMs (PROM, Flash-ROM) integrated on the PCB; the system running on the FPGA can even reconfigure itself. Modern configuration interfaces are fast enough to configure FPGAs that need a large bitstream in less than 50 ms.

2.2. Readback of FPGA-Configuration

The need to debug the virtual logic implemented in FPGAs has led to the development of techniques that freeze the FPGA at a given moment of operation and read its current configuration back. This is becoming an increasingly important feature of FPGAs as technologies based on (partial) reconfiguration (e.g. Hardware-Schedulers [4]) spread, because in this case not only the state of the memory and the flip-flops changes dynamically during operation, but also the configuration itself. It has also allowed the development of new applications beyond debugging, such as a Linux file system that accesses the FPGA resources directly [5].

3. THE FASTBOOT FPGA APPROACH

This section describes our proposed method to drastically shorten the start-up time, as defined in Section 1, of a complex FPGA-based SoC. It describes the tool chain, how to create the bitstream, and why this method enables us to forgo a quiescent power supply for many applications.


3.1. Tool Chain

The way an engineer initially develops a new application for an FPGA-based SoC with our method is very similar to the common design flow. The engineer can use the same tool chain provided by the various FPGA vendors (ISE, EDK, Quartus) or others (Synopsys, Mentor Graphics) for implementation and debugging of the FPGA functionality. When the development of the hardware layer of the application has reached a mature state, the next step is to adapt the intended OS to this hardware and to enable the OS to communicate with the peripherals (drivers, modules, etc.). The final step is to develop the application software layer. Once this initial development process is completed, our method comes into action. We use the debug interface provided by the FPGA vendor to fully save the configuration of the SoC on the FPGA, i.e. we capture a configuration bitstream ("snapshot"). This bitstream is saved to a non-volatile memory on the system board or captured by a host system so that it can be reused with compatible FPGA systems. On start-up the bitstream is written back to the FPGA via its configuration interface.

3.2. Bitstream Creation

A bitstream that is usable for our purpose has to fulfil a number of requirements. The two most important are consistency and completeness. It has to be consistent, because writing it back has to put the FPGA into a defined and valid state. It has to be complete as well, covering every detail of the FPGA state. The moment and the frequency of bitstream creation are determined by the type of application the system has to perform. We distinguish two types of applications, stateless and stateful ones. Stateless applications do not need to remember any (partial) configuration information (e.g. variables, flip-flop states, RAM contents) from past executions other than a predefined one.
The bitstream for stateless applications can therefore be created once, remains static from then on, and can be transferred to other compatible systems without changes. Stateful applications need such configuration information to perform their task. In this case at least the parts containing that information have to be rewritten at every shutdown of the application. For stateful systems that use dynamic reconfiguration across most of the FPGA area, or where logging the locations of reconfigurations is impractical, only a full bitstream readback is practicable. Nonetheless, the way of capturing the bitstream is the same for both types of applications.

Only two major steps are required to obtain the bitstream. The first step is to perform a full start-up sequence of the system, i.e. the FPGA is configured by writing the bitstream (see Section 2.1), the configuration reset is performed, the OS is booted, the peripherals are initialised and the application begins to perform its task. As mentioned above, our method relies on the ability of FPGAs to read their configuration back completely. In the second step the FPGA is halted, which is necessary to obtain a consistent bitstream, and the stream is then captured safely into the non-volatile memory via the corresponding interfaces. This may require some additional logic on the FPGA that enables it to write the configuration to this memory, but such interfaces usually do not require much of the FPGA's physical resources.

3.3. FastBoot FPGA

If a bitstream has been acquired with the precautions pointed out in the section above, it is easy to start a complex FPGA-based SoC within a few dozen milliseconds. Since everything is implemented on the FPGA, writing a consistent and complete bitstream to its configuration interface results in an SoC that is ready to run. Modern configuration interfaces run at speeds of up to 100 MHz² or more, so that even very large FPGAs can be configured in less than 50 ms. Even if the application requires external parts such as external RAM or other peripherals that are fast to initialise, this approach is far faster than the classic start-up behaviour of complex SoCs, where we measured dozens of seconds or even minutes. The limiting factors of our method are the size of the bitstream and the speed of the configuration interface.

3.4. Quiescent Power Saving

Many applications require a sophisticated OS or are based on complex monolithic software. Both require a long start-up time, which is often not acceptable because the system has to be operational much earlier than that. The workaround is to start the system and then put it into a quiescent state. During this state the system needs to be supplied with energy, the so-called quiescent power. This often causes trouble for mobile and/or wireless applications, since they have to rely on very limited power resources (e.g. batteries, solar cells) for a long time.
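The two limiting factors named in Section 3.3 (bitstream size and configuration interface speed) and the cost of the quiescent-state workaround can be put into rough numbers. The sketch below is an assumption-laden estimate, not the authors' measurement method: it assumes one bus-width word is transferred per configuration clock cycle, ignores protocol overhead, and uses made-up example values (8 Mbit bitstream, 8-bit bus at 50 MHz, 0.1 W quiescent power) that do not come from the paper.

```python
def config_time_s(bitstream_bits: int, clock_hz: float, bus_width_bits: int = 1) -> float:
    """Estimate the time to write a bitstream through the configuration interface.

    Assumes one word of bus_width_bits is transferred per clock cycle and
    ignores any protocol overhead, so the result is an optimistic lower bound.
    """
    if bitstream_bits < 0 or clock_hz <= 0 or bus_width_bits < 1:
        raise ValueError("bitstream size, clock and bus width must be positive")
    return bitstream_bits / (clock_hz * bus_width_bits)


def standby_energy_j(quiescent_power_w: float, idle_time_s: float) -> float:
    """Energy consumed while the system is kept powered in its quiescent state."""
    if quiescent_power_w < 0 or idle_time_s < 0:
        raise ValueError("power and time must be non-negative")
    return quiescent_power_w * idle_time_s


if __name__ == "__main__":
    # Hypothetical example values, chosen only to illustrate the arithmetic.
    restore = config_time_s(8_000_000, clock_hz=50e6, bus_width_bits=8)
    idle = standby_energy_j(quiescent_power_w=0.1, idle_time_s=3600.0)
    print(f"estimated restore time: {restore * 1e3:.1f} ms")
    print(f"standby energy over one hour: {idle:.0f} J")
```

With a powered-down system the standby term drops to zero, so the comparison reduces to whether the estimated restore time is short enough for the application.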
Another observation is that the time such applications spend in this quiescent state far exceeds the time they are actually active. Our approach eliminates the need for this wasteful workaround for many applications, because the system needs no power at all while it is inactive and starts fast enough to fulfil the requirements of the application.

² http://www.xilinx.com/publications/prod_mktg/pn0010489.pdf specifies 264 MHz.

4. IMPLEMENTATION

We realized a prototype implementation of our approach using the SUZAKU board [6] shown in Figure 2. This development board features a Xilinx Spartan III XC3S400, 16 MB of external RAM, 4 MB of Flash-ROM, Ethernet and a serial interface. The implemented SoC runs on the MicroBlaze soft processor. The OS used is µClinux, a Linux derivative for MMU-less CPUs. We used the host PC to create the configuration bitstream and stored it back into the Flash-ROM. This is possible since less than 4 MB of the external RAM is occupied. For the next start-up the board was jumpered to configure the FPGA from the Flash-ROM.

Fig. 2. SUZAKU Board, © Atmark-Techno Inc.

4. RESULTS

Our initial results show that our method lives up to its promise. While the classic start-up sequence of the test system takes 24 s until the password prompt appears, our method reaches the same point in about 40 ms. This number is based on the oscillator driving the configuration interface on our system. To the user, the system appears to be instant-on.

5. CONCLUSION

We developed a new method to speed up the start-up process of complex FPGA-based SoCs. The time to start or wake up our system from the quiescent state is less than 50 ms. This method can therefore easily outperform the classic start-up sequences necessary for complex ASIC-based designs. It is a first step towards a whole new area of FPGA-based applications that need a fast start-up time and no quiescent power consumption. It also enables developers to design new complex applications cheaply and easily where previously ASICs that consume quiescent power and/or are slow at start-up were needed. To prove the potential of this approach we implemented a prototype on the SUZAKU platform. Our initial measurements show that our method is feasible and that future SoCs with start-up times limited only by the size of the required bitstream and the speed of the configuration interface are possible.

Our future work is to explore how FPGAs with integrated physical processor cores, such as the Xilinx Virtex IIP/IV families (here PowerPC), can also benefit from our method. We are currently evaluating whether this is possible using the debug and self-diagnosis interfaces of the processor and a small boot loader that rebuilds the processor state before the whole FPGA starts. We also want to explore in which other application areas this approach may be helpful.

We want to thank the Advanced Research Meynen GmbH in Starnberg, Germany for their support.

6. REFERENCES

[1] JTAG Technologies, Documentation of the IEEE 1149.1 boundary-scan standard, www.jtag.com.

[2] Xilinx Inc., XAPP502: SelectMAP Mode v1.4, http://direct.xilinx.com/bvdocs/appnotes/xapp502.pdf.

[3] Altera Corporation, Design Debugging Using the SignalTap II Embedded Logic Analyzer, http://www.altera.com/literature/hb/qts/qts_qii53009.pdf.

[4] C. Nitsch and U. Kebschull, "A novel design technology for next generation ubiquitous computing architecture," in IPDPS, 2003.

[5] A. Donlin, P. Lysaght, B. Blodget, and G. Troeger, "A virtual file system for dynamically reconfigurable FPGAs," in Field Programmable Logic, 2004.

[6] Atmark Techno Inc., SUZAKU FPGA+Linux The New Generation of Embedded Solutions, http://www.atmarktechno.com/en/product/suzaku.html.
