FPGA-Based Evolvable Hardware Systems
Lukáš Sekanina
Brno University of Technology, Faculty of Information Technology
IT4Innovations Centre of Excellence
http://www.fit.vutbr.cz/~sekanina
ARCS 2013, Prague, February 21, 2013
Evolvable Hardware = Evolutionary Computation + Reconfigurable Device
[Figure: the evolvable hardware loop. An initial population generator produces a population of configuration bitstrings (e.g. 011010000110, 110010100011). Each candidate circuit is configured in the device, a test set is introduced, and fitness is computed. High-scored configurations are selected, and a new circuit population is created by applying genetic operators (crossover, mutation). The loop repeats until evolution is finished, yielding the final circuit.]
Why evolutionary design?
• Allows engineers to increase the level of design automation.
• The design problem is transformed into a search problem!
• Exploring “dark corners” of design spaces.
• Multi-objective search.
• Novel designs (unreachable by conventional techniques) can be discovered by means of EAs.
• Antennas, analog circuits, digital circuits, optical lens systems, programs, protocols…
• Adaptive and self-repairing hardware can be implemented using evolvable hardware.
Courtesy of NASA Ames.
Human-Competitive Results
Competition: The Humies at GECCO since 2004
J. Koza: We say that an automatically created result is “human-competitive” if it satisfies one or more of the eight criteria below.
• (A) The result was patented as an invention in the past, is an improvement over a patented invention, or would qualify today as a patentable new invention.
• (B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
• (C) The result is equal to or better than a result that was placed into a database or archive of results maintained by an internationally recognized panel of scientific experts.
• (D) The result is publishable in its own right as a new scientific result independent of the fact that the result was mechanically created.
• (E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
• (F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
• (G) The result solves a problem of indisputable difficulty in its field.
• (H) The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).
Winners of Humies 2004 - 2012
• 2004
  • An Evolved Antenna for Deployment on NASA's Space Technology 5 Mission (NASA Ames)
  • Automatic Quantum Computer Programming: A Genetic Programming Approach (Hampshire College, US)
• 2005
  • Two-dimensional photonic crystals designed by evolutionary algorithms (Cornell U., US)
  • Shaped-pulse optimization of coherent soft-x-rays (Colorado State University, US)
• 2006
  • Catalogue of Variable Frequency and Single-Resistance-Controlled Oscillators (MIT, US)
• 2007
  • Evolutionary Design of Single-Mode Microstructured Polymer Optical Fibres (UCL, UK)
• 2008
  • Genetic Programming for Finite Algebras (Hampshire College, US)
• 2009
  • A Genetic Programming Approach to Automated Software Repair (University of New Mexico, US)
• 2010
  • Evolutionary design of the energy function for protein structure prediction (University of Nottingham, UK)
• 2011
  • GA-FreeCell: Evolving Solvers for the Game of FreeCell (Ben-Gurion University of the Negev, IL)
• 2012
  • Evolutionary Game Design (Imperial College London, UK)
Intrinsic evolution in XC6216 FPGA
(A. Thompson, 1996)
Task: In the XC6216 FPGA, find a circuit that outputs logic 1 when f_input = 10 kHz and logic 0 when f_input = 1 kHz.
The resulting configuration works correctly only on the FPGA where it was evolved and in the environment that existed at the end of evolution. It is impossible to fully understand how the evolved circuit works.
Outline
I. Case study: Evolutionary design of image filters
II. EHW systems utilizing Virtual Reconfigurable Circuits
III. EHW systems utilizing Dynamic Partial Reconfiguration
IV. EHW systems in Zynq-7000 all programmable SoC
V. Conclusions
Part I: Case study: Evolutionary design of image filters
Evolutionary design of image filters
Can an EA design an image filter that exhibits better filtering properties and a lower implementation cost than conventional solutions?
Target domain: filters suppressing shot noise, Gaussian noise, and burst noise; edge detectors; …
Method: Cartesian Genetic Programming
• Array of nc x nr programmable elements (PE).
• Feed-forward networks.
• All I/O and connections on 8 bits.
• Encoding of PE22: (17, 19, 5)
• Complete encoding: nr x nc x 3 + 1 integers
• Chromosome: (0,1,1), (2,3,5), …, (35,36,5), 22
[Figure: CGP array of numbered programmable elements with 9 x 8-bit inputs and 1 x 8-bit output.]
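The encoding above can be illustrated with a minimal Python sketch. It assumes that the 9 primary inputs are indexed 0 to 8, that nodes are numbered upward from 9, that each gene triple means (first input, second input, function code), and that the final integer selects the output node. The function table `FUNCTIONS` is hypothetical, since the PE function list from the slides is not reproduced in the text, and the example uses three nodes rather than the full array.

```python
# Hypothetical 8-bit function set (an assumption, not the slide's PE function list).
FUNCTIONS = {
    0: lambda a, b: 255,              # constant
    1: lambda a, b: a,                # identity
    2: lambda a, b: min(a + b, 255),  # saturated addition
    3: lambda a, b: (a + b) >> 1,     # average
    4: lambda a, b: max(a, b),
    5: lambda a, b: min(a, b),
}


def evaluate_cgp(genes, output_node, inputs):
    """Evaluate a feed-forward CGP graph on one filtering window.

    genes       -- list of (in1, in2, func) triples; node i of the list
                   gets the index len(inputs) + i
    output_node -- index of the node (or input) used as the filter output
    inputs      -- the primary inputs, e.g. the 9 pixels of a 3x3 window
    """
    values = list(inputs)
    for in1, in2, func in genes:
        # Feed-forward restriction: a node may only read earlier values.
        if in1 >= len(values) or in2 >= len(values):
            raise ValueError("gene references a node that is not yet computed")
        values.append(FUNCTIONS[func](values[in1], values[in2]) & 0xFF)
    return values[output_node]


window = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # 3x3 window, 8-bit pixels
genes = [(0, 1, 4), (2, 3, 5), (9, 10, 3)]     # nodes 9, 10, 11
result = evaluate_cgp(genes, 11, window)
```

Here node 9 takes the maximum of inputs 0 and 1, node 10 the minimum of inputs 2 and 3, and node 11 averages the two, so genes that are not on the path to the output node have no effect on the result.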
List of functions in PE
All inputs and outputs at 8 bits!
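Fitness function

$$ MAE = \frac{1}{RC} \sum_{r=0}^{R-1} \sum_{c=0}^{C-1} \left| I(r,c) - K(r,c) \right| $$

I: golden image
K: filtered image
R: rows
C: columns

[Figure: fitness unit datapath. The difference between the golden image and the filtered image passes through an absolute-value block (|ABS|) and is accumulated by an adder (+) into a register (REG).]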
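As a software counterpart to the MAE fitness above, a minimal Python sketch follows. Images are assumed to be equally sized lists of rows of 8-bit integers; the function name and the sample data are illustrative only.

```python
def mean_absolute_error(golden, filtered):
    """Mean absolute error between golden image I and filtered image K."""
    rows = len(golden)
    cols = len(golden[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            # Same operation as the subtract, |ABS|, accumulate datapath.
            total += abs(golden[r][c] - filtered[r][c])
    return total / (rows * cols)


golden = [[10, 20], [30, 40]]
filtered = [[12, 20], [27, 45]]
mae = mean_absolute_error(golden, filtered)
```

A lower MAE means the filtered image is closer to the golden image, so the best-scored candidate is the one with the smallest value.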
CGP: Search algorithm ES(1+λ)
1. Randomly generate 1+λ individuals (the initial population).
2. Evaluate the population.
3. WHILE the termination criterion is not satisfied DO
   • Select the highest-scored individual.
   • Use mutation to create λ offspring of the parent individual.
   • Create a new population using the parent and its λ offspring.
   • Evaluate the population.
4. Report the individual with the highest fitness.

Typical setup:
• λ = 1 – 10
• Mutation of 1 – 5 genes
• Array of 4 x 8 elements
• 30000 generations
• 100 runs
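The steps above can be sketched in Python as follows. This is a generic illustration, not the implementation used in the talk: the fitness is treated as an error to be minimized (as with MAE), the termination criterion is a fixed generation count, and the toy problem (`TARGET`, `random_individual`, `mutate`, `fitness`) is a hypothetical stand-in for a CGP chromosome and an image-based fitness.

```python
import random


def evolve(fitness, random_individual, mutate, lam=4, generations=1000, seed=0):
    """ES(1+lambda): keep the best individual, create lam mutants of it."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(1 + lam)]
    scores = [fitness(ind) for ind in population]
    for _ in range(generations):
        # Select the best-scored individual (lowest error) as the parent.
        best = min(range(len(population)), key=scores.__getitem__)
        parent, parent_score = population[best], scores[best]
        # New population: the parent plus lam mutated offspring.
        offspring = [mutate(parent, rng) for _ in range(lam)]
        population = [parent] + offspring
        scores = [parent_score] + [fitness(child) for child in offspring]
    best = min(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best]


# Toy problem (hypothetical): match a fixed vector of 8-bit integers.
TARGET = [17, 19, 5, 35, 36, 5, 22, 0]


def random_individual(rng):
    return [rng.randrange(256) for _ in TARGET]


def mutate(parent, rng):
    child = list(parent)
    for _ in range(rng.randint(1, 5)):  # mutate 1 to 5 genes
        child[rng.randrange(len(child))] = rng.randrange(256)
    return child


def fitness(individual):
    return sum(abs(a - b) for a, b in zip(individual, TARGET))


_, initial_score = evolve(fitness, random_individual, mutate, generations=0)
_, final_score = evolve(fitness, random_individual, mutate, generations=300)
```

Because the parent is always carried over, the best score never gets worse from one generation to the next. In this sketch ties are resolved in favour of the parent; CGP implementations often prefer an equally scored offspring instead, which is a design choice not stated on the slide.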
Example of evolved filter behavior
a) Image corrupted by 5% salt-and-pepper noise; PSNR (peak signal-to-noise ratio): 18.43 dB
b) Original image
c) Median filter (kernel 3x3); PSNR: 27.92 dB; 268 FPGA slices; 305 MHz
d) Evolved filter (kernel 3x3); PSNR: 37.50 dB; 200 FPGA slices; 308 MHz
[Figure: image panels a) to d).]
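For readers who want to reproduce figures of this kind, the sketch below computes PSNR in Python. It assumes the common definition for 8-bit images, PSNR = 10·log10(255² / MSE); the slides do not state the formula, so this definition and the sample data are assumptions.

```python
import math


def psnr(original, processed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0
    count = 0
    for row_o, row_p in zip(original, processed):
        for o, p in zip(row_o, row_p):
            squared_error += (o - p) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
noisy = [[100, 255], [100, 100]]  # one "salt" pixel
noisy_psnr = psnr(original, noisy)
```

Higher values indicate a result closer to the original image, which is how the median and evolved filters are compared in panels c) and d).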
Comparison of various filters
Mean PSNR for 25 test images
MF – Median Filter
AMF – Adaptive Median Filter
The "single" filter is one of the evolved filters.
Best SW – Y. Dong, S. Xu: A new directional weighted median filter for removal of random-valued impulse noise. Signal Processing Letters, vol. 14, no. 3, p. 193–196, 2007
Burst noise filtering • switching image filters • 5 x 5 filtering window
Image corrupted by 5% impulse burst noise.
[Figure: filtering results; panel labels as extracted: CWMF, evolved AMF, PWMAD.]
Part II: EHW systems utilizing Virtual Reconfigurable Circuits
… because earlier FPGAs did not support internal reconfiguration at all, and dynamic partial reconfiguration was too complicated.
Implementation strategies HW SW – outside FPGA SW – inside FPGA (soft CPU core) SW – inside FPGA (hard CPU core)
DPR VRC simulator HW SW – outside FPGA SW – inside FPGA (soft CPU core) SW – inside FPGA (hard CPU core)
Virtual Reconfigurable Circuit
• VRC is a new MUX-based reconfigurable layer on top of the FPGA.
• Fast pipelined reconfiguration and pipelined processing.
• The configuration register contains 384 bits for the 8x4 processing elements.
Fitness calculation Image with noise
Golden image
EHW in Xilinx Virtex II Pro FPGA
Time of evolution Evaluation of a single candidate circuit (@100 MHz, M x N pixels):
Total evolution time: (Ng generations, Nc VRC instances, p – population size):
Speedup and resources utilization
• Target platform: Combo6X
• CGP
  • 4 x 8 nodes
  • training image: 128x128
• Results (FPGA at 100 MHz)
  • 1 VRC: 44 times faster than a Celeron 2.4 GHz CPU
    • evaluates approx. 6k candidate filters per second
    • requires approx. 10 s to produce a filter (~30k generations)
  • 4 VRCs: the speedup is 170.

Results of synthesis for Virtex II Pro 2VP50FF1517 (Nc is the number of VRCs)
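The figure of roughly 6k candidate filters per second can be checked with a back-of-the-envelope calculation. The Python sketch below assumes that the pipelined VRC consumes one pixel per clock cycle and that pipeline fill and reconfiguration overheads are negligible; both are assumptions, since the timing formulas from the slides are not reproduced in the text.

```python
def candidates_per_second(width, height, clock_hz, cycles_per_pixel=1):
    """Upper-bound evaluation rate for one pipelined VRC instance."""
    cycles_per_candidate = width * height * cycles_per_pixel
    return clock_hz / cycles_per_candidate


# 128x128 training image, FPGA clocked at 100 MHz, single VRC.
rate = candidates_per_second(128, 128, 100e6)
```

Under these assumptions one evaluation takes 16384 clock cycles, which gives a rate slightly above 6100 candidates per second and agrees with the approximate value quoted for a single VRC.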
Part III: EHW systems utilizing Dynamic Partial Reconfiguration (DPR)
DPR in Virtex-5
Internal Configuration Access Port (ICAP) ~ 100 MHz
Frame - the smallest addressable unit of information of the configuration memory (41 x 32 bits).
Relocation - capability to configure the same reconfigurable module at different positions of the device.
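To give a feel for the numbers above, the following Python sketch estimates the size of one configuration frame and the minimum time to write it. It assumes a 32-bit ICAP data port that accepts one word per clock cycle at 100 MHz and ignores command and addressing overhead, so the result is a lower bound rather than a measured value.

```python
FRAME_WORDS = 41       # words per frame (from the slide)
WORD_BITS = 32         # bits per word (from the slide)
ICAP_CLOCK_HZ = 100e6  # approximate ICAP clock (from the slide)

frame_bits = FRAME_WORDS * WORD_BITS
frame_bytes = frame_bits // 8
# Assumption: one 32-bit word is transferred per ICAP clock cycle.
frame_write_seconds = FRAME_WORDS / ICAP_CLOCK_HZ
```

One frame is therefore 1312 bits (164 bytes), and under the stated assumption writing it takes at least 0.41 µs.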
Image filter evolution using DPR
[Figure: block diagram of the evolvable FPGA-based SoC. Labelled blocks: embedded µP (MicroBlaze) running the EA and the evaluation; reconfigurable core with input data, candidate result and ideal result; memory; PLB bus; memory controller and external DDR2 memory holding the configuration information; reconfiguration engine connected by a fast link to the HWICAP, which performs dynamic and partial reconfiguration driven by reconfiguration commands.]
Virtex-5 LX110T FPGA included in Digilent’s XUPV5 Evaluation Platform
Processing architecture
PEs
• Functional Blocks (FB), routing logic is incorporated
• 9 basic functions and routing possibilities yield 16 PE variants (10 CLBs per PE)
• One BusMacro for each port of PE.
Chromosome = 98 bits
• 4 bits per PE (16 times)
• 4 bits per input MUX (8 times)
• 2 bits for the output
ES (1 + 8)
[Table: function IDs 0 to 9 and their functions; entries as extracted: x+y, x > 1, 255, x >> 1, x, max(x,y), min(x,y), x –s y.]
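The 98-bit chromosome layout can be illustrated with a small packing routine. The Python sketch below assumes a particular field order (16 PE genes, then 8 input-MUX genes, then the output gene, least significant bits first); the slide gives only the field widths, so the ordering and the function names are hypothetical.

```python
PE_COUNT, MUX_COUNT = 16, 8
GENE_BITS, OUTPUT_BITS = 4, 2
CHROMOSOME_BITS = PE_COUNT * GENE_BITS + MUX_COUNT * GENE_BITS + OUTPUT_BITS


def pack(pe_genes, mux_genes, output_gene):
    """Pack the gene fields into one integer (assumed field order)."""
    if len(pe_genes) != PE_COUNT or len(mux_genes) != MUX_COUNT:
        raise ValueError("wrong number of genes")
    fields = [(g, GENE_BITS) for g in pe_genes]
    fields += [(g, GENE_BITS) for g in mux_genes]
    fields.append((output_gene, OUTPUT_BITS))
    value, shift = 0, 0
    for gene, width in fields:
        if not 0 <= gene < (1 << width):
            raise ValueError("gene does not fit its field")
        value |= gene << shift
        shift += width
    return value


def unpack(value):
    """Inverse of pack: return (pe_genes, mux_genes, output_gene)."""
    genes = []
    for _ in range(PE_COUNT + MUX_COUNT):
        genes.append(value & 0xF)
        value >>= GENE_BITS
    return genes[:PE_COUNT], genes[PE_COUNT:], value & 0x3


pe_genes = [i % 16 for i in range(PE_COUNT)]
mux_genes = [(3 * i) % 16 for i in range(MUX_COUNT)]
packed = pack(pe_genes, mux_genes, 2)
```

Each 4-bit PE gene selects one of the 16 pre-synthesized PE variants, so a mutation of a PE gene corresponds to reconfiguring that element, while MUX and output genes only change routing.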
Representation of the filter
[Figure: screenshot of the evolution demo showing the image to filter, the reference image, the result, the filter representation, the filter latency, the fitness value and the mutation rate.]
Evolution statistics:
- number of generations
- number of filters evaluated (8 per generation)
- number of reconfigured elements
[Figure: fitness vs. time plot (lower is better), with marks at 3.5 seconds and 55 seconds.]
Time of evolution: DPR vs. VRC
• 100000 generations, (1+8) ES, 128×128 pixels
• SW: Intel Core i5 processor @ 3.3 GHz
• 5056 circuits/s for the RA
• 6818 circuits/s for the VRC
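The evaluation rates above translate into total run times as sketched below in Python. The calculation assumes that eight offspring are evaluated per generation of the (1+8) ES and that the quoted rates are sustained averages over the whole run; the variable names are illustrative.

```python
GENERATIONS = 100_000
OFFSPRING_PER_GENERATION = 8  # (1+8) ES
RATE_RA = 5056                # circuits/s, reconfigurable array (RA)
RATE_VRC = 6818               # circuits/s, virtual reconfigurable circuit

evaluations = GENERATIONS * OFFSPRING_PER_GENERATION
time_ra_seconds = evaluations / RATE_RA
time_vrc_seconds = evaluations / RATE_VRC
```

With 800000 evaluations this gives roughly 158 s for the RA and roughly 117 s for the VRC.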
Resources utilization: DPR vs. VRC
• VRC is not as expensive as we thought! • HWICAP can be reused by other applications
Part IV: EHW systems in Zynq-7000 all programmable SoC
Zynq-7000 all programmable SoC
• Processing System (PS)
  • Dual-core ARM Cortex-A9 (667 MHz – 1 GHz)
  • Cache (L1: 32 kB + 2 x 32 kB, L2: 512 kB)
  • On-chip RAM (256 kB)
  • External memory interfaces (DDR3, DDR2, …)
  • DMA controller
  • IO (USB, UART, CAN, …)
• PS-centric architecture
• Programmable Logic (PL)
  • Artix-7 or Kintex-7
  • LUT (17 600 – 218 000)
  • CLB = 8 x 6-inp. LUT + 16 FF
  • FF (35 200 – 437 200)
  • BRAM (240 – 2 180 kB)
  • DSP slices, A/D converter, …
Zynq-7000 all programmable SoC
• Processor configuration access port (PCAP)
  • a part of the PS
  • does not need any instantiation in the PL part
• Download throughput: max. 400 MB/s for non-secure PL configuration
• Configuration data size: 4 MB for XC7Z020
• Reconfiguration is performed by DMA.
• The frames are 50 CLBs high and 1 CLB wide (~15 kB/frame).
• Frame relocation is possible.
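The PCAP figures above imply the following minimum transfer times. The Python sketch assumes decimal units (1 MB = 10^6 bytes, 1 kB = 10^3 bytes) and that the maximum throughput is sustained with no DMA setup overhead, so the values are lower bounds.

```python
PCAP_BYTES_PER_SECOND = 400e6  # max. 400 MB/s
FULL_CONFIG_BYTES = 4e6        # 4 MB for XC7Z020
FRAME_BYTES = 15e3             # ~15 kB per frame

full_config_seconds = FULL_CONFIG_BYTES / PCAP_BYTES_PER_SECOND
frame_seconds = FRAME_BYTES / PCAP_BYTES_PER_SECOND
```

Under these assumptions a full configuration takes at least 10 ms and a single frame at least 37.5 µs.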
Image filter evolution in Zynq: Overview
• The EA is SW running on the PS.
• Reconfigurable Array
  • PEs – programmable elements (pre-synthesized frames)
  • Config – a register determining the interconnection of PEs by multiplexers (VRC-like style)
  • pipelined
• FU – fitness unit
Image filter evolution in Zynq: Results
• CGP: 9 inputs / 2 outputs (over 8 bits), 4 rows x 8 columns, ES (1 + 4), 16 functions per PE
• We measured the number of clock cycles for 100 generations in 10 independent runs.
f_vrc = 203 MHz
f_hyb = 265 MHz
Replacement of 1 PE by PCAP takes 41 µs.
Is Zynq-7000 good for EHW?
• It is the best currently available platform for EHW!
• Very fast programmable logic.
• ADC – a direct analog input from the environment is available.
• Options for the EA
  • a standalone system running on the “second” core (well tuned for a particular application)
  • one of the applications of an OS running on the dual-core system
• PCAP is more convenient than ICAP for performing DPR.
• But the frames are still too big => long reconfiguration times!
Conclusions
• I presented the evolution of FPGA-based implementations of EHW systems.
• Current trends:
  • Complete EHW systems on a chip
  • Genetic operations as SW for an embedded processor.
• The concept of VRC has not been outperformed by DPR yet!
  • DPR beats the VRC in the normal operation mode only.
• EHW fits the recently introduced concept of underdesigned and opportunistic computing.
References
• Sekanina, L.: Evolvable hardware. Handbook of Natural Computing, Berlin, Springer, 2012, p. 1657-1705
• Vašíček, Z., Sekanina, L.: Hardware Accelerator of Cartesian Genetic Programming with Multiple Fitness Units. Computing and Informatics, Vol. 29, No. 6, 2010, p. 1359-1371
• Salvador, R., Otero, A., Mora, J., de la Torre, E., Riesgo, T., Sekanina, L.: Self-reconfigurable Evolvable Hardware System for Adaptive Image Processing. IEEE Transactions on Computers, 2013, in press
• Dobai, R., Sekanina, L.: Image Filter Evolution on the Xilinx Zynq Platform. In Proc. of the NASA/ESA Adaptive Hardware and Systems, 2013, submitted
Acknowledgments
• Faculty of Information Technology, Brno University of Technology
  • Roland Dobai, Zdeněk Vašíček, Michal Bidlo
• Center of Industrial Electronics, Universidad Politécnica de Madrid
  • Ruben Salvador, Eduardo de la Torre, Andrés Otero