Hardware solutions for radar processing: FPGA / GPU / CPU trade-offs
J.F. Degurse, Y. Mourot, H. Cantalloube, L. Savy
Sondra Workshop in Singapore, March 21st, 2012
Hardware components for radar processing
FPGA
GPU
CPU
Characteristics of candidate embedded products

Product                               SP GFLOPS   GFLOPS/W   GFLOPS/$
AMD Phenom II 1090T CPU               153.6       1.23       0.54
AMD Radeon E6760 GPU                  576         16.5       N/A
NVIDIA GeForce GT240 GPU (GRA 111)    250         7.1        2.7
Xilinx Virtex-6 SX475T FPGA           550         13.7       0.14
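The ratios in the characteristics table can be turned back into an implied power draw and an implied unit price, which makes the trade-off easier to read. The sketch below only divides the table's own figures; the function names are illustrative, and the resulting watts and dollars are derived values, not numbers quoted on the slide.

```python
# Figures copied from the characteristics table (single-precision peak values).
# Each entry: (SP GFLOPS, GFLOPS per watt, GFLOPS per dollar or None when N/A).
PRODUCTS = {
    "AMD Phenom II 1090T CPU": (153.6, 1.23, 0.54),
    "AMD Radeon E6760 GPU": (576.0, 16.5, None),
    "NVIDIA GeForce GT240 GPU (GRA 111)": (250.0, 7.1, 2.7),
    "Xilinx Virtex-6 SX475T FPGA": (550.0, 13.7, 0.14),
}


def implied_power_w(gflops, gflops_per_w):
    """Power in watts implied by the peak rate and the GFLOPS/W ratio."""
    return gflops / gflops_per_w


def implied_price_usd(gflops, gflops_per_usd):
    """Price in dollars implied by the GFLOPS/$ ratio, or None when it is N/A."""
    return None if gflops_per_usd is None else gflops / gflops_per_usd


def summarize(products=PRODUCTS):
    """Map each product name to (implied watts, implied dollars or None)."""
    return {
        name: (implied_power_w(g, per_w), implied_price_usd(g, per_usd))
        for name, (g, per_w, per_usd) in products.items()
    }


if __name__ == "__main__":
    for name, (watts, usd) in summarize().items():
        price = "N/A" if usd is None else f"{usd:,.0f} $"
        print(f"{name:36s} ~{watts:6.1f} W   ~{price}")
```

Read this way, the CPU draws roughly three times the power of the other candidates, while the FPGA is by far the most expensive per GFLOPS.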
Why use FPGAs in radar processing?
• ADCs can now sample signals at rates of up to a few gigasamples per second, which allows sampling on the carrier frequency or at the first intermediate frequency of the radar receiver.
• Two configurations are often encountered on modern versatile radars (different modes at different times):
  1. Wideband + a few channels (1 to 4)
  2. Narrowband + high dynamic range + numerous channels (10 to 100, or even 1000 for a fully digital radar)
• Need to reduce the data rate:
  • Real-time digital beamforming
  • Real-time filtering and decimation
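To give an order of magnitude for the data rates involved, here is a small back-of-the-envelope calculation. The sample rates, channel counts and decimation factor are assumed values chosen for illustration (only the 16-bit word size and the "few channels / numerous channels" split come from the slides).

```python
def raw_rate_bytes_per_s(sample_rate_hz, bits_per_sample, channels):
    """Aggregate ADC output rate for real-valued samples, in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels / 8


def decimated_iq_rate_bytes_per_s(sample_rate_hz, bits_per_sample, channels, decimation):
    """Rate after down-conversion to complex I/Q and decimation by `decimation`.

    Each output sample carries two words (I and Q), hence the factor 2.
    """
    return (sample_rate_hz / decimation) * 2 * bits_per_sample * channels / 8


if __name__ == "__main__":
    # Hypothetical wideband mode: 2 GS/s, 16 bits, 4 channels.
    wide = raw_rate_bytes_per_s(2e9, 16, 4)
    # Hypothetical narrowband mode: 100 MS/s, 16 bits, 100 channels.
    narrow = raw_rate_bytes_per_s(100e6, 16, 100)
    print(f"wideband raw:   {wide / 1e9:.1f} GB/s")
    print(f"narrowband raw: {narrow / 1e9:.1f} GB/s")
    # Assumed decimation factor of 16 on the wideband mode.
    reduced = decimated_iq_rate_bytes_per_s(2e9, 16, 4, 16)
    print(f"wideband after I/Q + decimation by 16: {reduced / 1e9:.1f} GB/s")
```

Under these assumptions both modes produce tens of gigabytes per second at the ADC outputs, which is why the reduction has to happen in real time next to the converters.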
Digital receiver
[Figure: block diagram of a digital receiver (one channel), with blocks: analog signal, ADC, FPGA; two links are labelled 16 bits]
Benefits of the digital receiver
Benefits
• Cost-effective compared to an analog solution
• I and Q are well balanced, as they are digitally filtered
• Highly reconfigurable
BUT
• Requires highly skilled people to program FPGAs
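As an illustration of what the FPGA computes in a digital receiver, here is a minimal software model of a digital down-converter: complex mixing to baseband, low-pass FIR filtering, and decimation. It is a sketch under assumptions, not the receiver design from the slides: the windowed-sinc filter, the tap count and the function names are all illustrative choices. Because I and Q come out of the same numerical filter, they are balanced by construction.

```python
import cmath
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass taps; `cutoff` is a fraction of the sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def ddc(samples, carrier, decimation, taps):
    """Down-convert real samples; `carrier` is a fraction of the sample rate."""
    # Step 1: complex mixing shifts the band of interest to 0 Hz.
    baseband = [s * cmath.exp(-2j * math.pi * carrier * n) for n, s in enumerate(samples)]
    # Steps 2 and 3: the FIR is evaluated only at the output instants that are kept,
    # so filtering and decimation are done in one pass (taps are symmetric).
    out = []
    for start in range(0, len(baseband) - len(taps) + 1, decimation):
        window = baseband[start:start + len(taps)]
        out.append(sum(t * x for t, x in zip(taps, window)))
    return out


if __name__ == "__main__":
    # Test tone exactly on the carrier, with a known phase offset.
    tone = [math.cos(2 * math.pi * 0.25 * n + math.pi / 3) for n in range(1024)]
    iq = ddc(tone, carrier=0.25, decimation=8, taps=lowpass_taps(63, 0.05))
    print(len(iq), abs(iq[0]), cmath.phase(iq[0]))
```

A real tone of amplitude 1 on the carrier comes out as a constant complex value of magnitude close to 0.5 whose phase is the tone's phase offset, at one eighth of the input rate.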
Why GPU for radar processing instead of CPU?
• High performance on massively parallel algorithms
• High-level parallelization of tasks
e.g. GPU: NVIDIA Tesla C2075, 500 GFlops (double precision), 2.5 GFlops per watt
[Figure: chart of peak GFlops (double precision), with labels including T20 and Kepler; photo of a Tesla C2075 card with its dimensions in cm]
Potential benefits
• Best GFlops/W and GFlops/$ ratios: 500 GFlops for 200 W on the Tesla C2075
• Cost (~2 K$)
• No ITAR issue (mass market)
• 100% COTS and open source
• Scientific libraries available (cuBLAS for linear algebra)
• Existing programming environment (CUDA)
Open questions
• How to formalize radar processing for an efficient GPU implementation?
• What effective computing power for which radar processing?
Application case 1 at Onera: space surveillance ground radar
The current supercomputer (100+ PowerPC G4 processors) is being replaced by a single GPU card.
• Mercury SC: price = 1 M$ (2003), TDP > 8 kW
• Tesla C2075: price = 2.5 K$ (2011), TDP = 215 W
Application case 1 at Onera: space surveillance ground radar
Performance: GPU/CPU speedup = x7.2*
* All 8 cores active
Application case 1 at Onera: space surveillance ground radar
Digital beamforming + detection, 1200 beams
Performance: GPU/CPU speedup = x16.5*
* All 8 cores active
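The beamforming and detection step maps well onto a GPU because every beam is an independent weighted sum of the channel samples. The sketch below shows the idea on a small scale. It assumes a uniform linear array with half-wavelength spacing, narrowband phase-shift beamforming and a fixed detection threshold; none of these details come from the slides, and the function names are illustrative.

```python
import cmath
import math


def steering_vector(num_channels, sin_theta, spacing_wavelengths=0.5):
    """Per-channel phase of a plane wave arriving from direction sin(theta)."""
    return [cmath.exp(2j * math.pi * spacing_wavelengths * k * sin_theta)
            for k in range(num_channels)]


def beamform(snapshot, sin_thetas, spacing_wavelengths=0.5):
    """Return one complex output per beam for a single array snapshot.

    Each beam is an independent dot product, which is what makes the
    operation easy to spread over many parallel threads.
    """
    n = len(snapshot)
    beams = []
    for s in sin_thetas:
        weights = steering_vector(n, s, spacing_wavelengths)
        beams.append(sum(x * w.conjugate() for x, w in zip(snapshot, weights)) / n)
    return beams


def detect(beams, threshold):
    """Indices of beams whose output magnitude exceeds a fixed threshold."""
    return [i for i, b in enumerate(beams) if abs(b) > threshold]


if __name__ == "__main__":
    # 41 beams here for brevity; the application case above uses 1200.
    grid = [i / 20 for i in range(-20, 21)]
    snapshot = steering_vector(16, 0.3)  # noiseless source at sin(theta) = 0.3
    beams = beamform(snapshot, grid)
    print(detect(beams, threshold=0.9))
```

With a noiseless source on the grid, only the beam steered at the source direction exceeds the threshold.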
Application case 2 at Onera: STAP on airborne radar
Rugged GPGPU products are available for the defense and aerospace market
• MAGIC 1
• GE-IP's rugged GRA111 (NVIDIA GeForce GT240 GPU)
• CPU only = 16 GFlops peak, 60 W, 0.27 GFlops/W
• CPU + GPU = 250 GFlops, 100 W, 2.5 GFlops/W
Courtesy of GE Intelligent Platforms
Application case 2 at Onera: STAP on airborne radar
Reference processing (involves a matrix inversion)
[Figure: STAP data cube (N spatial channels, Mp pulses, K range gates); each of the N channels feeds a STAP transversal filter with M taps (delays τ, conjugate weights w*); the filter outputs are summed into the STAP channel, which is followed by an FFT]
$$ w^H = \left( w_q^H \Gamma_{th} w_q \right) \cdot \frac{w_q^H \Gamma_{th} \Gamma^{-1}}{w_q^H \Gamma_{th} \Gamma^{-1} \Gamma_{th} w_q} $$
$$ w_q^H = [\, \underbrace{1 \; 1 \dots 1}_{N \text{ times}} \; \underbrace{0 \; 0 \dots 0}_{(N-1)M \text{ times}} \,] $$
Application case 2 at Onera: STAP on airborne radar
Matrix inversion
Double-precision matrix inversion: 1000 matrices of size 60x60
Performance: GPU/CPU speedup = x6.0*
* All 4 cores active
** DP matrix inversion
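The workload measured here is a batch of many small, independent inversions, which is why it parallelizes across matrices. Below is a plain reference sketch of that batch using Gauss-Jordan elimination with partial pivoting. The choice of algorithm, the helper names and the random test matrices are assumptions for illustration; the slides do not say which inversion method was benchmarked.

```python
import random


def invert(matrix):
    """Invert a square matrix (real or complex entries) by Gauss-Jordan elimination."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def invert_batch(matrices):
    """Invert each matrix independently; every inversion could run in parallel."""
    return [invert(m) for m in matrices]


def random_spd(n, rng):
    """Hypothetical test input: A*A^T plus diagonal loading, so it is well conditioned."""
    a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return [[sum(a[i][k] * a[j][k] for k in range(n)) + (n if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

For example, `invert_batch([random_spd(60, random.Random(0)) for _ in range(3)])` inverts three 60x60 matrices; pure Python is far too slow for the full 1000-matrix batch, so this is only a readable reference for what each parallel task does.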
Application case 3 at Onera: SAR processing
Processing time for one SAR image
• Stripmap mode, Tacq = 2'
• Image size (range x azimuth) = 10 000 x 80 000 pixels
[Figure: processing time chart; one entry is labelled (+1*Intel X5650)]
To reach the Tesla C2050's performance, 11 CPUs are needed!
Application case 3 at Onera: SAR processing
GPU/CPU comparison at equal performance
[Figure: two bar charts, price (€) and electricity consumption (W), each comparing Nvidia C2050 (+1*Intel X5650) with 11*X5650]
Reaching the same performance with CPUs costs and consumes 5 times more!
Hardware solutions for radar processing: summary
FPGAs are the only solution to cope with high data rates (digital receiver)
  + Broadband, data rate reduction (DDC), performance
  - Highly complex programming, hard to modify
  - Cost
CPUs do sequential work efficiently (and control the GPU)
  + Easy programming, portability, easy algorithm evolution
  + Fast on iterative / not easily parallelizable operations (CFAR, ...)
  - Performance, power consumption, cost
GPUs perform very well on hugely parallel tasks
  + Performance on parallel processing, easy algorithm evolution
  + Efficiency, small size
  - Data transfers, iterative operations
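The summary names CFAR as an example of processing that suits the CPU. For readers unfamiliar with it, here is a minimal sketch of one common variant, cell-averaging CFAR, written as a sliding window over range cells. The variant, the parameter names and the threshold factor are assumptions for illustration; the slides do not specify which CFAR scheme is used.

```python
def ca_cfar(power, num_train, num_guard, scale):
    """Cell-averaging CFAR over a list of power samples.

    For each cell under test, the noise level is estimated as the mean of
    `num_train` training cells on each side, skipping `num_guard` guard cells
    next to the cell under test. A detection is declared when the cell's
    power exceeds `scale` times that estimate.
    """
    detections = []
    half = num_train + num_guard
    for i in range(half, len(power) - half):
        leading = power[i - half:i - num_guard]
        trailing = power[i + num_guard + 1:i + half + 1]
        noise = (sum(leading) + sum(trailing)) / (2 * num_train)
        if power[i] > scale * noise:
            detections.append(i)
    return detections


if __name__ == "__main__":
    # Flat noise floor with one strong target in cell 25.
    power = [1.0] * 50
    power[25] = 30.0
    print(ca_cfar(power, num_train=8, num_guard=2, scale=5.0))
```

The threshold adapts to the local noise estimate, so the same code keeps a roughly constant false alarm rate when the noise floor changes along the range axis.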