Hardware solutions for radar processing: FPGA / GPU / CPU Trade-offs

J.F. Degurse, Y. Mourot, H. Cantalloube, L. Savy

Sondra Workshop in Singapore, March 21st, 2012

Hardware components for radar processing

FPGA, GPU, CPU

Characteristics of candidate embedded products

              AMD Phenom II    AMD Radeon    NVIDIA GeForce GT240   Xilinx Virtex-6
              1090T CPU        E6760 GPU     GPU (GRA 111)          SX475T FPGA
SP GFLOPS     153.6            576           250                    550
GFLOPS / W    1.23             16.5          7.1                    13.7
GFLOPS / $    0.54             N/A           2.7                    0.14
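The efficiency row can be cross-checked against the peak rates: dividing single-precision GFLOPS by GFLOPS/W gives the power budget each figure implies. The sketch below does only that arithmetic; the dictionary keys are shortened labels and `implied_power_watts` is a hypothetical helper name, neither comes from the slides.

```python
# Figures copied from the table: (single-precision GFLOPS, GFLOPS per watt).
candidates = {
    "CPU (AMD 1090T)": (153.6, 1.23),
    "GPU (AMD Radeon E6760)": (576.0, 16.5),
    "GPU (NVIDIA GT240, GRA 111)": (250.0, 7.1),
    "FPGA (Xilinx Virtex-6)": (550.0, 13.7),
}


def implied_power_watts(gflops, gflops_per_watt):
    """Power budget implied by a peak rate and an efficiency figure."""
    return gflops / gflops_per_watt


for name, (gflops, efficiency) in candidates.items():
    watts = implied_power_watts(gflops, efficiency)
    print(f"{name}: about {watts:.0f} W")
```

The implied budgets come out at roughly 125 W for the CPU, 35 W for each GPU and 40 W for the FPGA, so the three embedded parts sit in a similar power class while the desktop CPU does not.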

Why use FPGAs in radar processing?

• Nowadays ADCs can sample signals at rates up to a few gigasamples per second
• Signals are sampled on the carrier frequency or at the first intermediate frequency in the radar receiver
• Two configurations are often encountered on modern versatile radars (different modes at different times):
  1. Wideband + a few channels (1 to 4)
  2. Narrowband + high dynamic range + numerous channels (10 to 100, or even 1000 for the full digital radar)
• Need to reduce the data rate:
  - Real-time digital beamforming
  - Real-time filtering and decimation
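To make "filtering and decimation" concrete, here is a minimal software sketch of a digital down-converter: mix to baseband, low-pass filter, keep one sample out of several. It assumes a simple boxcar (block average) filter and made-up signal parameters; a real FPGA design would use dedicated filter structures, and the function name `digital_down_convert` is hypothetical.

```python
import cmath
import math


def digital_down_convert(samples, carrier_hz, sample_rate_hz, decimation):
    """Mix real samples to baseband, then average blocks of `decimation`
    samples and keep one complex (I/Q) output per block."""
    # Step 1: complex mixing shifts the carrier down to 0 Hz.
    baseband = [
        x * cmath.exp(-2j * math.pi * carrier_hz * n / sample_rate_hz)
        for n, x in enumerate(samples)
    ]
    # Steps 2 and 3: boxcar low-pass filter and decimation in a single pass.
    n_out = len(baseband) // decimation
    return [
        sum(baseband[k * decimation:(k + 1) * decimation]) / decimation
        for k in range(n_out)
    ]


# Hypothetical numbers: a 250 kHz tone sampled at 1 MHz, decimated by 8.
fs, fc, dec = 1_000_000.0, 250_000.0, 8
tone = [math.cos(2 * math.pi * fc * n / fs) for n in range(64)]
iq = digital_down_convert(tone, fc, fs, dec)
print(len(tone), "real samples ->", len(iq), "complex samples")
```

With these values the 64 input samples collapse to 8 complex outputs, each close to 0.5 + 0j, which is the baseband amplitude of a unit cosine.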

Digital Receiver

[Figure: synoptic of a digital receiver (one channel). Labels: Analog Signal, ADC, 16 bits, 16 bits, FPGA]

Benefits of Digital Receiver

• Benefits
  - Cost effective compared to an analog solution
  - I and Q are well balanced, as they are digitally filtered
  - Highly reconfigurable
• BUT
  - Requires highly skilled people to program FPGAs

Why GPU for radar processing instead of CPU?

[Figure: double-precision GFlops chart (labels include T20 and Kepler) and a photo of a Tesla C2075 board]

Potential benefits

• Best GFlops/W and GFlops/$ ratios
• 500 GFlops for 200 W on the Tesla C2075
• Cost (~2 K$)
• No ITAR issue (mass market)
• 100% COTS and open source
• Scientific libraries available (cuBLAS for linear algebra)
• Existing programming environment (CUDA)



Open questions

• How to formalize radar processing for an efficient GPU implementation?
• What effective computing power for which radar processing?

CPU / GPU Comparison


CPU
• Fast on iterative calculations
• Low-level parallelization tasks
• e.g. Intel X5670 (6 cores): 70 GFlops (double precision), 0.7 GFlops per watt

GPU
• High performance on massively parallel algorithms
• High-level parallelization tasks
• e.g. NVIDIA Tesla C2075: 500 GFlops (double precision), 2.5 GFlops per watt

Application case 1 at Onera: space surveillance ground radar

The current supercomputer (100+ PowerPC G4 processors) is being replaced by a single GPU card.

• Mercury SC: price = 1 M$ (2003), TDP > 8 kW
• Tesla C2075: price = 2.5 K$ (2011), TDP = 215 W

Application case 1 at Onera: space surveillance ground radar

Pulse compression + Doppler processing
• 16384 Doppler filters
• 100 receivers
• 20 Doppler rate
• 1.3 GFlops

Performance: GPU/CPU speedup = x7.2*
* All 8 cores active
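For readers less familiar with the chain being timed, the following is a minimal sketch of pulse compression followed by Doppler filtering. It is not the Onera implementation: the 13-chip Barker code, the scene sizes and the function names are all illustrative assumptions, and a plain DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath


def pulse_compress(echo, code):
    """Matched filter: correlate one pulse's echo with the transmitted code."""
    n_lags = len(echo) - len(code) + 1
    return [
        sum(echo[lag + i] * code[i].conjugate() for i in range(len(code)))
        for lag in range(n_lags)
    ]


def doppler_filter_bank(slow_time):
    """Plain DFT across pulses; each output bin acts as one Doppler filter."""
    n = len(slow_time)
    return [
        sum(slow_time[p] * cmath.exp(-2j * cmath.pi * k * p / n) for p in range(n))
        for k in range(n)
    ]


# Hypothetical scene: 8 pulses, 20 range gates, one target at gate 5, Doppler bin 2.
code = [complex(c) for c in (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)]
n_pulses, n_gates, target_gate, target_bin = 8, 20, 5, 2
echoes = []
for p in range(n_pulses):
    phase = cmath.exp(2j * cmath.pi * target_bin * p / n_pulses)
    echo = [0j] * (n_gates + len(code) - 1)
    for i, chip in enumerate(code):
        echo[target_gate + i] = chip * phase
    echoes.append(echo)

# Fast time first (pulses x range gates), then slow time (gates x Doppler bins).
compressed = [pulse_compress(e, code) for e in echoes]
range_doppler = [
    doppler_filter_bank([compressed[p][g] for p in range(n_pulses)])
    for g in range(n_gates)
]
peak = max(
    ((g, k) for g in range(n_gates) for k in range(n_pulses)),
    key=lambda gk: abs(range_doppler[gk[0]][gk[1]]),
)
print("peak at (range gate, Doppler bin):", peak)
```

Every range gate and every receiver can be processed independently in both stages, which is why this workload maps well onto a massively parallel device.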

Application case 1 at Onera: space surveillance ground radar

Digital beamforming + detection
• 1200 beams

Performance: GPU/CPU speedup = x16.5*
* All 8 cores active
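Digital beamforming amounts to one weighted sum over the receivers per beam. The sketch below assumes a uniform linear array with half-wavelength spacing and conventional (unweighted) steering vectors; the array geometry, beam grid and function names are hypothetical and are not taken from the Onera radar.

```python
import cmath
import math


def steering_vector(n_elements, spacing_wavelengths, angle_deg):
    """Phase ramp seen across a uniform linear array for a plane wave."""
    step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * step * n) for n in range(n_elements)]


def form_beams(snapshot, beam_angles_deg, spacing_wavelengths=0.5):
    """Return one complex output per beam: y_b = a(theta_b)^H x / N."""
    n = len(snapshot)
    outputs = []
    for angle in beam_angles_deg:
        a = steering_vector(n, spacing_wavelengths, angle)
        outputs.append(sum(ai.conjugate() * xi for ai, xi in zip(a, snapshot)) / n)
    return outputs


# Hypothetical case: 16 receivers, one source at 20 degrees, 7 beams.
angles = [-30, -20, -10, 0, 10, 20, 30]
snapshot = steering_vector(16, 0.5, 20)
beams = form_beams(snapshot, angles)
best = max(range(len(angles)), key=lambda b: abs(beams[b]))
print("strongest beam points at", angles[best], "degrees")
```

Each beam is computed independently of the others, so a grid of 1200 beams is 1200 independent dot products per range sample.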

Application case 2 at Onera: STAP on Airborne Radar

Rugged GPGPU products are available for the defense and aerospace market:
• MAGIC 1
• GE-IP's rugged GRA111 (NVIDIA GeForce GT240 GPU)

• CPU only = 16 GFlops peak, 60 W, 0.27 GFlops/W
• CPU+GPU = 250 GFlops, 100 W, 2.5 GFlops/W

Courtesy of GE Intelligent Platforms

Application case 2 at Onera: STAP on Airborne Radar

Reference processing

$$
w^{H} = \left( w_q^{H}\,\Gamma_{th}\,w_q \right) \cdot \frac{w_q^{H}\,\Gamma_{th}\,\Gamma^{-1}}{w_q^{H}\,\Gamma_{th}\,\Gamma^{-1}\,\Gamma_{th}\,w_q}
$$

$$
w_q^{H} = [\,\underbrace{1\ 1 \dots 1}_{N\ \text{times}}\ \ \underbrace{0\ 0 \dots 0}_{(N-1)M\ \text{times}}\,]
$$

The $\Gamma^{-1}$ term requires a matrix inversion.

[Figure: STAP data cube (N spatial channels, Mp pulses, K range gates) and STAP transversal filters (N channels, M taps, delays τ, weights w*_1 ... w*_{KM}), followed by a summation and an FFT producing the STAP channel]
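Two short consequences help interpret the reference weight. They assume the slide's expression is read as the ratio $w^{H} = (w_q^{H}\Gamma_{th}w_q)\,\dfrac{w_q^{H}\Gamma_{th}\Gamma^{-1}}{w_q^{H}\Gamma_{th}\Gamma^{-1}\Gamma_{th}w_q}$, which is a reconstruction of the original layout rather than something the extracted text states outright. Right-multiplying by $\Gamma_{th}w_q$ shows that the scaling preserves the response of the quiescent weight $w_q$, and setting $\Gamma = \Gamma_{th}$ shows that the filter then reduces to $w_q$ itself:

```latex
\begin{aligned}
w^{H}\,\Gamma_{th}\,w_q
  &= \left(w_q^{H}\Gamma_{th}w_q\right)
     \frac{w_q^{H}\Gamma_{th}\Gamma^{-1}\Gamma_{th}w_q}
          {w_q^{H}\Gamma_{th}\Gamma^{-1}\Gamma_{th}w_q}
   = w_q^{H}\Gamma_{th}w_q, \\[4pt]
\Gamma = \Gamma_{th} \;\Rightarrow\; w^{H}
  &= \left(w_q^{H}\Gamma_{th}w_q\right)
     \frac{w_q^{H}}{w_q^{H}\Gamma_{th}w_q}
   = w_q^{H}.
\end{aligned}
```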

Application case 2 at Onera: STAP on Airborne Radar

Matrix inversion

• Double-precision matrix inversion
• 1000 matrices of size 60x60

Performance: GPU/CPU speedup = x6.0* **
* All 4 cores active
** DP matrix inversion
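The benchmarked workload is a batch of many small, independent inversions. As a minimal sketch of what one batch looks like in software, the code below inverts a handful of matrices with Gauss-Jordan elimination and checks the result. It is a stand-in, not the benchmarked code: the matrices are real-valued, random and much smaller than the slide's 1000 matrices of 60x60, and all function names are hypothetical.

```python
import random


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (double-precision floats)."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def random_well_conditioned(n, rng):
    """Random matrix made diagonally dominant so that it is safely invertible."""
    m = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        m[i][i] += n
    return m


def max_identity_error(a, a_inv):
    """Largest deviation of a * a_inv from the identity matrix."""
    n = len(a)
    return max(
        abs(sum(a[i][k] * a_inv[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
        for i in range(n) for j in range(n)
    )


rng = random.Random(0)
batch = [random_well_conditioned(6, rng) for _ in range(10)]  # slide: 1000 of 60x60
inverses = [invert(m) for m in batch]
worst = max(max_identity_error(m, inv) for m, inv in zip(batch, inverses))
print("worst |A * inv(A) - I| entry:", worst)
```

No inversion in the batch depends on another, so the batch dimension can be spread across parallel workers.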

Application case 3 at Onera: SAR Processing

Processing time for one SAR image
• Stripmap mode, Tacq = 2'
• Image size (range x azimuth) = 10 000 x 80 000 pixels

[Figure: processing-time comparison; the GPU configuration includes one host CPU (+1*Intel X5650)]

To reach the Tesla C2050's performance, 11 CPUs are needed!

Application case 3 at Onera: SAR Processing

GPU/CPU comparison at the same performance

[Figure: two bar charts, price (€) and electricity consumption (W), comparing Nvidia C2050 (+1*Intel X5650) with 11*X5650]

Reaching the same performance with CPUs will cost and consume 5 times more!

Hardware solutions for radar processing: Summary

• FPGAs (digital receiver) are the only solution able to cope with high data rates
  - Broadband, data rate reduction (DDC), performance
  - Highly complex programming, hard to modify
  - Cost

• The CPU handles sequential work efficiently (and controls the GPU)
  - Easy programming, portability, easy algorithm evolution
  - Fast on iterative operations and on operations that are hard to parallelize (CFAR, …)
  - Performance, electricity consumption, cost

• The GPU performs very well on huge parallel tasks
  - Performance on parallel processing, easy algorithm evolution
  - Efficiency, small size
  - Data transfers, iterative operations
