Mozaïc: Generic Framework for the Design of Dynamically Reconfigurable Architectures

Julien LALLET
IRISA/University of Rennes
email: [email protected]

I - Abstract

Objective: define a framework for the design of dynamically reconfigurable architectures (DRA):

1. Support various architecture models ⇒ ADL
2. Design space exploration
3. Automatic generation of the DRA

A - Architecture Description Language

The proposed ADL, xMAML, supports various types of:

1. Dynamically reconfigurable processing unit instantiation (e.g. LB, DPR...)
2. Interconnection schemes (e.g. mesh, hierarchical...)
3. Dynamic reconfiguration mechanisms

B - Hardware generation for dynamic reconfiguration support

1. Reconfiguration wrapper for each implemented unit (DUCK)
2. Reconfiguration controller which manages:
   (a) Heterogeneous reconfiguration process
   (b) Fast, partial and efficient reconfiguration
   (c) Multi-context and scanpath-based

[Figure: DART architecture and application design flow. Labels: Application flow; Architecture flow (Mozaïc); Application; Architecture modeling (xMAML); Constraints; Architecture template; Component Library (IP); Architecture generator; Bitstream generator; Synthesis; Compilation; Place/Route; Bitstream (binary file); Architecture (VHDL).]

[Figure: generated architecture on an FPGA. Labels: Reconfiguration Controller; DyRIOBloc; DyRIBox; DUCK; PU; Domain 1, Domain 2, Domain 3, ..., Domain p; LB; DPR; IO.]

II - The Mozaïc Framework

A - Dynamic Reconfiguration Process and Resources

a - Three step reconfiguration

Step I: the system is running (resources configured in a state X)
Step II: the system is running (next configuration Y is propagated)
Step III: the system reconfigures (the two configurations are swapped, X ⇌ Y)

b - Three kinds of DUCK (Dynamically Unifier and reConfiguration blocK) wrappers

[Figure: the three kinds of DUCK wrappers, captioned "Reconfiguration by Address", "Serial Reconfiguration" and "Parallel Reconfiguration". Labels that survive from the schematics: a 2^n x 1 RAM with Address, Data, R/W and WE inputs and synchronous and asynchronous outputs; DFF chains with ScanIn, ScanEn, confEn and ScanOut; a tiny reconfiguration controller with a counter (cpt), a 1-to-16 demux and a 16-to-1 mux; a datapath with inputs A and B, input and output shifters, arithmetic (ADD, SUB, ABS...), logical (AND, OR...), SIMD and saturation units, carry in and carry out.]
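The three-step process in A-a can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `DoubleBufferedUnit`, the list-of-bits representation and the serial shift direction are hypothetical and are not taken from the poster. The model keeps an active configuration X and a shadow configuration Y, shifts Y in one bit per call while X stays in use, and then swaps the two.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DoubleBufferedUnit:
    """Toy model of a serially reconfigured unit (hypothetical, for illustration)."""
    active: List[int]                                  # configuration X, in use
    shadow: List[int] = field(default_factory=list)    # configuration Y, being loaded

    def __post_init__(self) -> None:
        if not self.shadow:
            self.shadow = [0] * len(self.active)
        if len(self.shadow) != len(self.active):
            raise ValueError("active and shadow configurations must have equal length")

    def scan_in(self, bit: int) -> int:
        """Step II: shift one bit into the shadow chain and return the bit shifted out."""
        scan_out = self.shadow[-1]
        self.shadow = [bit & 1] + self.shadow[:-1]
        return scan_out

    def swap(self) -> None:
        """Step III: exchange the active and shadow configurations."""
        self.active, self.shadow = self.shadow, self.active


# Step I: the unit runs with configuration X.
duck = DoubleBufferedUnit(active=[1, 0, 1, 0])
next_config = [0, 1, 1, 1]  # configuration Y

# Step II: Y is propagated bit by bit; X is untouched while this happens.
for bit in reversed(next_config):
    duck.scan_in(bit)
    assert duck.active == [1, 0, 1, 0]

# Step III: the two configurations are swapped.
duck.swap()
print(duck.active, duck.shadow)
```

Because the swap exchanges the two buffers rather than overwriting one, the previous configuration X remains available in the shadow buffer after Step III, which matches the "swapped" wording of the poster.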

B - xMAML: High level specification for dynamically reconfigurable architectures

a - xMAML DyRIBox specification (interconnection resources)

    <PEInterconnectDyRIBox name="DB">
      <ReconfigurationTime cycle="1"/>
      <DBPorts>
        <Inputs number="4" bitwidth="8"/>
        <Outputs number="4" bitwidth="8"/>
      </DBPorts>
      <PElementsPorts>
        <Inputs number="1" bitwidth="8"/>
        <Outputs number="1" bitwidth="8"/>
      </PElementsPorts>
      <AdjacencyMatrix>
        <DOutput idx="0" row="01111"/>
        <DOutput idx="1" row="10111"/>
        <DOutput idx="2" row="11011"/>
        <DOutput idx="3" row="11101"/>
        <POutput idx="0" row="11110"/>
      </AdjacencyMatrix>
    </PEInterconnectDyRIBox>

b - xMAML DyRIOBloc specification (IO ports)

    <DyRIOBloc name="IO">
      <Reconfiguration cycle="1"/>
      <Port name="i1" bitwidth="1" direction="in" buffer="4"/>
      <Port name="io1" bitwidth="1" direction="inout" buffer="4"/>
      <Port name="o1" bitwidth="1" direction="out" buffer="4"/>
    </DyRIOBloc>

c - xMAML DUCK specification (reconfiguration wrappers)

    <PEInterface name="clb">
      <Reconfiguration cycle="16" bits="16" preemption="no"/>
      <IOPorts>
        <Port name="luti0" bitwidth="1" direction="in" type="data"/>
        <Port name="luti1" bitwidth="1" direction="in" type="data"/>
        ...
        <Port name="ConfigIn" bitwidth="1" direction="in" type="RAMConfIn"/>
        <Port name="RW" bitwidth="1" direction="in" type="RAMConfEn"/>
        <Port name="ConfAddr" bitwidth="4" direction="in" type="RAMConfAddr"/>
      </IOPorts>
    </PEInterface>

d - xMAML reconfiguration domain specification

    <DBDomain name="DPR_1">
      <ReconfigurationParameters preemption="disable" domainCtrl="shared"
          partialReconfiguration="enable" IRPriorityLevel="3"
          taskNumber="10" confBusWidth="8"/>
      <Interconnect type="manual">
        <Instantiation name="MultBus" instanceOf="DBox"/>
        <InternalConnections>
          <MultBus:in(1) = DataMem1:output_0(0:15)/>
          <MultBus:in(2) = DataMem2:output_0(0:15)/>
          ...
          <AG4:output_0(0:15) = DataMem4:input_0(0:15)/>
        </InternalConnections>
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row="1 0"/>
        <MatrixA row="1 0"/>
        <MatrixA row="0 1"/>
        <MatrixA row="0 1"/>
        <VectorB value="0"/>
        <VectorB value="8"/>
        <VectorB value="0"/>
        <VectorB value="4"/>
      </ElementsPolytopeRange>
    </DBDomain>
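One possible reading of the AdjacencyMatrix in the DyRIBox specification is as a table of allowed connections for each output. The sketch below parses the DyRIBox listing with Python's standard XML parser and lists, for every output, the sources it may be routed from. Two points are assumptions and are not stated on the poster: that the columns of each row are ordered as the DBox inputs followed by the processing element outputs, and that a "1" marks an allowed connection. The function name `allowed_sources` and the column labels `DIn`/`PEOut` are hypothetical.

```python
import xml.etree.ElementTree as ET

DYRIBOX_XML = """
<PEInterconnectDyRIBox name="DB">
  <ReconfigurationTime cycle="1"/>
  <DBPorts>
    <Inputs number="4" bitwidth="8"/>
    <Outputs number="4" bitwidth="8"/>
  </DBPorts>
  <PElementsPorts>
    <Inputs number="1" bitwidth="8"/>
    <Outputs number="1" bitwidth="8"/>
  </PElementsPorts>
  <AdjacencyMatrix>
    <DOutput idx="0" row="01111"/>
    <DOutput idx="1" row="10111"/>
    <DOutput idx="2" row="11011"/>
    <DOutput idx="3" row="11101"/>
    <POutput idx="0" row="11110"/>
  </AdjacencyMatrix>
</PEInterconnectDyRIBox>
"""


def allowed_sources(xml_text):
    """Map each output to the sources whose adjacency bit is '1'.

    Assumed column order: DBox inputs first, then processing element outputs.
    """
    root = ET.fromstring(xml_text)
    n_dbox_inputs = int(root.find("DBPorts/Inputs").get("number"))
    n_pe_outputs = int(root.find("PElementsPorts/Outputs").get("number"))
    columns = [f"DIn{i}" for i in range(n_dbox_inputs)]
    columns += [f"PEOut{i}" for i in range(n_pe_outputs)]

    table = {}
    for row in root.find("AdjacencyMatrix"):
        bits = row.get("row")
        if len(bits) != len(columns):
            raise ValueError(f"row {bits!r} does not match {len(columns)} columns")
        name = f"{row.tag}{row.get('idx')}"
        table[name] = [col for col, bit in zip(columns, bits) if bit == "1"]
    return table


connections = allowed_sources(DYRIBOX_XML)
for output, sources in connections.items():
    print(output, "<-", ", ".join(sources))
```

Under these assumptions, each DBox output can take any source except the DBox input with the same index, and the output toward the processing element can take any DBox input but not the element's own output.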

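The clb DUCK specification declares 16 configuration bits loaded in 16 cycles through the ConfigIn, RW and 4-bit ConfAddr ports, which suggests a RAM-based look-up table written one bit per cycle. The following is a minimal sketch of that idea, not the poster's hardware: the class `AddressedLut`, the helper `load_truth_table`, the convention that RW = 1 means write, and the choice of the first input as the least significant address bit are all assumptions.

```python
class AddressedLut:
    """Toy 2^n x 1 RAM used as a look-up table (hypothetical model)."""

    def __init__(self, addr_bits=4):
        self.addr_bits = addr_bits
        self.ram = [0] * (1 << addr_bits)

    def clock(self, conf_addr, config_in, rw):
        """One configuration cycle: write config_in at conf_addr when rw is 1."""
        if not 0 <= conf_addr < len(self.ram):
            raise ValueError("configuration address out of range")
        if rw:
            self.ram[conf_addr] = config_in & 1

    def evaluate(self, *inputs):
        """Read the table; inputs[0] is taken as the least significant address bit."""
        if len(inputs) != self.addr_bits:
            raise ValueError(f"expected {self.addr_bits} inputs")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.ram[address]


def load_truth_table(lut, bits):
    """Write one configuration bit per cycle and return the number of cycles used."""
    cycles = 0
    for addr, bit in enumerate(bits):
        lut.clock(addr, bit, rw=1)
        cycles += 1
    return cycles


# Configure the table as a 4-input XOR (parity of the address bits).
lut = AddressedLut(addr_bits=4)
xor_table = [bin(addr).count("1") & 1 for addr in range(16)]
cycles = load_truth_table(lut, xor_table)
print(cycles, lut.evaluate(1, 0, 0, 0), lut.evaluate(1, 1, 0, 0))
```

With a 4-bit address and one bit written per cycle, a full reload takes 16 cycles, consistent with the `cycle="16" bits="16"` attributes of the specification.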

III - Case Study: a dynamically reconfigurable implementation of WCDMA on an eFPGA

[Figure: WCDMA receiver. Labels: Sr(n); Se(n); frequency converter; FIR; AGC; searcher; τ1 ... τL; rake receiver (6 fingers + symbol estimation); b̂.]

[Figure: task schedule of the WCDMA receiver over time. Labels: slot n + 1 reception; slot n + 2 reception; slot n computing; slot n + 1 computing; tasks FIR, Searcher and Rake receiver; Configuration; tslot.]

[Figure: configuration of the eFPGA over time. Labels: Domain 1 to Domain 5, 247 logic cells each, each with a configuration memory and a computing memory; current computing DUCK configuration, total = 1235 logic cells; steps T1, T2, T3a, T3b, T3c, T3d and T4; FIR filter configuration, Searcher configuration and Rake Receiver configuration; FIR, Searcher, Finger 1 to Finger 4, Symbol and NOP slots; ConfS, ConfR and ConfF; tslot, tcomputing, tr and tpropagation.]

Logic synthesis, place & route on the eFPGA (ABC Berkeley and VPR):

    Task             CLBs
    FIR              1117
    Searcher         1235
    Rake Receiver    1030 in total (finger: 245, symbol: 50)
    Total            3382

Architecture synthesis of the eFPGA (Synopsys, CMOS 130 nm):

• Static implementation: 3382 CLBs and switch blocks (44 mm²)
• Dynamic implementation: 1235 CLBs, switch blocks and reconfiguration control (26 mm²)

Impact of dynamic reconfiguration on area: 40% silicon area saved
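The reported figures can be cross-checked with a few lines of arithmetic. The sketch below makes two assumptions that the poster does not state explicitly: that the rake receiver total counts four fingers plus one symbol estimator (the configuration figure shows Finger 1 to Finger 4), and that the dynamic implementation is sized by the largest single task. The assignment of 1117 to the FIR and 1235 to the searcher follows the order of the extracted table; the totals do not depend on it.

```python
# CLB counts per task, as read from the synthesis table.
FINGER_CLBS = 245
SYMBOL_CLBS = 50
clbs = {
    "FIR": 1117,
    "Searcher": 1235,
    "Rake Receiver": 4 * FINGER_CLBS + SYMBOL_CLBS,  # assumption: four fingers
}

# Static implementation: every task has its own resources.
static_clbs = sum(clbs.values())

# Dynamic implementation: tasks share the fabric, so the largest one sets the size.
dynamic_clbs = max(clbs.values())

# Five domains of 247 logic cells, as labelled in the configuration figure.
domain_cells = 5 * 247

# Silicon area reported for the two implementations, in mm^2.
static_area_mm2 = 44.0
dynamic_area_mm2 = 26.0
area_saving = (static_area_mm2 - dynamic_area_mm2) / static_area_mm2

print(clbs["Rake Receiver"], static_clbs, dynamic_clbs, domain_cells)
print(f"{area_saving:.1%}")
```

The CLB count drops by roughly 63% (1235 instead of 3382), while the area saving is about 41%, reported as 40% on the poster. The dynamic figure of 26 mm² also includes the reconfiguration control, which may account for part of the difference.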

IRISA/University of Rennes 6 rue de kerampont BP 80518 F-22305 Lannion Cedex