Mozaïc: Generic Framework for the Design of Dynamically Reconfigurable Architectures
Julien LALLET
IRISA/University of Rennes
email:
[email protected]
I - Abstract
Objective: Define a framework for the design of dynamically reconfigurable architectures (DRA)
1. Support various architecture models ⇒ ADL
2. Design space exploration
3. Automatic generation of the DRA

A - Architecture Description Language
The proposed ADL xMAML supports various types of:
1. Dynamically reconfigurable processing units instantiation (e.g. LB, DPR...)
2. Interconnection schemes (e.g. mesh, hierarchical...)
3. Dynamic reconfiguration mechanisms

B - Hardware generation for dynamic reconfiguration support
1. Reconfiguration wrapper for each implemented unit (DUCK)
2. Reconfiguration controller which manages:
(a) Heterogeneous reconfiguration process
(b) Fast, partial and efficient reconfiguration
(c) Multi-context and scanpath based

[Figure: Architecture and application design flow. Application flow: Application, Synthesis, Compilation, Place/Route. Architecture flow (Mozaïc): Architecture template, Architecture modeling (xMAML), Constraints, Component Library (IP), Architecture generator, Bitstream generator. Outputs: Architecture (VHDL), Bitstream (Binary file). Example targets: an FPGA made of LB and IO cells, and DART made of DPR and IO cells, partitioned into reconfiguration Domains 1 to p. Generated hardware: Reconfiguration Controller, DyRIOBloc, DyRIBox, and PUs wrapped in DUCKs.]
II - The Mozaïc Framework
A - Dynamic Reconfiguration Process and Resources
a - Three step reconfiguration
Step I: the system is running (resources configured in a state X)
Step II: the system is running (next configuration Y is propagated)
Step III: the system reconfigures (the two configurations are swapped, X ⇌ Y)
b - Three kinds of DUCK (Dynamically Unifier and reConfiguration blocK) wrappers

[Figure: the three DUCK wrappers, captioned "Reconfiguration by Address", "Serial Reconfiguration" and "Parallel Reconfiguration". Recoverable labels: a 2^n x 1 RAM with Address, Data, R/W, WE, DataIn, synchronous and asynchronous outputs; DFF chains with ScanIn, ScanEn, ScanOut and confEn; a tiny reconfiguration controller with a counter (cpt), a 1 to 16 demux and a 16 to 1 mux; an operator with inputs A and B, input and output shifters, arithmetic (ADD, SUB, ABS...), logical (AND, OR, ...), SIMD and saturation stages, carry in/out.]
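The three-step process combined with a serial (scan-chain) wrapper can be sketched in software as follows. This is only a toy model under stated assumptions: the class `ScanDuck` and the function `reconfigure` are hypothetical names, the shadow register stands for the scan chain, and real DUCK wrappers are generated hardware, not Python objects.

```python
class ScanDuck:
    """Toy model of a serially reconfigured wrapper (hypothetical, for illustration)."""

    def __init__(self, active_bits):
        self.active = list(active_bits)        # configuration X driving the unit
        self.shadow = [0] * len(active_bits)   # scan chain that receives configuration Y

    def scan_shift(self, bit):
        """Step II: shift one bit into the scan chain; the active bits are untouched."""
        scan_out = self.shadow[-1]
        self.shadow = [bit] + self.shadow[:-1]
        return scan_out

    def swap(self):
        """Step III: exchange the active and the propagated configurations."""
        self.active, self.shadow = self.shadow, self.active


def reconfigure(duck, next_bits):
    """Propagate next_bits serially (Step II), then swap (Step III)."""
    if len(next_bits) != len(duck.active):
        raise ValueError("configuration length mismatch")
    # Feed the last bit first so that next_bits[0] ends up at position 0.
    for bit in reversed(next_bits):
        duck.scan_shift(bit)
    duck.swap()


duck = ScanDuck([0, 0, 1, 0])      # Step I: running with configuration X
reconfigure(duck, [1, 0, 1, 1])    # Steps II and III: Y is propagated, then swapped in
```

In this sketch the unit keeps computing with X during the whole propagation, and X remains available in the shadow register after the swap.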
B - xMAML: High level specification for dynamically reconfigurable architectures

a - xMAML DyRIBox specification (interconnection resources)

    <PEInterconnectDyRIBox name="DB">
      <ReconfigurationTime cycle="1"/>
      <DBPorts>
        <Inputs number="4" bitwidth="8"/>
        <Outputs number="4" bitwidth="8"/>
      </DBPorts>
      <PElementsPorts>
        <Inputs number="1" bitwidth="8"/>
        <Outputs number="1" bitwidth="8"/>
      </PElementsPorts>
      <AdjacencyMatrix>
        <DOutput idx="0" row="01111"/>
        <DOutput idx="1" row="10111"/>
        <DOutput idx="2" row="11011"/>
        <DOutput idx="3" row="11101"/>
        <POutput idx="0" row="11110"/>
      </AdjacencyMatrix>
    </PEInterconnectDyRIBox>

b - xMAML DyRIOBloc specification (IO ports)

    <DyRIOBloc name="IO">
      <Reconfiguration cycle="1"/>
      <Port name="i1" bitwidth="1" direction="in" buffer="4"/>
      <Port name="io1" bitwidth="1" direction="inout" buffer="4"/>
      <Port name="o1" bitwidth="1" direction="out" buffer="4"/>
    </DyRIOBloc>

c - xMAML DUCK specification (reconfiguration wrappers)

    <PEInterface name="clb">
      <Reconfiguration cycle="16" bits="16" preemption="no"/>
      <IOPorts>
        <Port name="luti0" bitwidth="1" direction="in" type="data"/>
        <Port name="luti1" bitwidth="1" direction="in" type="data"/>
        ...
        <Port name="ConfigIn" bitwidth="1" direction="in" type="RAMConfIn"/>
        <Port name="RW" bitwidth="1" direction="in" type="RAMConfEn"/>
        <Port name="ConfAddr" bitwidth="4" direction="in" type="RAMConfAddr"/>
      </IOPorts>
    </PEInterface>

d - xMAML reconfiguration domain specification

    <DBDomain name="DPR_1">
      <ReconfigurationParameters preemption="disable" domainCtrl="shared" partialReconfiguration="enable" IRPriorityLevel="3" taskNumber="10" confBusWidth="8"/>
      <Interconnect type="manual">
        <Instantiation name="MultBus" instanceOf="DBox"/>
        <InternalConnections>
          <MultBus:in(1) = DataMem1:output_0(0:15)/>
          <MultBus:in(2) = DataMem2:output_0(0:15)/>
          ...
          <AG4:output_0(0:15) = DataMem4:input_0(0:15)/>
        </InternalConnections>
      </Interconnect>
      <ElementsPolytopeRange>
        <MatrixA row="1 0"/>
        <MatrixA row="1 0"/>
        <MatrixA row="0 1"/>
        <MatrixA row="0 1"/>
        <VectorB value="0"/>
        <VectorB value="8"/>
        <VectorB value="0"/>
        <VectorB value="4"/>
      </ElementsPolytopeRange>
    </DBDomain>
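One possible reading of the AdjacencyMatrix in the DyRIBox listing can be sketched as a routing table. The interpretation is an assumption, not something the poster states: each row is taken to describe one output, and each character one candidate source, ordered as the four DyRIBox inputs followed by the single processing-element port, with '1' meaning "this source may drive this output". The names `ROWS`, `SOURCES` and `allowed_sources` are hypothetical.

```python
# Rows copied from the DyRIBox listing; keys are (kind, idx) of each output.
ROWS = {
    ("D", 0): "01111",
    ("D", 1): "10111",
    ("D", 2): "11011",
    ("D", 3): "11101",
    ("P", 0): "11110",
}

# Assumed column order: four DyRIBox inputs, then the processing-element port.
SOURCES = [("D", 0), ("D", 1), ("D", 2), ("D", 3), ("P", 0)]


def allowed_sources(rows, sources):
    """Return, for every output, the list of sources whose matrix bit is '1'."""
    table = {}
    for output, row in rows.items():
        if len(row) != len(sources):
            raise ValueError(f"row for {output} has {len(row)} bits, expected {len(sources)}")
        table[output] = [src for src, bit in zip(sources, row) if bit == "1"]
    return table


ROUTING = allowed_sources(ROWS, SOURCES)
for output, srcs in ROUTING.items():
    print(output, "<-", srcs)
```

Under this reading, every output accepts all sources except the one sharing its own index, so a signal is never routed straight back to the port it entered from.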
III - Case Study: a dynamically reconfigurable implementation of WCDMA on an eFPGA
[Figure: WCDMA Receiver. Recoverable labels: Frequency Converter, D.C., A, CAG, FIR, Searcher, Rake Receiver (6 fingers + symbol estimation), N, signals Sr(n) and Se(n), delays τ1 ... τL, output b̂.]

[Figure: task schedule (Task vs. Time). Slot n + 1 reception and slot n + 2 reception run alongside slot n computing and slot n + 1 computing; within one tslot the FIR, Searcher and Rake receiver tasks follow each other.]

[Figure: eFPGA reconfiguration schedule. Five domains (Domain 1 to Domain 5) of 247 logic cells each, every domain with a Configuration Memory and a Computing Memory; current computing DUCK configuration Total = 1235 logic cells. Steps T1, T2, T3a to T3d and T4 cover the FIR filter configuration, the Searcher configuration and the Rake Receiver configuration; per-domain entries are FIR ConfS, Searcher ConfR, Finger 1 to Finger 4 ConfF, Symbol ConfF, or NOP. Time labels: tcomputing, tr, tpropagation, tslot.]

Logic synthesis, place & route on the eFPGA: (ABC Berkeley and VPR)

Task                      CLB
FIR                       1117
Searcher                  1235
Rake Receiver (finger)     245
Rake Receiver (symbol)      50
Rake Receiver (total)     1030
Total                     3382
Architecture synthesis of the eFPGA (Synopsys, CMOS 130 nm):
• Static implementation: 3382 CLBs and switch blocks (44 mm²)
• Dynamic implementation: 1235 CLBs, switch blocks and reconfiguration control (26 mm²)
Impact of dynamic reconfiguration on area: 40% silicon area saved
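The area figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers printed on the poster (44 mm², 26 mm², 3382 and 1235 CLBs, five domains of 247 logic cells); the variable names are illustrative.

```python
# Figures taken from the synthesis results and the domain figure.
static_area_mm2 = 44.0
dynamic_area_mm2 = 26.0
static_clbs = 3382
dynamic_clbs = 1235
domains = 5
cells_per_domain = 247

# Relative silicon area saved by the dynamic implementation.
area_saving = (static_area_mm2 - dynamic_area_mm2) / static_area_mm2

# Relative reduction in the number of CLBs.
clb_reduction = (static_clbs - dynamic_clbs) / static_clbs

# The dynamic eFPGA size matches five domains of 247 logic cells.
domain_cells = domains * cells_per_domain

print(f"area saved:    {area_saving:.1%}")
print(f"CLB reduction: {clb_reduction:.1%}")
print(f"domain cells:  {domain_cells}")
```

The area saving (about 41%, reported as 40%) is smaller than the CLB reduction (about 63%); one plausible reason is that the dynamic implementation also carries the reconfiguration control listed in its area budget.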
IRISA/University of Rennes 6 rue de kerampont BP 80518 F-22305 Lannion Cedex