ACKNOWLEDGMENT: This work has been supported in part by the French National Research Agency (ANR) through the COSINUS program (project MIDAS no. ANR-09-COSI-009).
Toeplitz algebra module | Midas meeting – 15 June 2012
Outline
► Motivations
► Toeplitz algebra module structure
► Core routines description (sequential/omp)
► MPI routines description
Introduction
Motivations
Provide tools for Cosmic Microwave Background (CMB) data analysis:
► High performance
► Massively parallel
► Portable
► Including new algorithms and techniques
► Capable of dealing with large data volumes
Solve the maximum likelihood CMB map-making equation
Introduction
Toeplitz matrix for CMB data analysis
Why a Toeplitz matrix? To solve the maximum likelihood map-making equation (Gaussian noise), we generally assume that:
► N is the time-time noise covariance matrix; symmetric and positive definite
► approximated as a band-diagonal, piecewise Toeplitz matrix; symmetric and positive definite
This gives us efficient algorithms:
► Fast algorithms related to the FFT
► Less data to store
This is generally considered a good compromise for representing the noise correlation of CMB data
Library structure
Toeplitz algebra module structure
[Figure: library structure diagram with layers labeled API routines, MPI routines, Sequential / OMP routines, Internal routines, Low level routines]
Toeplitz algebra module structure
[Figure: API routines dependency diagram; label: Lower internal routines]
Core Algorithms
The sliding window algorithm – stmm_core routine
► Limits the size of the data to be copied to blocksize
► Performance is highly dependent on the chosen blocksize
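The sliding window idea from the stmm_core slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's stmm_core: the names sliding_window_product and window_product are hypothetical, the band Toeplitz matrix is assumed to be given by its first lambda entries t, and each window is multiplied term by term where the library would presumably use an FFT. Each window copies exactly bs samples, includes lambda - 1 overlap samples on each side, and only the middle bs - 2(lambda - 1) outputs are kept.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: out = T * in for a symmetric band Toeplitz matrix of
 * size len, with T(i,j) = t[|i-j|] if |i-j| < lambda and 0 otherwise. */
static void window_product(const double *t, size_t lambda,
                           const double *in, double *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        double acc = t[0] * in[i];
        for (size_t k = 1; k < lambda; k++) {
            if (i >= k)      acc += t[k] * in[i - k];
            if (i + k < len) acc += t[k] * in[i + k];
        }
        out[i] = acc;
    }
}

/* Hypothetical sketch: y = T * x computed window by window.
 * Returns 0 on success, -1 if a work buffer cannot be allocated. */
int sliding_window_product(const double *t, size_t lambda,
                           const double *x, double *y, size_t n, size_t bs)
{
    assert(lambda >= 1);
    size_t halo = lambda - 1;          /* overlap needed on each side */
    assert(bs > 2 * halo);             /* a window must keep >= 1 output */
    size_t step = bs - 2 * halo;       /* valid outputs per window */

    double *win = malloc(bs * sizeof *win);
    double *out = malloc(bs * sizeof *out);
    if (!win || !out) { free(win); free(out); return -1; }

    for (size_t start = 0; start < n; start += step) {
        /* Copy bs samples covering global indices [start - halo, start - halo + bs),
         * padding with zeros outside [0, n). */
        for (size_t k = 0; k < bs; k++) {
            size_t shifted = start + k;            /* global index + halo */
            win[k] = (shifted >= halo && shifted - halo < n)
                         ? x[shifted - halo] : 0.0;
        }
        window_product(t, lambda, win, out, bs);
        /* Keep only the outputs whose whole band lies inside the window. */
        for (size_t k = 0; k < step && start + k < n; k++)
            y[start + k] = out[halo + k];
    }
    free(win);
    free(out);
    return 0;
}
```

With a small bs the overlap dominates the work, and with a large bs each copy grows, which is one way to read the slide's remark that performance depends strongly on the blocksize.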
Core Algorithms
FFT product using circulant matrix properties
► The circulant matrix C is defined by its first column
► FFT cost is O(n log n)
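The circulant property used on the FFT product slide can be illustrated with a short sketch. It assumes a symmetric band Toeplitz matrix given by its first lambda entries t, embeds it in a circulant matrix of size m = n + lambda - 1, and reads the Toeplitz product from the first n entries of the circulant product. To stay self-contained, the circular convolution is computed here in O(m^2); in an FFT-based version it would be replaced by an inverse FFT of the elementwise product of the FFTs of the first column and of the padded vector. The function name toeplitz_via_circulant is hypothetical and is not part of the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: y = T * x through a circulant embedding.
 * T is n x n, symmetric, with T(i,j) = t[|i-j|] if |i-j| < lambda, else 0.
 * Returns 0 on success, -1 if a work buffer cannot be allocated. */
int toeplitz_via_circulant(const double *t, size_t lambda,
                           const double *x, double *y, size_t n)
{
    assert(lambda >= 1 && lambda <= n);
    size_t m = n + lambda - 1;                /* size of the circulant matrix */

    double *c  = calloc(m, sizeof *c);        /* first column of C */
    double *xp = calloc(m, sizeof *xp);       /* x padded with zeros */
    if (!c || !xp) { free(c); free(xp); return -1; }

    /* First column of C: the band, then zeros, then the band wrapped around. */
    c[0] = t[0];
    for (size_t k = 1; k < lambda; k++) {
        c[k]     = t[k];
        c[m - k] = t[k];
    }
    for (size_t j = 0; j < n; j++)
        xp[j] = x[j];

    /* Circular convolution, C(i,j) = c[(i - j) mod m]; only the first n rows
     * are needed. An FFT would replace this double loop. */
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < m; j++)
            acc += c[(i + m - j) % m] * xp[j];
        y[i] = acc;
    }
    free(c);
    free(xp);
    return 0;
}
```

Because only the first column of C is stored, the memory cost is O(m) rather than O(n^2).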
Core Algorithms
Direct product – stmm_direct routine
► The product is computed term by term, without using FFTs
► Can be used for very small bandwidths
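A term-by-term product of the kind described on the direct product slide might look like the sketch below. It is not the library's stmm_direct: the name band_toeplitz_direct and its argument list are hypothetical, and the matrix is assumed to be symmetric and given by its first lambda entries t, so that T(i,j) = t[|i-j|] inside the band and 0 outside.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: y = T * x for a symmetric band Toeplitz matrix T
 * of size n, using only the lambda stored band entries t[0..lambda-1]. */
void band_toeplitz_direct(const double *t, size_t lambda,
                          const double *x, double *y, size_t n)
{
    assert(lambda >= 1);
    for (size_t i = 0; i < n; i++) {
        double acc = t[0] * x[i];                       /* diagonal term */
        for (size_t k = 1; k < lambda; k++) {
            if (i >= k)    acc += t[k] * x[i - k];      /* lower band */
            if (i + k < n) acc += t[k] * x[i + k];      /* upper band */
        }
        y[i] = acc;
    }
}
```

The cost is about n(2 lambda - 1) multiply-adds, which is why such a product is only attractive when the bandwidth is very small.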
Core Algorithms
Sequential/omp user example – stmm routine

/* FFT work buffers and plans, passed by address to tpltz_init */
fftw_complex *V_fft, *T_fft;
double *V_rfft;
fftw_plan plan_f, plan_b;

/* Initialization: sets nfft and blocksize, prepares buffers and plans */
tpltz_init(v_size, lambda, &nfft, &blocksize, &T_fft, T,
           &V_fft, &V_rfft, &plan_f, &plan_b);

/* Product of the Toeplitz matrix with the data V */
stmm(V, n, m, id0, local_V_size, T_fft, lambda,
     V_fft, V_rfft, plan_f, plan_b, blocksize, nfft);

/* Release the buffers and plans */
tpltz_cleanup(&T_fft, &V_fft, &V_rfft, &plan_f, &plan_b);

[Figure label: Full columns]
► The input data structure is related to the chosen data distribution
► nfft: number of simultaneous FFTs
► blocksize: block dimension used in the sliding window algorithm
Core Algorithms
Input data reshaping – stmm routine
Reshape the input data for optimal computational performance:
► Compositions of index functions are used to obtain the right transformation
► No more than one copy is needed to build the reshaped data structure
► Inverse transformations are used to extract the computed result
[Figure: from the input data distribution to the reshaped data distribution, with steps labeled mat2vect, vect2nfft, concat]
Core Algorithms
Estimation of the optimal blocksize
Two formulas can be used to compute the blocksize bs:
► Compute the smallest power of 2 above 3 times the minimum correlation length: [formula not preserved]
► Compute the smallest power of two which fulfills: [formula not preserved]
We also add a constraint to avoid blocks bigger than the matrix (this happens mostly when the matrix size is small)
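The first blocksize rule stated on this slide can be sketched in a few lines. Several points are assumptions rather than facts from the source: the correlation length is taken to be lambda, "above" is read as "greater than or equal to", and the cap is a plain clamp to the matrix size n, which may differ from what the library does. The names next_pow2 and estimate_blocksize are hypothetical. The second rule is not sketched because its formula is missing from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two greater than or equal to v (v >= 1).
 * Overflow for very large v is not handled in this sketch. */
static size_t next_pow2(size_t v)
{
    size_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical sketch of the first rule: smallest power of two that is
 * at least 3 * lambda, capped so a block never exceeds the matrix size n.
 * The clamp to n is an assumption about how the cap is applied. */
size_t estimate_blocksize(size_t lambda, size_t n)
{
    assert(lambda >= 1 && n >= 1);
    size_t bs = next_pow2(3 * lambda);
    if (bs > n)
        bs = n;
    return bs;
}
```

For example, with lambda = 10 and n = 1000 the rule gives 32, while for a small matrix with n = 20 the cap takes over and the sketch returns 20.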
MPI Algorithms
Toeplitz block diagonal – stbmm routine
► Uses floating block positions
► Communications between neighbors are made only if needed
► One can define only the minimum blocks required, or more, or even all of them
[Figure: global row-wise order data distribution]
[Figure: the data distribution is in row-wise order per process]
MPI Algorithms
Gappy Toeplitz block diagonal – gstbmm routine
► Build gappy blocks according to the gap locations
► Reset all the gaps to zero
► Call the stbmm routine with this freshly built gappy data structure
► Reset all the gaps to zero again to clean out the wrong results
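The zero, multiply, zero sequence of the gstbmm slide can be sketched on a single block. This is only an illustration under assumptions: gaps are described by a hypothetical per-sample flag array is_gap, the block-building step and all MPI aspects are omitted, and a plain term-by-term band product (band_product, also hypothetical) stands in for the stbmm call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Toeplitz product: y = T * x with
 * T(i,j) = t[|i-j|] if |i-j| < lambda and 0 otherwise. */
static void band_product(const double *t, size_t lambda,
                         const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double acc = t[0] * x[i];
        for (size_t k = 1; k < lambda; k++) {
            if (i >= k)    acc += t[k] * x[i - k];
            if (i + k < n) acc += t[k] * x[i + k];
        }
        y[i] = acc;
    }
}

/* Hypothetical sketch of the gappy product on one block.
 * is_gap[i] != 0 marks sample i as belonging to a gap.
 * Returns 0 on success, -1 if the work buffer cannot be allocated. */
int gappy_product(const double *t, size_t lambda, const double *x,
                  const unsigned char *is_gap, double *y, size_t n)
{
    assert(lambda >= 1);
    double *xg = malloc(n * sizeof *xg);
    if (!xg) return -1;

    /* Step 1: reset the gaps to zero in a copy of the input. */
    for (size_t i = 0; i < n; i++)
        xg[i] = is_gap[i] ? 0.0 : x[i];

    /* Step 2: ordinary Toeplitz product on the zeroed data. */
    band_product(t, lambda, xg, y, n);

    /* Step 3: reset the gaps to zero in the result, since the product
     * wrote meaningless values at the gap positions. */
    for (size_t i = 0; i < n; i++)
        if (is_gap[i]) y[i] = 0.0;

    free(xg);
    return 0;
}
```

Zeroing the input keeps gap samples from contaminating their neighbors, and zeroing the output discards the values computed at the gap positions themselves.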
Future improvements
► Run more performance and precision tests
► Introduce object-oriented structure elements
► Extend the data size beyond the 32-bit integer limit
► Provide utility routines to help scatter the data between processes, based on a good estimate of the workload balance
► Fast inverse algorithm for symmetric band Toeplitz matrices
► Include other kinds of block shapes in the stbmm routine
Conclusion
► The main pieces of the Toeplitz algebra module are in place
► This gives an efficient and flexible set of routines
► Advanced tuning is possible for expert users
► This is only one part of the global library, closely related to the pointing/unpointing product included in the Mapmat module
► The Midapack library can be downloaded from the official website: http://www.apc.univ-paris7.fr/APC_CS/Recherche/Adamis/MIDAS09/software.html
Thank you for your attention.