Cosmology and its Data Flood Challenge - Jean-Luc Starck

Cosmology and its Data Flood Challenge Jean-Luc Starck http://jstarck.cosmostat.org CEA, IRFU, AIM, Service d'Astrophysique, France,

Samuel Farrens

François Lanusse

Sandrine Pires

Fred Ngolè

Thursday 20 October 2016

Cosmology and its Data Flood Challenge

• Introduction to the standard cosmological model. Three sources of uncertainty:
  ➡ Stochastics
  ➡ Systematics
  ➡ Approximations

• Cosmology in the Big Data era
  ➡ Big Data Today
  ➡ Big Data Tomorrow

• Euclid Space Mission
  ➡ Point Spread Function and Deconvolution
  ➡ Dark matter mass mapping

CosmoStat Lab

The Standard Cosmological Model

[Figure: cosmological probes: GW, supernovae, lensing, CMB, galaxy distribution]


Precision Cosmology

First Source of Uncertainty: Stochastics
- Noise
- Cosmic variance
- New instruments with better sensitivity (hardware)
- Collect more data => large surveys (SDSS, WMAP, Planck, KiDS, DES, etc.) ==> Virtual Observatory (CDS Strasbourg)
- Data access, web services, interoperability, data model, etc.

First Source of Uncertainty: Stochastics

- Better statistical tools (Bayesian modeling, sparsity, BSS, machine learning, etc.): beyond second-order statistics

Astrophysics + Statistics/Applied math => Astrostatistics

• Two international organizations:
  ➡ International Astrostatistics Association (IAA)
  ➡ Commission on Astroinformatics and Astrostatistics within the International Astronomical Union (IAU)
• Two important U.S. national organizations:
  ➡ the Working Group in Astroinformatics and Astrostatistics within the American Astronomical Society (AAS)
  ➡ the Interest Group in Astrostatistics within the American Statistical Association (ASA)
• One project-level organization: the Informatics and Statistics Science Collaboration of the Large Synoptic Survey Telescope (LSST)
• Astrostatistics laboratories
  ➡ USA: Penn State University, Berkeley, CMU, Cornell
  ➡ Europe: Imperial Center for Inference and Cosmology (ICIC) at Imperial College; CosmoStat laboratory, CEA-Saclay


Second Source of Uncertainty: Systematics

BICEP2: in March 2014, BICEP2 claimed the detection of primordial gravitational waves ==> the signal turned out to be a dust signature, from dust in our own galaxy!

Fig: As datasets grow, systematic errors swamp statistical errors and new disparities appear. Credit: David van Dyk

Second Source of Uncertainty: Systematics

Seehars et al, Physical Review D, Volume 93, Issue 10, id.103507, 2016

Second Source of Uncertainty: Systematics

The need for numerical simulations (physics + instrument):
- to test the pipeline and its ability to measure the cosmological parameters accurately.
- to build the covariance matrices that are required to fit the cosmological parameters.

Numerical simulations are a very important aspect of new big projects.

Third Source of Uncertainty: Approximation

- We know perfectly well how to calculate some estimators and their covariance matrices, but the volume of data is so large that it is impossible to do so, even on HPC infrastructures. Theoretical and algorithmic work is necessary to control the errors and biases introduced by the approximations.

Examples:
• Two-point correlation functions
• Covariance matrices
• Example of ongoing work at CMU: emulating N-body simulations using machine learning.
• Approximate Bayesian Computation (ABC): a likelihood-free approach to approximating the posterior when the likelihood function is not specified.
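To make the likelihood-free idea behind ABC concrete, here is a minimal rejection-sampling sketch in Python. It is only an illustration under stated assumptions: the toy Gaussian simulator, the uniform prior, the sample mean as summary statistic and the tolerance `eps` are all hypothetical choices, not taken from the slides.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, summary, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))    # forward-simulate, no likelihood needed
        if abs(s_sim - s_obs) < eps:             # accept if the simulation resembles the data
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
data = rng.normal(1.5, 1.0, size=200)            # toy "observation" (hypothetical)

posterior = abc_rejection(
    observed=data,
    simulate=lambda mu, r: r.normal(mu, 1.0, size=200),  # toy simulator (assumption)
    prior_sample=lambda r: r.uniform(-5.0, 5.0),         # flat prior (assumption)
    summary=np.mean,
    eps=0.05,
    n_draws=20000,
    rng=rng,
)
print(posterior.size, posterior.mean())
```

The accepted draws approximate the posterior of the mean. The cost is dominated by the number of forward simulations, which is why cheap emulators of the simulator matter in practice.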

The Big Data Today

Gaia space telescope

• Map the Milky Way in 3D
• Stellar physics
• Dark matter
• Extrasolar planets
• 50 Gbyte/day; 1 Pbyte total data product
• 3D catalogue of ~1 billion astronomical objects


The Big Data Today - LOFAR: the LOw Frequency ARray

● Giant digital & multi-purpose radio telescope distributed across Europe
● Radio interferometer composed of ∼48 phased arrays (stations)
● Working bands: LBA 30-80 MHz & HBA 120-240 MHz
● Improved angular (arcsec), temporal (µs), spectral (kHz) resolutions
● High sensitivity (~mJy); 1 Jy = 10^-26 W m^-2 Hz^-1
● 24 Pbyte/day raw data - 1 Pbyte per year in archive

[Figure: NL station]

CEA - Irfu

Multi-element interferometer

N antennas/telescopes give N(N − 1)/2 independent baselines

1 projected baseline = 1 sample in the Fourier « u,v » plane

[Figure: the VLA]
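The baseline count N(N − 1)/2 can be checked with a short sketch that builds every antenna pair and converts the baseline vectors into (u,v) samples. The random antenna layout and the observing wavelength below are hypothetical; only the pair-counting logic comes from the slide.

```python
import numpy as np
from itertools import combinations

def baselines(positions):
    """Return the (N(N-1)/2, 2) array of baseline vectors between antenna pairs."""
    pairs = combinations(range(len(positions)), 2)
    return np.array([positions[j] - positions[i] for i, j in pairs])

rng = np.random.default_rng(1)
n_antennas = 27                                   # e.g. the 27 antennas of the VLA
positions = rng.uniform(-500.0, 500.0, size=(n_antennas, 2))  # metres, hypothetical layout
wavelength = 0.21                                 # metres, illustrative value

# Each baseline, expressed in wavelengths, gives one sample of the (u,v) plane.
uv = baselines(positions) / wavelength
print(uv.shape)  # (351, 2)
```

Since the number of samples grows quadratically with the number of antennas, so does the raw data rate of the instrument.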

Compressed Sensing & LOFAR Cygnus A Data

Garsden et al, “LOFAR Image Sparse Reconstruction”, A&A, 575, A90, 2015, arXiv:1406.7242. http://arxiv.org/abs/1406.7242

[Figure: Cygnus A image; axes RA (J2000) and Dec (J2000); colour scale in Jy/beam. Names shown on the slide: J. Girard, S. Corbel, C. Tasse, H. Garsden]

Colorscale: reconstructed 512x512 image of Cygnus A at 151 MHz (with resolution 2.8” and a pixel size of 1”). Contour levels are [1,2,3,4,5,6,9,13,17,21,25,30,35,37,40] Jy/beam from a 327.5 MHz Cyg A VLA image (Project AK570) at 2.5” angular resolution and a pixel size of 0.5”. Recovered features in the CS image correspond to real structures observed at higher frequencies.

The Big Data Tomorrow - SKA: Square Kilometer Array

Credit: Melanie Johnston-Hollitt, COSMO21 conf, Chania, May 2016

The Big Data Tomorrow - LSST: Large Synoptic Survey Telescope

- Dark matter, dark energy, cosmology (spatial distribution of galaxies, gravitational lensing, supernovae, quasars).
- Time domain (cosmic explosions, variable stars).
- The Solar System structure (asteroids).
- The Milky Way structure (stars).

Credit: Željko Ivezić, SCMA6 conf, Pittsburgh, June 2016

25 TByte per night. After 10 years, half of the sky will have been imaged: ~100 PB of data.

The Big Data Tomorrow - Euclid ESA Space Mission

Understand the origin of the Universe’s accelerating expansion:
➡ probe the properties and nature of dark energy, dark matter and gravity, and distinguish their effects decisively
➡ by tracking their observational signatures on the
  • geometry of the universe: Weak Lensing (WL) + Galaxy Clustering
  • cosmic history of structure formation: WL, z-space distortion, clusters of galaxies
➡ Controlling systematic residuals to an unprecedented level of accuracy, which cannot be reached by any competing mission or telescope

Gains in space: stable data, with a homogeneous data set over the whole sky
➡ Systematics are small, understood and controlled
➡ Homogeneity: selection function perfectly controlled

~150 PB of data.

Weak Lensing

[Figure: observer, gravitational lens, background galaxies]

Image Forming Process

Image Forming Process: Stars and Point Spread Function

• Spatial variability
• Temporal variability: jitter, temperature of instrument
• Wavelength dependency

Weakly Lensed Galaxies

Space Variant PSF

PSF Variability

Euclid PSF Modeling

Observation model: $y_k = M_k x_k + n_k$, $k = 1..n$
- $y_k$: kth low resolution image
- $M_k$: shift and downsampling operator
- $x_k$: kth well resolved image
- $n_k$: Gaussian noise

Joint estimation of super-resolved PSFs at star positions:
➡ Positivity constraint
➡ Low rank constraint: constrain the PSFs to be a linear combination of $r$ eigen-PSFs, $\mathrm{PSF}(k) = x_k = \sum_{i=1}^{r} a_{i,k} s_i$
➡ Smoothness constraint on each PSF
➡ Proximity constraint: the closer the stars are, the more similar the coefficients of the linear combination.
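The observation model $y_k = M_k x_k + n_k$ and the low rank constraint $x_k = \sum_i a_{i,k} s_i$ can be simulated with a short sketch. Everything specific here is an assumption made for illustration: Gaussian blobs stand in for the eigen-PSFs, the shift in $M_k$ is an integer shift on the fine grid (the real operator would handle sub-pixel shifts), and the helper names are hypothetical.

```python
import numpy as np

def gaussian(size, sigma):
    """Normalised isotropic Gaussian, used as a stand-in eigen-PSF."""
    yy, xx = np.indices((size, size)) - (size - 1) / 2
    g = np.exp(-0.5 * (xx**2 + yy**2) / sigma**2)
    return g / g.sum()

def psf_field(eigen_psfs, coeffs):
    """x_k = sum_i a_{i,k} s_i: (r, h, w) eigen-PSFs and (r, p) weights -> (p, h, w)."""
    return np.tensordot(coeffs.T, eigen_psfs, axes=1)

def observe(x, shift, factor, noise_sigma, rng):
    """y_k = M_k x_k + n_k, with M_k = integer shift followed by decimation."""
    shifted = np.roll(x, shift, axis=(0, 1))
    low_res = shifted[::factor, ::factor]
    return low_res + noise_sigma * rng.normal(size=low_res.shape)

rng = np.random.default_rng(2)
r, p, size = 3, 5, 16                              # r eigen-PSFs, p stars
eigen_psfs = np.stack([gaussian(size, s) for s in (1.5, 2.5, 3.5)])
coeffs = rng.uniform(0.0, 1.0, size=(r, p))        # a_{i,k}, non-negative here
X = psf_field(eigen_psfs, coeffs)                  # well resolved PSFs
Y = np.stack([observe(X[k], (k % 2, k % 2), 2, 1e-4, rng) for k in range(p)])
print(X.shape, Y.shape)
```

Because all p PSFs are built from only r components, the matrix of vectorised PSFs has rank at most r, which is what makes the joint estimation from undersampled stars tractable.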

Monochromatic PSFs joint superresolution: Constraints

$X = [x_1, \dots, x_p]$

➡ Pixel domain features dictionary: $X$ columns sparse
➡ Spatial frequencies dictionary: $X^T = VW$ with $W$ columns sparse

Matrix Factorization

$S = [s_1, \dots, s_r]$, $\mathrm{PSF}(k) = x_k = \sum_{i=1}^{r} a_{i,k} s_i$

The $s_i$ are the ”eigen PSFs”.

F. Ngolè, J.-L. Starck, et al, “Constraint matrix factorization for space variant PSFs field restoration”, in press, 2016

$\min_{\alpha, S} \frac{1}{2}\|Y - \mathcal{F}(S\alpha V^T)\|_F^2 + \sum_{i=1}^{r} \|w_i \odot \Phi^{(c)} s_i\|_1 \quad \text{s.t.} \quad \|\alpha[l,:]\|_0 \le \eta_l,\ l = 1, \dots, r, \quad \text{and} \quad S\alpha U^T \ge 0$

Numerical Experiments

Data: 500 Euclid-like PSFs (Zemax), field observed with different SNRs. These PSFs account for mirror polishing imperfections, manufacturing and alignment errors, and the thermal stability of the telescope.

Quality assessment: shape parameters

$e_1(X) = \frac{\mu_{2,0}(X) - \mu_{0,2}(X)}{\mu_{2,0}(X) + \mu_{0,2}(X)}, \qquad e_2(X) = \frac{2\mu_{1,1}(X)}{\mu_{2,0}(X) + \mu_{0,2}(X)}$

$\gamma(X) = [e_1(X), e_2(X)]^T$

$E_\gamma = \sum_{i=1}^{p} \|\gamma(X_i) - \gamma(\hat{X}_i)\|_2 / p$

$\mathrm{Disp} = \|M_E\|_*$ with $M_E = [\gamma(X_1) - \gamma(\hat{X}_1), \dots, \gamma(X_p) - \gamma(\hat{X}_p)]$
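The shape parameters above are ratios of second-order central moments, so they can be computed in a few lines. The sketch below assumes that $\mu_{p,q}$ weights $x^p y^q$ about the image centroid (the slide does not spell out the convention) and uses hypothetical elliptical Gaussians as test images.

```python
import numpy as np

def ellipticity(image):
    """Return gamma(X) = [e1, e2] from the second-order central moments of an image."""
    y, x = np.indices(image.shape)
    total = image.sum()
    xc = (x * image).sum() / total                 # centroid
    yc = (y * image).sum() / total
    mu20 = ((x - xc) ** 2 * image).sum()
    mu02 = ((y - yc) ** 2 * image).sum()
    mu11 = ((x - xc) * (y - yc) * image).sum()
    e1 = (mu20 - mu02) / (mu20 + mu02)
    e2 = 2.0 * mu11 / (mu20 + mu02)
    return np.array([e1, e2])

def mean_shape_error(truths, estimates):
    """E_gamma: average Euclidean distance between true and estimated shapes."""
    errors = [np.linalg.norm(ellipticity(t) - ellipticity(e))
              for t, e in zip(truths, estimates)]
    return float(np.mean(errors))

def elliptical_gaussian(size, sigma_x, sigma_y):
    """Hypothetical test image: axis-aligned elliptical Gaussian."""
    y, x = np.indices((size, size)) - (size - 1) / 2
    return np.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))

round_psf = elliptical_gaussian(41, 3.0, 3.0)
wide_psf = elliptical_gaussian(41, 4.0, 2.0)       # elongated along x
print(ellipticity(round_psf), ellipticity(wide_psf))
print(mean_shape_error([round_psf], [wide_psf]))
```

For the elongated Gaussian, the moments are proportional to the squared widths, so $e_1$ is close to $(16 - 4)/(16 + 4) = 0.6$ while $e_2$ vanishes for an axis-aligned shape.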

Numerical Experiments

With undersampling (upsampling factor of 2)

[Figure: PSFs at three field positions (Center, Local, Corner); rows: Obs, Ref, PSFEx, RCA]

[Figure: $\log_{10}(E_\gamma)$ versus linear SNR]

PSF Interpolation

Optimal transport

Astronomical Image Deconvolution

Standard deconvolution framework:

$y = Hx + n$

where $y$ is the observed image, $H$ the PSF convolution, $x$ the true image and $n$ the noise.

The image is recovered by solving:

$\operatorname*{argmin}_X \frac{1}{2}\|Y - HX\|_2^2 + \lambda\|\Phi^t X\|_p \quad \text{s.t.} \quad X \ge 0$

H is huge!
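As an illustration of how such a regularised deconvolution can be solved, the following sketch runs a plain forward-backward (proximal gradient) iteration. It is not the method of the talk: it assumes $p = 1$, takes the sparsity transform to be the identity, and applies $H$ as a circular convolution through the FFT, which also avoids ever forming the huge matrix. The PSF, the point-source image and all parameter values are hypothetical.

```python
import numpy as np

def make_operators(psf):
    """Circular convolution with the PSF and its adjoint, applied through the FFT."""
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    H = lambda x: np.fft.ifft2(np.fft.fft2(x) * otf).real
    Ht = lambda x: np.fft.ifft2(np.fft.fft2(x) * np.conj(otf)).real
    lipschitz = np.max(np.abs(otf)) ** 2           # Lipschitz constant of the gradient
    return H, Ht, lipschitz

def deconvolve(y, psf, lam, n_iter=100):
    """Minimise 0.5*||y - Hx||^2 + lam*||x||_1 subject to x >= 0."""
    H, Ht, lipschitz = make_operators(psf)
    tau = 1.0 / lipschitz
    x = np.zeros_like(y)
    costs = []
    for _ in range(n_iter):
        grad = Ht(H(x) - y)                        # gradient of the data fidelity term
        # prox of tau*lam*||.||_1 plus positivity: one-sided soft threshold
        x = np.maximum(x - tau * grad - tau * lam, 0.0)
        costs.append(0.5 * np.sum((y - H(x)) ** 2) + lam * np.sum(np.abs(x)))
    return x, costs

rng = np.random.default_rng(3)
size = 32
yy, xx = np.indices((size, size)) - size // 2
psf = np.exp(-0.5 * (xx**2 + yy**2) / 2.0**2)
psf /= psf.sum()

truth = np.zeros((size, size))
truth[8, 8], truth[20, 12], truth[15, 25] = 10.0, 10.0, 10.0   # toy point sources
H, _, _ = make_operators(psf)
observed = H(truth) + 0.01 * rng.normal(size=truth.shape)

estimate, costs = deconvolve(observed, psf, lam=0.01)
print(costs[0], costs[-1])
```

With a step size of one over the Lipschitz constant, the objective decreases at every iteration, which gives a simple sanity check on any implementation.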

Big Astronomical Image Deconvolution

Object Oriented Deconvolution

For each galaxy, we use the PSF related to its center pixel:

$Y = \mathcal{H}(X) + N$

with $Y = [y_0, y_1, \dots, y_n]$, $\mathcal{H}(X) = [H_0 x_0, H_1 x_1, \dots, H_n x_n]$ and $N = [n_0, n_1, \dots, n_n]$.

$\operatorname*{argmin}_X \frac{1}{2}\|Y - \mathcal{H}(X)\|_2^2 + \lambda\|\Phi^t X\|_p \quad \text{s.t.} \quad X \ge 0$

Big Astronomical Image Deconvolution

Galaxy images have similar properties.

$\operatorname*{argmin}_X \frac{1}{2}\|Y - \mathcal{H}(X)\|_2^2 + \lambda\|X\|_* \quad \text{s.t.} \quad X \ge 0$

where the nuclear norm is $\|X\|_* = \sum_i \sigma_i(X)$, the sum of the singular values of $X$.
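The nuclear norm penalty is handled in practice through its proximal operator, which soft-thresholds the singular values. The sketch below assumes that each row of the matrix holds one vectorised galaxy image (the stacking convention is an assumption) and builds a small matrix with known singular values to show the effect.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

def prox_nuclear(X, threshold):
    """Proximal operator of threshold*||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - threshold, 0.0)
    return (U * s) @ Vt

# Toy stack of 20 "images" of 10 pixels each, with singular values 5, 3 and 0.5.
rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.normal(size=(20, 3)))
V, _ = np.linalg.qr(rng.normal(size=(10, 3)))
stack = (U * np.array([5.0, 3.0, 0.5])) @ V.T

low_rank = prox_nuclear(stack, threshold=1.0)
print(nuclear_norm(stack), nuclear_norm(low_rank), np.linalg.matrix_rank(low_rank))
```

Thresholding at 1 removes the weakest component and shrinks the other two, so the result has rank 2 and singular values 4 and 2.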

Optimisation

Primal-dual splitting from Condat-Vu (2013)

$\operatorname*{argmin}_X \left[ F(X) + G(X) + K(L(X)) \right]$

where $F$, $G$ and $K$ are convex functions and $L$ is a linear operator.

Algorithm: Choose the proximal parameters $\tau > 0$, $\varsigma > 0$, the positive relaxation parameter $\xi$, and the initial estimate $(X_0, Y_0)$. Then iterate, for every $k \ge 0$:

1: $\tilde{X}_{k+1} = \mathrm{prox}_{\tau G}\left(X_k - \tau \nabla F(X_k) - \tau L^*(Y_k)\right)$

2: $\tilde{Y}_{k+1} = Y_k + \varsigma L(2\tilde{X}_{k+1} - X_k) - \varsigma\, \mathrm{prox}_{K/\varsigma}\left(\frac{Y_k}{\varsigma} + L(2\tilde{X}_{k+1} - X_k)\right)$

3: $(X_{k+1}, Y_{k+1}) := \xi(\tilde{X}_{k+1}, \tilde{Y}_{k+1}) + (1 - \xi)(X_k, Y_k)$
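The three steps of the primal-dual iteration translate almost line by line into code. The sketch below is a generic toy version, not the implementation used for the results: it takes $F(x) = \frac{1}{2}\|x - d\|^2$, $G$ the positivity constraint, $K = \lambda\|\cdot\|_1$ and $L$ the identity, a hypothetical problem chosen because its solution $\max(d - \lambda, 0)$ is known in closed form. The function and parameter names are illustrative.

```python
import numpy as np

def soft(z, t):
    """Soft thresholding, the proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def condat_vu(grad_F, prox_G, prox_K, L, Lt, x0, y0, tau, sigma, xi=1.0, n_iter=200):
    """Primal-dual splitting for min_x F(x) + G(x) + K(L(x))."""
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iter):
        # Step 1: primal update (forward step on F, backward step on G)
        x_new = prox_G(x - tau * grad_F(x) - tau * Lt(y), tau)
        # Step 2: dual update, written with prox_{K/sigma} via the Moreau identity
        z = L(2.0 * x_new - x)
        y_new = y + sigma * z - sigma * prox_K(y / sigma + z, 1.0 / sigma)
        # Step 3: relaxation
        x = xi * x_new + (1.0 - xi) * x
        y = xi * y_new + (1.0 - xi) * y
    return x, y

data = np.array([2.0, 0.2, -1.0, 0.7])             # toy data d
lam = 0.5

x_hat, _ = condat_vu(
    grad_F=lambda x: x - data,                     # gradient of 0.5*||x - d||^2
    prox_G=lambda v, t: np.maximum(v, 0.0),        # projection onto x >= 0
    prox_K=lambda v, t: soft(v, lam * t),          # prox of t*lam*||.||_1
    L=lambda x: x,                                 # identity operator (assumption)
    Lt=lambda y: y,
    x0=np.zeros_like(data),
    y0=np.zeros_like(data),
    tau=0.5,
    sigma=0.5,
)
print(x_hat)
```

The step sizes satisfy the usual sufficient condition $1/\tau - \varsigma\|L\|^2 \ge \beta/2$, with $\beta = 1$ the Lipschitz constant of $\nabla F$. Swapping in a PSF convolution for $F$ and a wavelet transform or identity for $L$ gives the structure of the deconvolution problems above.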

The Simulated Data

• 10,000 space-based galaxy images derived from COSMOS data.
• Each image is a 41×41 pixel postage stamp around the centre of the galaxy.
• Images are free from PSF effects.
• 600 spatially varying Euclid-like PSFs.
• Each galaxy image is convolved with a random PSF.
• Different levels of Gaussian noise are added.

Results

[Figure: n galaxy images shown as clean image $X$, data $Y$, and estimates $\hat{X}$ from sparse recovery and low rank]

Results

$P_{err} = \frac{\|\hat{X} - X\|_2}{\|X\|_2}$

S. Farrens, F.M. Ngolè Mboula, and J.-L. Starck, “Space variant deconvolution of galaxy survey images”, submitted, 2016.

Code available at https://github.com/sfarrens/psf

Shear Catalog & Map

A few undersampled images of a given galaxy

PSF superresolution + Interpolation + Shape Measurement

Many PSFs at other positions

Mass Mapping

$\hat{\kappa} = P_1 \hat{\gamma}_1 + P_2 \hat{\gamma}_2$, with $P_1(k) = \frac{k_1^2 - k_2^2}{k^2}$ and $P_2(k) = \frac{2 k_1 k_2}{k^2}$

[Figure: clusters]

• Missing data (mask and limited number densities)
• Shape noise
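The inversion formula $\hat{\kappa} = P_1 \hat{\gamma}_1 + P_2 \hat{\gamma}_2$ is a pointwise product in Fourier space, so a Kaiser-Squires estimator on complete, noise-free, binned data fits in a few lines. The sketch below assumes a periodic grid and a hypothetical Gaussian convergence map. It generates the shear with the same kernels and then inverts it, which works because $P_1^2 + P_2^2 = 1$ away from $k = 0$.

```python
import numpy as np

def ks_kernels(shape):
    """Fourier-space kernels P1 and P2 on a regular grid."""
    k1 = np.fft.fftfreq(shape[1])[np.newaxis, :]
    k2 = np.fft.fftfreq(shape[0])[:, np.newaxis]
    k_sq = k1**2 + k2**2
    k_sq[0, 0] = 1.0                               # avoid 0/0; both kernels vanish at k = 0
    p1 = (k1**2 - k2**2) / k_sq
    p2 = 2.0 * k1 * k2 / k_sq
    return p1, p2

def kappa_to_shear(kappa):
    """Forward model: shear components (gamma1, gamma2) from the convergence."""
    p1, p2 = ks_kernels(kappa.shape)
    kappa_hat = np.fft.fft2(kappa)
    return np.fft.ifft2(p1 * kappa_hat).real, np.fft.ifft2(p2 * kappa_hat).real

def kaiser_squires(gamma1, gamma2):
    """Kaiser-Squires inversion: convergence from the two shear components."""
    p1, p2 = ks_kernels(gamma1.shape)
    kappa_hat = p1 * np.fft.fft2(gamma1) + p2 * np.fft.fft2(gamma2)
    return np.fft.ifft2(kappa_hat).real

size = 64
yy, xx = np.indices((size, size)) - size // 2
kappa_true = np.exp(-0.5 * (xx**2 + yy**2) / 4.0**2)   # toy cluster (hypothetical)

gamma1, gamma2 = kappa_to_shear(kappa_true)
kappa_rec = kaiser_squires(gamma1, gamma2)
print(np.abs(kappa_rec - (kappa_true - kappa_true.mean())).max())
```

The mean of the map is not recoverable because the kernels vanish at $k = 0$, so the comparison is made after removing it. Masks and shape noise break this simple inversion, which motivates the inverse-problem formulation.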

Handling Missing Data (no noise): Binning + Smoothing

[Figure panels: Input; galaxy catalogue with 30 gal/arcmin²; Kaiser-Squires with 1' bins; Kaiser-Squires with 0.05' bins; KS with 0.05' bins + 0.1’ smoothing]

Mass mapping as an inverse problem

Binned data: $\gamma = F^* P F \kappa$

Unbinned data: $\gamma = T^* P F \kappa$, where $T$ is the Non Equispaced Discrete Fourier Transform (NDFT)

$\min_\kappa \frac{1}{2}\|\gamma - \mathbf{P}\kappa\|_2^2 + C(\kappa)$ with $\mathbf{P} = T^* P F$

$C(\kappa) = \sum_\theta \|(\Phi^t \kappa)_\theta\|_p = \sum_j \|\alpha_j\|_p$

Example with 93 % of missing data

10’ x 10’, z=0.3 cluster, Lens plane at redshift zs = 1.2

[Figure panels: Input; Lensing catalogue; Kaiser-Squires inversion; Kaiser-Squires + 0.25’ smoothing; GLIMPSE 2D]

Galaxy distribution: 93% of missing pixels, corresponding to 30 galaxies per square arcminute

Missing Data + Noise

10’ x 10’, z=0.3 cluster, ng=30/arcmin²

[Figure panels: Input; Kaiser-Squires + 0.5’ smoothing; Kaiser-Squires + 1.0’ smoothing; GLIMPSE 2D]

Computational Astrophysics

SKA/LSST/Euclid is BIG DATA, but also rich and very complex data, which require sophisticated statistical methods, astrophysical models and a huge amount of additional simulated data ==> Big computational challenges

Computational Astrophysics = Astrophysics + Statistics/Applied math + Computer Science (machine learning, HPC, simulations, etc.)

USA: The Center for Computational Astrophysics, created in 2016 and led by David Spergel, Manhattan, NY, USA. 60 open positions.

Conclusions

- Upcoming surveys (SKA, LSST and Euclid) will provide fantastic new data sets.
- To the first two kinds of uncertainties, stochastics and systematics, a third one now has to be added: approximation.
- Mathematical challenges: higher order statistics, combining probes, etc.
- HPC is a concern for the processing of the final products, and no longer only for pre-processing or simulations.
- Astrostatistics teams need to be extended to include people from computer science (machine learning, HPC, etc.).
- Future astronomers may not ask for observing time to collect data, but rather for computing time to process data that have already been collected.