Cosmology and its Data Flood Challenge
Jean-Luc Starck, http://jstarck.cosmostat.org
CEA, IRFU, AIM, Service d'Astrophysique, France
Samuel Farrens, Francois Lanusse, Sandrine Pires, Fred Ngolè
Thursday 20 October 2016
Cosmology and its Data Flood Challenge
•Introduction to the standard cosmological model. Three Sources of Uncertainty: ➡Stochastics ➡Systematics ➡Approximations
•Cosmology in the Big Data era ➡Big Data Today ➡Big Data Tomorrow
•Euclid Space Mission ➡Point Spread Function and Deconvolution ➡ Dark matter mass mapping. CosmoStat Lab
The Standard Cosmological Model
[Figure: observational probes: gravitational waves (GW), supernovae, lensing, CMB, galaxy distribution]
Precision Cosmology
First Source of Uncertainty: Stochastics
- Noise
- Cosmic Variance
- New instruments with better sensitivity (hardware)
- Collect more data => large surveys (SDSS, WMAP, Planck, KIDS, DES, etc.) ==> Virtual Observatory (CDS Strasbourg)
- Data access, web services, interoperability, data model, etc.
First Source of Uncertainty: Stochastics
- Better statistical tools (Bayesian modeling, sparsity, BSS, machine learning, etc.): beyond second-order statistics
Astrophysics + Statistics/Applied math => Astrostatistics
•Two international organizations:
➡International Astrostatistics Association (IAA)
➡Commission on Astroinformatics and Astrostatistics within the International Astronomical Union (IAU)
•Two important U.S. national organizations:
➡the Working Group in Astroinformatics and Astrostatistics within the American Astronomical Society (AAS)
➡the Interest Group in Astrostatistics within the American Statistical Association (ASA)
•One project-level organization: the Informatics and Statistics Science Collaboration of the Large Synoptic Survey Telescope (LSST)
•Astrostatistics laboratories
➡USA: Penn State University, Berkeley, CMU, Cornell
➡Europe: Imperial Center for Inference and Cosmology (ICIC) at Imperial College; CosmoStat laboratory, CEA-Saclay
Second Source of Uncertainty: Systematics
BICEP2: in March 2014, BICEP2 claimed the detection of primordial gravitational waves ==> the signal turned out to be a dust signature, dust from our own galaxy!
Fig: As datasets grow, systematic errors swamp statistical errors and new disparities appear. (Credit: David van Dyk)
Second Source of Uncertainty: Systematics
Seehars et al., Physical Review D, Volume 93, Issue 10, id. 103507, 2016
Second Source of Uncertainty: Systematics
The need for numerical simulations (physics + instrument):
- to test the pipeline and its ability to measure the cosmological parameters accurately.
- to build the covariance matrices that are required to fit the cosmological parameters.
Numerical simulations are a very important aspect of new big projects.
Third Source of Uncertainty: Approximation
- We know perfectly well how to calculate some estimators and their covariance matrices, but the volume of data is so large that computing them exactly is impossible, even on HPC infrastructures.
Theoretical and algorithmic work is necessary to control the errors and biases introduced by the approximations.
Examples:
•Two-point correlation functions
•Covariance matrices
•Ongoing work at CMU: emulating N-body simulations using machine learning.
•Approximate Bayesian Computation (ABC): a likelihood-free approach to approximating the posterior when the likelihood function is not specified.
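To make the likelihood-free idea behind ABC concrete, here is a minimal rejection-sampling sketch in Python. It is an illustration only, not code from the talk: the toy Gaussian model, the uniform prior, the sample mean as summary statistic, the tolerance `eps` and the function name `abc_rejection` are all assumptions chosen for the example.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, summary, eps, n_draws, rng):
    """Keep the prior draws whose simulated summary lies within eps of the data summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                 # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))     # forward-simulate, then summarise
        if abs(s_sim - s_obs) < eps:              # no likelihood evaluation anywhere
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=200)             # toy "observation" (assumed model)

posterior = abc_rejection(
    observed=data,
    simulate=lambda mu, r: r.normal(mu, 1.0, size=200),
    prior_sample=lambda r: r.uniform(-5.0, 5.0),  # assumed flat prior on the mean
    summary=np.mean,
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

The accepted draws approximate the posterior of the mean. The approximation error is set by `eps` and by the choice of summary statistic, which is exactly the kind of controlled approximation this slide refers to.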
The Big Data Today
GAIA space telescope
• Map the Milky Way in 3D
• Stellar physics
• Dark matter
• Extrasolar planets
• 50 Gbyte/day; 1 Pbyte total data product
• 3D catalogue of ~1 billion astronomical objects
The Big Data Today - LOFAR: the LOw Frequency ARray
● Giant digital, multi-purpose radio telescope distributed across Europe
● Radio interferometer composed of ∼48 phased arrays (stations)
● Working bands: LBA 30-80 MHz & HBA 120-240 MHz
● Improved angular (arcsec), temporal (µs) and spectral (kHz) resolutions
● High sensitivity (~mJy); 1 Jy = 10^-26 W m^-2 Hz^-1
● 24 Pbyte/day raw data; 1 Pbyte per year in archive
[Figure: NL station]
CEA - Irfu
Multi-element interferometer
N antennas/telescopes give $N(N-1)/2$ independent baselines.
1 projected baseline = 1 sample in the Fourier « u,v » plane
[Figure: VLA]
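The baseline count can be checked with a few lines of Python. This is a small sketch: the antenna positions are random placeholders, and the helper names `n_baselines` and `baseline_vectors` are hypothetical, introduced only for this illustration.

```python
from itertools import combinations

import numpy as np

def n_baselines(n_antennas: int) -> int:
    """Number of independent antenna pairs, N(N-1)/2."""
    return n_antennas * (n_antennas - 1) // 2

def baseline_vectors(positions):
    """Difference vector for every antenna pair (i < j); each one maps to one (u,v) sample."""
    return np.array([positions[j] - positions[i]
                     for i, j in combinations(range(len(positions)), 2)])

rng = np.random.default_rng(1)
positions = rng.uniform(-1.0, 1.0, size=(48, 2))   # placeholder layout, 48 stations
uv = baseline_vectors(positions)

print(n_baselines(48), uv.shape)   # 1128 (1128, 2)
print(n_baselines(27))             # 351, for the 27 antennas of the VLA
```

Because the number of baselines grows quadratically with the number of stations, so does the volume of raw visibility data.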
Compressed Sensing & LOFAR Cygnus A Data
Garsden et al., “LOFAR Image Sparse Reconstruction”, A&A, 575, A90, 2015, arXiv:1406.7242. http://arxiv.org/abs/1406.7242
H. Garsden, J. Girard, S. Corbel, C. Tasse
[Figure: reconstructed image of Cygnus A; axes: RA (J2000), Dec (J2000); colour scale in Jy/beam]
Colorscale: reconstructed 512x512 image of Cygnus A at 151 MHz (with resolution 2.8” and a pixel size of 1”). Contour levels are [1,2,3,4,5,6,9,13,17,21,25,30,35,37,40] Jy/beam from a 327.5 MHz Cyg A VLA image (Project AK570) at 2.5” angular resolution and a pixel size of 0.5”. Recovered features in the CS image correspond to real structures observed at higher frequencies.
The Big Data Tomorrow- SKA : Square Kilometer Array
Credit: Melanie Johnston-Hollitt, COSMO21 conf, Chania, May 2016
The Big Data Tomorrow - LSST: Large Synoptic Survey Telescope
- Dark matter, dark energy, cosmology (spatial distribution of galaxies, gravitational lensing, supernovae, quasars).
- Time domain (cosmic explosions, variable stars).
- The Solar System structure (asteroids).
- The Milky Way structure (stars).
25 TByte per night. After 10 years, half of the sky will have been imaged, for ~100 PB of data.
Credit: Željko Ivezić, SCMA6 conf, Pittsburgh, June 2016
The Big Data Tomorrow - Euclid ESA Space Mission
Understand the origin of the Universe’s accelerating expansion:
• probe the properties and nature of dark energy, dark matter and gravity, and distinguish their effects decisively
• by tracking their observational signatures on the
  - geometry of the universe: Weak Lensing + Galaxy Clustering
  - cosmic history of structure formation: WL, z-space distortion, clusters of galaxies
• controlling systematic residuals to an unprecedented level of accuracy that cannot be reached by any competing mission or telescope
Gains in space:
• Stable data: homogeneous data set over the whole sky
• Systematics are small, understood and controlled
• Homogeneity: selection function perfectly controlled
~150 PB of data.
Weak Lensing
[Figure: observer, gravitational lens, background galaxies]
Image Forming Process
Image Forming Process: Stars and Point Spread Function
•Spatial variability
•Temporal variability: jitter, temperature of instrument
•Wavelength dependency
Weakly Lensed Galaxies
Space Variant PSF
PSF Variability
Euclid PSF Modeling
Observation model: $y_k = M_k x_k + n_k$, $k = 1..n$
- $y_k$: $k$th low resolution image
- $M_k$: shift and downsampling operator
- $x_k$: $k$th well resolved image
- $n_k$: Gaussian noise
Joint estimation of super-resolved PSFs at the star positions:
➡ Positivity constraint
➡ Low rank constraint: constrain the PSFs to be linear combinations of eigen PSFs, $\mathrm{PSF}(k) = x_k = \sum_{i=1}^{r} a_{i,k} s_i$
➡ Smoothness constraint on each PSF
➡ Proximity constraint: the closer the stars, the more similar the coefficients of the linear combination.
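The observation model $y_k = M_k x_k + n_k$ can be simulated in a few lines. The sketch below is a toy version under stated assumptions: the "well resolved" PSF is a synthetic Gaussian, $M_k$ is an integer circular shift on the fine grid followed by decimation (which acts as a sub-pixel shift on the coarse grid), and the function name `observe` is hypothetical.

```python
import numpy as np

def observe(x, shift, factor, sigma, rng):
    """Toy y = M x + n: circular shift on the fine grid, decimation, Gaussian noise."""
    shifted = np.roll(x, shift, axis=(0, 1))      # shift part of M_k
    low_res = shifted[::factor, ::factor]         # downsampling part of M_k
    return low_res + sigma * rng.normal(size=low_res.shape)

# Synthetic well-resolved PSF on a 32x32 grid (an assumption, not Euclid data).
yy, xx = np.indices((32, 32))
psf = np.exp(-0.5 * (((xx - 15.5) / 2.0) ** 2 + ((yy - 15.5) / 2.0) ** 2))
psf /= psf.sum()

rng = np.random.default_rng(2)
shifts = [(0, 0), (1, 0), (0, 1), (1, 1)]
observations = [observe(psf, s, factor=2, sigma=1e-4, rng=rng) for s in shifts]
print(observations[0].shape)
```

Each observation samples a different set of fine-grid pixels, which is why several undersampled stars with different shifts carry enough information for super-resolution.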
Monochromatic PSFs joint superresolution: Constraints
$X = [x_1, \dots, x_p]$
- Pixel domain features dictionary: the columns of $X$ are sparse.
- Spatial frequencies dictionary: $X^T = VW$, with the columns of $W$ sparse.
Matrix Factorization
$S = [s_1, \dots, s_r]$, $\mathrm{PSF}(k) = x_k = \sum_{i=1}^{r} a_{i,k} s_i$, where the $s_i$ are "eigen PSFs".
$\min_{\alpha, S} \frac{1}{2} \|Y - \mathcal{F}(S \alpha V^T)\|_F^2 + \sum_{i=1}^{r} \|w_i \odot s_i\|_1$ s.t. $\|\alpha[l,:]\|_0 \le \eta_l$, $l = 1, \dots, r$, and $S \alpha V^T \ge 0$
F. Ngolè, J.-L. Starck, et al., “Constraint matrix factorization for space variant PSFs field restoration”, in press, 2016
Numerical Experiments
Data: 500 Euclid-like PSFs (Zemax), field observed with different SNRs.
These PSFs account for mirror polishing imperfections, manufacturing and alignment errors, and the thermal stability of the telescope.
Quality assessment: shape parameters
$e_1(X) = \frac{\mu_{2,0}(X) - \mu_{0,2}(X)}{\mu_{2,0}(X) + \mu_{0,2}(X)}$, $e_2(X) = \frac{2\mu_{1,1}(X)}{\mu_{2,0}(X) + \mu_{0,2}(X)}$
$\gamma(X) = [e_1(X), e_2(X)]^T$
$E_\gamma = \sum_{i=1}^{p} \|\gamma(X_i) - \gamma(\hat{X}_i)\|_2 / p$
$\mathrm{Disp} = \|M_{E_\gamma}\|_*$, $M_{E_\gamma} = [\gamma(X_1) - \gamma(\hat{X}_1), \dots, \gamma(X_p) - \gamma(\hat{X}_p)]$
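As an illustration of the shape parameters, the following sketch computes $e_1$ and $e_2$ from centred second-order moments. It assumes $\mu_{p,q} = \sum (x - x_c)^p (y - y_c)^q I(x, y)$ with $x$ the column index, and it uses synthetic Gaussian images rather than Euclid-like PSFs. The function names are hypothetical.

```python
import numpy as np

def ellipticity(image):
    """Return [e1, e2] from the centred second-order moments of an image."""
    img = np.asarray(image, dtype=float)
    y, x = np.indices(img.shape)
    total = img.sum()
    xc, yc = (x * img).sum() / total, (y * img).sum() / total   # centroid
    mu20 = ((x - xc) ** 2 * img).sum()
    mu02 = ((y - yc) ** 2 * img).sum()
    mu11 = ((x - xc) * (y - yc) * img).sum()
    return np.array([(mu20 - mu02) / (mu20 + mu02), 2 * mu11 / (mu20 + mu02)])

def gaussian(shape, sx, sy):
    """Centred elliptical Gaussian test image (synthetic stand-in for a PSF)."""
    y, x = np.indices(shape)
    cy, cx = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    return np.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))

round_psf = gaussian((41, 41), 3.0, 3.0)
wide_psf = gaussian((41, 41), 4.0, 3.0)

# Mean ellipticity error between a "reference" and an "estimate" (here one pair).
mean_error = np.linalg.norm(ellipticity(wide_psf) - ellipticity(round_psf))
print(ellipticity(round_psf), ellipticity(wide_psf), mean_error)
```

A round image gives ellipticity close to zero, while stretching the image along $x$ makes $e_1$ positive. Averaging the error norm over many PSFs gives the mean ellipticity error used as the quality criterion.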
Numerical Experiments: with undersampling (upsampling factor of 2)
[Figure: PSFs at three field positions (Center, Local, Corner); rows: Obs, Ref, PSFEX, RCA]
[Figure: $\log_{10}(E_\gamma)$ versus linear SNR]
PSF Interpolation
Optimal transport
Astronomical Image Deconvolution
Standard deconvolution framework:
$y = Hx + n$
where $y$ is the observed image, $H$ the PSF convolution, $x$ the true image and $n$ the noise.
$\operatorname{argmin}_X \frac{1}{2} \|Y - HX\|_2^2 + \lambda \|\Phi^t X\|_p$ s.t. $X \ge 0$
H is huge!
Big Astronomical Image Deconvolution
Object Oriented Deconvolution
For each galaxy, we use the PSF related to its center pixel:
$Y = \mathcal{H}(X) + N$
with $Y = [y_0, y_1, \dots, y_n]$, $\mathcal{H}(X) = [H_0 x_0, H_1 x_1, \dots, H_n x_n]$ and $N = [n_0, n_1, \dots, n_n]$.
$\operatorname{argmin}_X \frac{1}{2} \|Y - \mathcal{H}(X)\|_2^2 + \lambda \|\Phi^t X\|_p$ s.t. $X \ge 0$
Big Astronomical Image Deconvolution
Galaxy images have similar properties.
$\operatorname{argmin}_X \frac{1}{2} \|Y - \mathcal{H}(X)\|_2^2 + \lambda \|X\|_*$ s.t. $X \ge 0$, where $\|X\|_* = \sum_i \sigma_i(X)$
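The nuclear norm is the sum of the singular values of $X$. Penalising it favours a low-rank $X$, which is how the similarity between galaxy images is exploited. Its proximal operator is singular value thresholding, sketched below in Python. The matrix sizes, noise level and threshold are arbitrary assumptions, and this is not the code used to produce the results of the talk.

```python
import numpy as np

def prox_nuclear(X, threshold):
    """Proximal operator of threshold * ||X||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)
    return (U * s_shrunk) @ Vt          # rebuild the matrix with shrunk singular values

rng = np.random.default_rng(3)
# Toy stack: 20 "images" of 100 pixels each (as columns), built from 2 shared components.
low_rank = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
noisy = low_rank + 0.1 * rng.normal(size=low_rank.shape)

denoised = prox_nuclear(noisy, threshold=2.0)
print(np.linalg.matrix_rank(noisy), np.linalg.matrix_rank(denoised))
```

In a full deconvolution scheme this operator would be applied at every iteration to the matrix whose columns are the current galaxy estimates.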
Optimisation
Primal-dual splitting from Condat-Vu (2013)
$\operatorname{argmin}_X [F(X) + G(X) + K(L(X))]$
where $F$, $G$ and $K$ are convex functions and $L$ is a linear operator.
Algorithm: choose the proximal parameters τ > 0 and ς > 0, the positive relaxation parameter ξ, and the initial estimate $(X_0, Y_0)$. Then iterate, for every k ≥ 0:
1: $\tilde{X}_{k+1} = \mathrm{prox}_{\tau G}(X_k - \tau \nabla F(X_k) - \tau L^*(Y_k))$
2: $\tilde{Y}_{k+1} = Y_k + \varsigma L(2\tilde{X}_{k+1} - X_k) - \varsigma\, \mathrm{prox}_{K/\varsigma}\left(\frac{Y_k}{\varsigma} + L(2\tilde{X}_{k+1} - X_k)\right)$
3: $(X_{k+1}, Y_{k+1}) := \xi(\tilde{X}_{k+1}, \tilde{Y}_{k+1}) + (1 - \xi)(X_k, Y_k)$
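The three steps can be sketched on a deliberately small problem. The example below is a toy stand-in and not the galaxy deconvolution code. It assumes $F(x) = \frac{1}{2}\|x - b\|^2$, $G$ the indicator of $x \ge 0$, $K = \lambda\|\cdot\|_1$ and $L$ a 1D finite-difference operator. The step sizes are chosen so that $1/\tau - \varsigma\|L\|^2 \ge 1/2$, using $\|L\|^2 \le 4$.

```python
import numpy as np

def condat_vu(b, lam, tau=0.5, sigma=0.25, xi=1.0, n_iter=500):
    """Minimise 0.5*||x - b||^2 + lam*||D x||_1 subject to x >= 0 (toy problem)."""
    D = lambda x: np.diff(x)                                            # L
    Dt = lambda y: np.concatenate(([-y[0]], y[:-1] - y[1:], [y[-1]]))   # L*
    x = np.zeros_like(b)
    y = np.zeros(b.size - 1)
    for _ in range(n_iter):
        # Step 1: gradient of F, dual correction, then prox of G (projection on x >= 0).
        x_tilde = np.maximum(x - tau * (x - b) - tau * Dt(y), 0.0)
        # Step 2: dual update; prox_{K/sigma} is soft-thresholding at lam/sigma.
        u = y / sigma + D(2 * x_tilde - x)
        prox = np.sign(u) * np.maximum(np.abs(u) - lam / sigma, 0.0)
        y_tilde = y + sigma * D(2 * x_tilde - x) - sigma * prox
        # Step 3: relaxation with parameter xi.
        x = xi * x_tilde + (1 - xi) * x
        y = xi * y_tilde + (1 - xi) * y
    return x

def objective(x, b, lam):
    return 0.5 * np.sum((x - b) ** 2) + lam * np.sum(np.abs(np.diff(x)))

rng = np.random.default_rng(4)
b = np.concatenate((np.zeros(50), np.ones(50))) + 0.3 * rng.normal(size=100)
x_hat = condat_vu(b, lam=1.0)
print(objective(np.zeros(100), b, 1.0), objective(x_hat, b, 1.0))
```

In the deconvolution setting, $F$ would be the data fidelity term, $G$ the positivity constraint and $K \circ L$ the sparsity or low-rank penalty, with the same three-step structure.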
The Simulated Data
• 10,000 space-based galaxy images derived from COSMOS data.
• Each image is a 41×41 pixel postage stamp around the centre of the galaxy.
• Images are free from PSF effects.
• 600 spatially varying Euclid-like PSFs.
• Each galaxy image is convolved with a random PSF.
• Different levels of Gaussian noise are added.
Results
[Figure: example galaxy stamps; columns: Clean Image ($X$), Data ($Y$), Sparse Recovery and Low Rank ($\hat{X}$)]
Results
$P_{err} = \frac{\|\hat{X} - X\|_2}{\|X\|_2}$
S. Farrens, F.M. Ngolè Mboula, and J.-L. Starck, “Space variant deconvolution of galaxy survey images”, submitted, 2016.
Code available at https://github.com/sfarrens/psf
Shear Catalog & Map
A few undersampled images of a given galaxy
PSF superresolution + Interpolation + Shape Measurement
Many PSFs at other positions
Mass Mapping
$\hat{\kappa} = P_1 \hat{\gamma}_1 + P_2 \hat{\gamma}_2$, with $P_1(k) = \frac{k_1^2 - k_2^2}{k^2}$ and $P_2(k) = \frac{2 k_1 k_2}{k^2}$ (2)
[Figure: clusters]
•Missing data (mask and limited number densities)
•Shape noise
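Equation (2) translates directly into an FFT-based inversion on a regular, fully sampled grid. The sketch below is a minimal illustration on a synthetic convergence map. It assumes periodic boundaries, no mask and no noise, and uses an odd grid size so that there is no Nyquist frequency and the simulated shear fields are exactly real. The function names are hypothetical.

```python
import numpy as np

def ks_kernels(shape):
    """Fourier-space kernels P1 = (k1^2 - k2^2)/k^2 and P2 = 2 k1 k2 / k^2."""
    k1 = np.fft.fftfreq(shape[1])[np.newaxis, :]
    k2 = np.fft.fftfreq(shape[0])[:, np.newaxis]
    k_sq = k1 ** 2 + k2 ** 2
    k_sq[0, 0] = 1.0                 # avoid 0/0; both numerators vanish at k = 0
    return (k1 ** 2 - k2 ** 2) / k_sq, 2 * k1 * k2 / k_sq

def ks_forward(kappa):
    """Shear (gamma1, gamma2) produced by a convergence map."""
    p1, p2 = ks_kernels(kappa.shape)
    kappa_hat = np.fft.fft2(kappa)
    return np.fft.ifft2(p1 * kappa_hat).real, np.fft.ifft2(p2 * kappa_hat).real

def ks_inverse(gamma1, gamma2):
    """Convergence estimate from the two shear components, as in equation (2)."""
    p1, p2 = ks_kernels(gamma1.shape)
    kappa_hat = p1 * np.fft.fft2(gamma1) + p2 * np.fft.fft2(gamma2)
    return np.fft.ifft2(kappa_hat).real

# Synthetic "cluster": a Gaussian blob on a 65x65 grid (odd size, see above).
yy, xx = np.indices((65, 65))
kappa = np.exp(-0.5 * (((xx - 32) / 5.0) ** 2 + ((yy - 32) / 5.0) ** 2))

gamma1, gamma2 = ks_forward(kappa)
kappa_rec = ks_inverse(gamma1, gamma2)
print(np.abs(kappa_rec - (kappa - kappa.mean())).max())
```

The map is recovered only up to its mean, since the kernels vanish at $k = 0$. Masks, irregular galaxy positions and shape noise break this simple inversion, which motivates the approaches on the following slides.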
Handling Missing Data (no noise): Binning + Smoothing
Galaxy catalogue with 30 gal/arcmin2
[Figure panels: Input; Kaiser-Squires with 1' bins; Kaiser-Squires with 0.05' bins; KS with 0.05' bins + 0.1' smoothing]
Mass mapping as an inverse problem
Binned data: $\gamma = F^* P F \kappa$
Unbinned data: $\gamma = T^* P F \kappa$, where $T$ is the Non Equispaced Discrete Fourier Transform (NDFT)
$\min_\kappa \frac{1}{2} \|\gamma - \mathbf{P}\kappa\|_2^2 + C(\kappa)$
with $\mathbf{P} = T^* P F$ and $C(\kappa) = \sum_\theta \|(\Phi^t \kappa)_\theta\|_p = \sum_j \|\alpha_j\|_p$
Example with 93% of missing data
10' x 10', z=0.3 cluster, lens plane at redshift zs = 1.2
Galaxy distribution: 93% of missing pixels, corresponding to 30 galaxies per square arcminute
[Figure panels: Input; Lensing catalogue; Kaiser-Squires inversion; Kaiser-Squires + 0.25' smoothing; GLIMPSE 2D]
Missing Data + Noise
10' x 10', z=0.3 cluster, ng = 30/arcmin2
[Figure panels: Input; Kaiser-Squires + 0.5' smoothing; Kaiser-Squires + 1.0' smoothing; GLIMPSE 2D]
Computational Astrophysics
SKA/LSST/Euclid is BIG DATA, but also rich and very complex data, which require sophisticated statistical methods, astrophysical models and a huge amount of additional simulated data ==> Big computational challenges
Computational Astrophysics = Astrophysics + Statistics/Applied math + Computer Science (machine learning, HPC, simulations, etc.)
USA: the Center for Computational Astrophysics, created in 2016 and led by David Spergel, Manhattan, NY, USA; 60 open positions
Conclusions
- Upcoming surveys (SKA, LSST and Euclid) will provide fantastic new data sets.
- To the first two kinds of uncertainties, stochastics and systematics, a third one now has to be added: approximation.
- Mathematical challenges: higher order statistics, combining probes, etc.
- HPC is now a concern for processing the final products, and no longer only for pre-processing or simulations.
- Astrostatistics teams need to be extended to include people from computer science (machine learning, HPC, etc.).
- Future astronomers may not ask for observing time to collect data, but rather for computing time to process data that have already been collected.