
Magnetohydrodynamic Turbulence in Accretion Discs
A test case for petascale computing in astrophysics

Marc Joos, Sébastien Fromang
Collaborators: Pierre Kestener, Geoffroy Lesur, Héloïse Méheut, Daniel Pomarède & Bruno Thooris, Patrick Hennebelle, Andrea Ciardi, Romain Teyssier...

Service d'Astrophysique - CEA Saclay
14/11/2013

Outline
- Introduction: Accretion discs; Magneto-rotational instability
- Numerical approach: IBM BlueGene/Q; Numerical methods & initial conditions; What challenges?
- Results: Overview; Power spectra; Angular momentum transport rate
- Parallel I/O: Why do we care?; Approaches; Benchmark
- Hybridization: Why hybridize codes?; Hybridization of Ramses; Auto-parallelization
- GPU: Why do we want GPUs?; OpenACC

Introduction

Accretion discs

What is it?
- discs of diffuse material (mostly gas)
- rotating around a central object
- observed at all scales:
  - protostars
  - neutron stars
  - supermassive black holes
  - etc.
(Images: NASA; Grosso et al. 2003)

Angular momentum:
- material accretion ⇒ angular momentum loss
- which mechanism can transport angular momentum efficiently?
- ad hoc prescription: νt = α cs H (Shakura & Sunyaev 1973; Lynden-Bell & Pringle 1974)

The magneto-rotational instability (MRI)

What is it?
- MHD instability (Balbus & Hawley 1991)
- weak B field
- dΩ/dr < 0
(Fig.: MRI principle)

Some dimensionless numbers...
- Magnetic intensity: β ∼ (cs/va)²
- Viscosity: Re = cs H/ν
- Resistivity: Rm = cs H/η
- Magnetic Prandtl number: Pm = Rm/Re = ν/η
- Pm ≪ 1 in accretion discs (Balbus & Henri 2008)

Evolution of α with Pm?
(Fig.: α vs. Pm for Re = 400 to 20 000 and β = 10³, 10⁴; Lesur & Longaretti 2010)
(Fig.: α vs. Pm at Rm = 2600, with α between 0.00 and 0.05 for Pm from 10⁻² to 10¹)

Numerical approach

The BlueGene/Q hierarchy
Simulation performed on Turing@IDRIS

Local approach

Resolution issue
- turbulent scale: ℓturb ∼ H
- a few hundred fluid elements are needed to resolve H
- H/R ∼ 0.1
- ∼25 H to cover the radial range
⇒ computationally expensive!

The local approach
- MHD equations
- dissipation (or not)
- EOS

∂t ρ + ∇·(ρv) = 0                                                  (1)
ρ (∂t v + (v·∇)v) = −∇P + (∇×B)×B − 2ρΩ×v + 2qρΩ₀² x ex            (2)
∂t B = ∇×(v×B)                                                     (3)
+ energy equation or EOS

The shearing box
(Fig.: shearing box at (a) t = 0 and (b) t > 0)
Boundary conditions:
- azimuthal direction: periodic
- vertical direction: periodic
- radial direction: periodic in shearing coordinates
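To make the rotating-frame source terms in equation (2) concrete, here is a minimal sketch of an explicit update applying the Coriolis acceleration −2Ω×v and the tidal acceleration 2qΩ₀²x ex to one cell, assuming Ω = Ω₀ ez; the program and variable names are illustrative and not taken from Ramses.

program source_terms_sketch
  implicit none
  real(8), parameter :: omega0 = 1.d-3, q = 1.5d0, dt = 1.d-2
  real(8) :: vx, vy, x, ax, ay

  ! one cell of the shearing box, at radial position x, with velocity (vx, vy)
  x  = 0.1d0
  vx = 0.d0
  vy = -q*omega0*x        ! background linear shear flow

  ! Coriolis acceleration -2 Omega x v (Omega = omega0 e_z) plus tidal term 2 q omega0^2 x
  ax =  2.d0*omega0*vy + 2.d0*q*omega0**2*x
  ay = -2.d0*omega0*vx

  ! explicit update of the cell velocity
  vx = vx + dt*ax
  vy = vy + dt*ay

  print *, 'vx, vy =', vx, vy
end program source_terms_sketch

With the background shear vy = −qΩ₀x the Coriolis and tidal terms cancel, which is precisely the equilibrium flow that the MRI destabilizes once a weak field is added.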

Numerical methods

The Ramses code (Teyssier 2002; Fromang et al. 2006)
- Finite volume method (Godunov's scheme):
  ∂t u + ∇·F(u) = 0 ⇒ u_i^{n+1} = u_i^n − (Δt/Δx) (F_{i+1/2}^{n+1/2} − F_{i−1/2}^{n+1/2})
- Riemann problem to solve at cell interfaces
- upwind scheme: stable if |aΔt/Δx| ≤ 1
- Constrained transport: using Stokes' theorem, the induction equation becomes
  ∂t ∫_S B·dS + ∮_L (B×v)·dl = 0
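To illustrate the conservative update written above, here is a minimal 1D sketch of an upwind finite-volume step for linear advection, ∂t u + a ∂x u = 0, with periodic boundaries; the constants and array names are illustrative and this is not code from Ramses.

program upwind1d
  implicit none
  integer, parameter :: nx = 200
  real(8), parameter :: a = 1.d0, cfl = 0.8d0, xmax = 1.d0
  real(8) :: u(0:nx+1), flux(1:nx+1), dx, dt
  integer :: i, n

  dx = xmax/nx
  dt = cfl*dx/abs(a)               ! timestep chosen from the CFL number

  ! initial condition: a top-hat profile
  u = 0.d0
  u(nx/4:nx/2) = 1.d0

  do n = 1, 100
     u(0) = u(nx); u(nx+1) = u(1)  ! periodic ghost cells
     ! upwind flux at interface i-1/2 (a > 0, so take the left state)
     do i = 1, nx+1
        flux(i) = a*u(i-1)
     enddo
     ! conservative update: u_i^{n+1} = u_i^n - dt/dx*(F_{i+1/2} - F_{i-1/2})
     do i = 1, nx
        u(i) = u(i) - dt/dx*(flux(i+1) - flux(i))
     enddo
  enddo

  print *, 'max(u) =', maxval(u(1:nx))
end program upwind1d

With cfl ≤ 1 the timestep satisfies the stability condition |aΔt/Δx| ≤ 1 quoted above.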

Initial conditions

Parameters:
- Resolution: 800×1600×832
- ∼800 000 timesteps (∼9 million CPU hours, 25 orbits)
- on 32 768 CPUs (131 072 sub-grids of 25×25×13)
- toroidal B
- homogeneous ρ
- ideal MHD → Pm ∼ 1
- non-ideal MHD: Re = 85 000 & Rm = 2600 ⇒ Pm = 0.03, the highest Re ever reached!

What challenges at the petascale?

1. Weak scaling

# CPUs    t_elapsed [s], 2 th./CPU    t_elapsed [s], 4 th./CPU
4096      ∼0.58                       ∼0.55
8192      ∼0.82                       ∼0.78
32768     ∼0.84                       ∼0.80

- ∼70% efficiency on 32 768 CPUs
- ∼5% faster with 4 threads/CPU

2. Parallel I/O

- 131 072 MPI processes, no hybridization ⇒ with sequential I/O, 131 072 files to write (and read)!
- different libraries tested, in particular parallel HDF5 and parallel NetCDF
- more details later!
- (note however that GPFS holds up well even with so many files to deal with...)

3. Visualization & data processing

How to visualize the data? (200 GB outputs!)
- high-frequency outputs: sides of the domain → every 3200 timesteps → fast visualization, 3D movies
- low-frequency outputs: whole domain → every 32 000 timesteps → science (averages, power spectra...)

Results

Overview
- ideal vs. non-ideal MHD: dissipation effects
- kinetic vs. magnetic: dissipation scales
(Fig.: By in ideal and non-ideal MHD; vz and By in non-ideal MHD)

Power spectra

Kinetic & magnetic energies
- Energy:
  Ek(kz) = ½ ρ₀ |ṽ(kz)|²
  Emag(kz) = |B̃(kz)|² / 8π
- Ek ∝ k^(−3/2)
(Fig.: kinetic (Ek) and magnetic (Em) energy spectra as a function of k)
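As a minimal sketch of how such a 1D spectrum can be assembled, the toy program below takes a velocity profile along z, computes its Fourier amplitudes with a naive DFT and forms Ek(kz) = ½ ρ₀ |ṽ(kz)|²; the profile, normalization and variable names are illustrative, not the analysis pipeline behind the figure.

program spectrum_sketch
  implicit none
  integer, parameter :: nz = 64
  real(8), parameter :: pi = 3.141592653589793d0, rho0 = 1.d0
  real(8) :: v(nz), ek(0:nz/2), re, im
  integer :: j, k

  ! toy velocity profile with two modes
  do j = 1, nz
     v(j) = sin(2.d0*pi*3.d0*(j-1)/nz) + 0.1d0*sin(2.d0*pi*10.d0*(j-1)/nz)
  enddo

  ! naive DFT, then ek(k) = 1/2 * rho0 * |v~(k)|^2
  do k = 0, nz/2
     re = 0.d0; im = 0.d0
     do j = 1, nz
        re = re + v(j)*cos(2.d0*pi*k*(j-1)/dble(nz))
        im = im - v(j)*sin(2.d0*pi*k*(j-1)/dble(nz))
     enddo
     ek(k) = 0.5d0*rho0*(re**2 + im**2)/dble(nz)**2
  enddo

  do k = 0, nz/2
     print *, k, ek(k)
  enddo
end program spectrum_sketch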

Angular momentum transport rate

How to measure the turbulence efficiency?
  T_Reynolds = ⟨ρ (vx − v̄x)(vy − v̄y)⟩
  T_Maxwell = −⟨Bx By⟩ / 4π
  ⇒ α = (T_Reynolds + T_Maxwell) / P0
(Fig.: α as a function of Pm at Rm = 2600, with α between 0.00 and 0.05 for Pm from 10⁻² to 10¹)
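A minimal sketch of how α could be measured from a snapshot, following the definitions above: box-averaged Reynolds and Maxwell stresses divided by P0. The arrays are toy data, the averages are taken over the whole box for simplicity, and none of the names come from Ramses.

program alpha_sketch
  implicit none
  integer, parameter :: nx = 8, ny = 8, nz = 8
  real(8), parameter :: pi = 3.141592653589793d0, p0 = 1.d0
  real(8), dimension(nx,ny,nz) :: rho, vx, vy, bx, by
  real(8) :: vxbar, vybar, t_reynolds, t_maxwell, alpha
  integer :: ncell

  ! toy snapshot (in a real analysis these come from a simulation output)
  call random_number(vx); call random_number(vy)
  call random_number(bx); call random_number(by)
  rho = 1.d0
  vx = vx - 0.5d0; vy = vy - 0.5d0
  bx = 0.1d0*(bx - 0.5d0); by = 0.1d0*(by - 0.5d0)

  ncell = nx*ny*nz

  ! mean velocities (the overbars in the Reynolds stress)
  vxbar = sum(vx)/ncell
  vybar = sum(vy)/ncell

  ! box-averaged Reynolds and Maxwell stresses
  t_reynolds = sum(rho*(vx - vxbar)*(vy - vybar))/ncell
  t_maxwell  = -sum(bx*by)/(4.d0*pi*ncell)

  alpha = (t_reynolds + t_maxwell)/p0
  print *, 'alpha =', alpha
end program alpha_sketch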

Parallel I/O

Why do we care?

On-going evolution
- computing power increases;
- the number of cores increases rapidly;
- memory per core stays constant or decreases;
- storage capacity grows faster than access speed.

Consequences
- the amount of data generated increases with the computing power;
- more cores but less memory: more files!
- with the one-file-per-process approach:
  - saturation of filesystems;
  - heavier pre- & post-processing steps;
- time spent in I/O increases.
⇒ Need for parallel I/O with sustainable performance on supercomputers

Parallel I/O approaches

Possible approaches:
- sequential I/O
- master process distributing data
- MPI-IO
- Parallel HDF5
- Parallel NetCDF
- ADIOS
- ...

Parallel I/O approaches: POSIX
(Diagram: each MPI process writes its own data to its own file, file_CPU1 ... file_CPUn)

real(8), dimension(xdim,ydim,zdim,nvar) :: data
character(LEN=80) :: filename

call get_filename(myrank, 'posix', filename)
open(unit=10, file=filename, status='unknown', form='unformatted')
write(10) data
close(10)

Parallel I/O approaches: MPI-IO
(Diagram: all MPI processes write their data into a single shared file)

integer :: xpos, ypos, zpos, myrank, i
real(8), dimension(xdim,ydim,zdim,nvar) :: data
integer, dimension(3) :: boxsize, domdecomp
character(LEN=13) :: filename
! MPI variables
integer :: fhandle, ierr
integer :: int_size, double_size
integer(kind=MPI_OFFSET_KIND) :: buf_size
integer :: written_arr
integer, dimension(3) :: wa_size, wa_subsize, wa_start

! Create MPI array type
wa_size    = (/ nx*xdim, ny*ydim, nz*zdim /)
wa_subsize = (/ xdim, ydim, zdim /)
wa_start   = (/ xpos, ypos, zpos /)*wa_subsize
call MPI_Type_Create_Subarray(3, wa_size, wa_subsize, wa_start &
     & , MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, written_arr, ierr)
call MPI_Type_Commit(written_arr, ierr)
call MPI_Type_Size(MPI_INTEGER, int_size, ierr)
call MPI_Type_Size(MPI_DOUBLE_PRECISION, double_size, ierr)

filename = 'parallelio.mp'
! Open file
call MPI_File_Open(MPI_COMM_WORLD, trim(filename) &
     & , MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fhandle, ierr)

! Write data
buf_size = 6*int_size + xdim*ydim*zdim*double_size*myrank
call MPI_File_Seek(fhandle, buf_size, MPI_SEEK_SET, ierr)
call MPI_File_Write_All(fhandle, data(:,:,:,1), xdim*ydim*zdim &
     , MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
! Close file
call MPI_File_Close(fhandle, ierr)

Parallel I/O approaches: Parallel NetCDF
PnetCDF: Network Common Data Form → self-documented, portable format
(Diagram: all processes write into a single NetCDF file that records its own structure, e.g. dimensions nx = 128, ny = 128; a variable double array(nx,ny); and the data)

integer :: xpos, ypos, zpos, myrank
real(8), dimension(xdim,ydim,zdim,nvar) :: data
character(LEN=13) :: filename
! PnetCDF variables
integer(kind=MPI_OFFSET_KIND) :: nxtot, nytot, nztot
integer :: nout, ncid, xdimid, ydimid, zdimid, vid1
integer, dimension(3) :: sdimid
integer(kind=MPI_OFFSET_KIND), dimension(3) :: dims, start, count
integer :: ierr

dims = (/ xdim, ydim, zdim /)

! Create file
filename = 'parallelio.nc'
nout = nfmpi_create(MPI_COMM_WORLD, filename, NF_CLOBBER, MPI_INFO_NULL &
     , ncid)

! Define dimensions
nout = nfmpi_def_dim(ncid, "x", nxtot, xdimid)
nout = nfmpi_def_dim(ncid, "y", nytot, ydimid)
nout = nfmpi_def_dim(ncid, "z", nztot, zdimid)
sdimid = (/ xdimid, ydimid, zdimid /)

! Create variable
nout = nfmpi_def_var(ncid, "var1", NF_DOUBLE, 3, sdimid, vid1)
! End of definitions
nout = nfmpi_enddef(ncid)

start = (/ xpos, ypos, zpos /)*dims+1
count = dims
! Write data
nout = nfmpi_put_vara_double_all(ncid, vid1, start, count, data(:,:,:,1))
! Close file
nout = nfmpi_close(ncid)

Parallel I/O approaches: Parallel HDF5
HDF5: Hierarchical Data Format → self-documented, hierarchical, portable format
(Diagram: all processes write into a single HDF5 file, e.g. GROUP "/" containing DATASET "array" with DATATYPE H5T_IEEE_F64LE and DATASPACE SIMPLE { (128, 128) / (128, 128) })

integer :: xpos, ypos, zpos, myrank
real(8), dimension(xdim,ydim,zdim,nvar) :: data
character(LEN=13) :: filename
! HDF5 variables
integer :: ierr
integer(HID_T) :: file_id, fapl_id, dxpl_id
integer(HID_T) :: h5_dspace, h5_dset, h5_dspace_file
integer(HSIZE_T), dimension(3) :: start, count, stride, blockSize
integer(HSIZE_T), dimension(3) :: dims, dims_file

! Initialize HDF5 interface
call H5open_f(ierr)

! Create HDF5 property IDs for parallel file access
filename = 'parallelio.h5'
call H5Pcreate_f(H5P_FILE_ACCESS_F, fapl_id, ierr)
call H5Pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, ierr)
call H5Fcreate_f(filename, H5F_ACC_RDWR_F, file_id, ierr &
     , access_prp=fapl_id)

! Select space in memory and file
dims      = (/ xdim, ydim, zdim /)
dims_file = (/ xdim*nx, ydim*ny, zdim*nz /)
call H5Screate_simple_f(3, dims, h5_dspace, ierr)
call H5Screate_simple_f(3, dims_file, h5_dspace_file, ierr)

! Hyperslab for selecting data in h5_dspace
start     = (/ 0, 0, 0 /)
stride    = (/ 1, 1, 1 /)
count     = dims
blockSize = (/ 1, 1, 1 /)
call H5Sselect_hyperslab_f(h5_dspace, H5S_SELECT_SET_F, start, count &
     , ierr, stride, blockSize)

! Hyperslab for selecting location in h5_dspace_file (to set the
! correct location in file where we want to put our piece of data)
start     = (/ xpos, ypos, zpos /)*dims
stride    = (/ 1,1,1 /)
count     = dims
blockSize = (/ 1,1,1 /)
call H5Sselect_hyperslab_f(h5_dspace_file, H5S_SELECT_SET_F, start, count &
     , ierr, stride, blockSize)

! Enable parallel collective IO
call H5Pcreate_f(H5P_DATASET_XFER_F, dxpl_id, ierr)
call H5Pset_dxpl_mpio_f(dxpl_id, H5FD_MPIO_COLLECTIVE_F, ierr)

! Create data set
call H5Dcreate_f(file_id, trim(dsetname), H5T_NATIVE_DOUBLE &
     , h5_dspace_file, h5_dset, ierr, H5P_DEFAULT_F, H5P_DEFAULT_F &
     , H5P_DEFAULT_F)

! Finally write data to file
call H5Dwrite_f(h5_dset, H5T_NATIVE_DOUBLE, data, dims, ierr &
     , mem_space_id=h5_dspace, file_space_id=h5_dspace_file &
     , xfer_prp=dxpl_id)

! Clean HDF5 IDs
call H5Pclose_f(dxpl_id, ierr)
call H5Dclose_f(h5_dset, ierr)
call H5Sclose_f(h5_dspace, ierr)
call H5Sclose_f(h5_dspace_file, ierr)
call H5Fclose_f(file_id, ierr)
call H5Pclose_f(fapl_id, ierr)
call H5close_f(ierr)

Parallel I/O approaches: ADIOS

integer :: xpos, ypos, zpos, myrank
real(8), dimension(xdim,ydim,zdim,nvar) :: data
character(LEN=17) :: filename
! MPI & ADIOS variables
integer :: adios_err
integer(8) :: adios_handle, offset_x, offset_y, offset_z
integer :: xdimglob, ydimglob, zdimglob
integer :: ierr

! Init ADIOS
call ADIOS_Init("adios_BRIO.xml", MPI_COMM_WORLD, ierr)

! Define offset and global dimensions
offset_x = xdim*xpos
offset_y = ydim*ypos
offset_z = zdim*zpos
xdimglob = xdim*nx; ydimglob = ydim*ny; zdimglob = zdim*nz

! Open ADIOS file & write data
call ADIOS_Open(adios_handle, "dump", "parallelio_XML.bp", "w" &
     & , MPI_COMM_WORLD, ierr)
! Write I/O
# include "gwrite_dump.fh"
! Close ADIOS file and interface
call ADIOS_Close(adios_handle, ierr)
call ADIOS_Finalize(myrank, ierr)

Parallel I/O approaches: ADIOS
with an XML file for the definitions, along these lines:
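A minimal sketch of the kind of ADIOS 1.x XML group definition such a code relies on; the group and variable names follow the Fortran snippet above, and the exact elements and attributes are assumptions rather than the original file.

<?xml version="1.0"?>
<adios-config host-language="Fortran">
  <adios-group name="dump" coordination-communicator="MPI_COMM_WORLD">
    <!-- scalars describing the local block, the global array and the offsets -->
    <var name="xdim" type="integer"/>
    <var name="ydim" type="integer"/>
    <var name="zdim" type="integer"/>
    <var name="xdimglob" type="integer"/>
    <var name="ydimglob" type="integer"/>
    <var name="zdimglob" type="integer"/>
    <var name="offset_x" type="integer"/>
    <var name="offset_y" type="integer"/>
    <var name="offset_z" type="integer"/>
    <!-- the local block of the global 3D array -->
    <global-bounds dimensions="xdimglob,ydimglob,zdimglob"
                   offsets="offset_x,offset_y,offset_z">
      <var name="data" type="double" dimensions="xdim,ydim,zdim"/>
    </global-bounds>
  </adios-group>
  <method group="dump" method="MPI"/>
  <buffer size-MB="100" allocate-time="now"/>
</adios-config>

The gwrite_dump.fh include used in the Fortran snippet is generated from such an XML description.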

Parallel I/O approaches: comparison
(Table: the approaches above (sequential POSIX, MPI-IO, parallel HDF5, parallel NetCDF, ADIOS) compared on ease of use, single output file, portability, self-documentation, flexibility and interface)

BRIO: a benchmark for parallel I/O

Tested libraries:
- sequential I/O
- MPI-IO
- parallel HDF5
- parallel NetCDF
- ADIOS

What does it do?
- write/read data distributed on a Cartesian grid
- compute writing/reading time
- few parameters:
  - size of the grid
  - domain decomposition
  - contiguity of data
  - XML/noXML interface (for ADIOS only)
- under GNU/GPL license, available at https://bitbucket.org/mjoos

Results on Turing (BG/Q)

# MPI processes   library            contiguous   t_writing [s]
4096              — (sequential)     —            9.602
                  sequential HDF5    —            9.337
                  parallel HDF5      no           29.226
                  parallel HDF5      yes          8.394
                  parallel NetCDF    yes          5.941
16 384            — (sequential)     —            12.129
                  parallel HDF5      no           109.419
                  parallel HDF5      yes          12.165
                  parallel NetCDF    yes          9.557
131 072           — (sequential)     —            100.197
                  parallel NetCDF    yes          47.592

Hybridization

Why hybridize codes?

Advantages
- better match to modern architectures: interconnected nodes with shared memory (see the sketch below)
- optimized memory usage:
  - less data duplicated across MPI processes
  - lower memory footprint
- better I/O performance:
  - fewer simultaneous accesses
  - fewer operations, on bigger datasets
  - fewer files (without parallel I/O)
- better granularity:
  - an MPI program alternates computing and communication steps
  - granularity: ratio between computing and communication steps
  - the larger the granularity, the better the scalability

(Diagram: pure MPI splits the domain into many sub-domains, each surrounded by its own ghost zones at the boundaries; MPI+OpenMP uses fewer, larger sub-domains, hence fewer ghost zones)

Example: 800×1600×832 domain, 11 variables (double precision)

# MPI processes   # OpenMP threads   Size (GB)
131 072           1                  197
16 384            8                  135

memory gain: >30%
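As an illustration of the hybrid model, here is a minimal MPI+OpenMP sketch, not taken from Ramses: each MPI process owns its own array, OpenMP threads share the loop over it, and MPI handles the reduction across processes.

program hybrid_sketch
  use mpi
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  integer :: ierr, rank, nproc, nthreads, i
  real(8), allocatable :: u(:)
  real(8) :: local_sum, global_sum

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  allocate(u(n))           ! one sub-domain per MPI process
  u = rank + 1.d0

  local_sum = 0.d0
  ! OpenMP threads share the work on the process-local data
  !$OMP PARALLEL DO REDUCTION(+:local_sum)
  do i = 1, n
     local_sum = local_sum + u(i)
  enddo
  !$OMP END PARALLEL DO

  ! MPI handles the inter-node communication
  call MPI_Allreduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, &
       MPI_SUM, MPI_COMM_WORLD, ierr)

  nthreads = omp_get_max_threads()
  if (rank == 0) print *, 'procs =', nproc, ' threads =', nthreads, &
       ' sum =', global_sum

  deallocate(u)
  call MPI_Finalize(ierr)
end program hybrid_sketch

This is the configuration change behind the memory figures above: 16 384 processes with 8 threads each instead of 131 072 single-threaded processes.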

Hybridization of Ramses

Main steps:
1. reorganize the code → from 42 to 10 source files, reorganization of the modules, etc.
2. fine-grain approach: parallelize the outer loops
3. MPI communications

Serial loop:

integer, parameter :: nx=512, ny=512, nz=1024
real(8), dimension(:,:,:), allocatable :: array
integer :: i, j, k

allocate(array(nx,ny,nz))

do k = 1, nz
   do j = 1, ny
      do i = 1, nx
         array(i,j,k) = (k*j + i)*1.
      enddo
   enddo
enddo

deallocate(array)

OpenMP version:

integer, parameter :: nx=512, ny=512, nz=1024
real(8), dimension(:,:,:), allocatable :: array
integer :: i, j, k

allocate(array(nx,ny,nz))

!$OMP PARALLEL
!$OMP DO SCHEDULE(RUNTIME)
do k = 1, nz
   do j = 1, ny
      do i = 1, nx
         array(i,j,k) = (k*j + i)*1.
      enddo
   enddo
enddo
!$OMP END DO
!$OMP END PARALLEL

deallocate(array)

Preliminary results:

Poincaré@Maison de la Simulation (16 cores per node, Intel Sandy Bridge):

# MPI proc.   # OpenMP threads   t_elapsed [s]
16            1                  17.49
8             2                  17.04
4             4                  16.85
2             8                  20.07

Turing@IDRIS (1024 cores per node, PowerPC A2):

# MPI proc.   # OpenMP threads   t_elapsed [s]
2048          1                  355.9
1024          2                  337.9
512           4                  366.1

Auto-parallelization: is it worth it?

What is it?
- compilers can detect serial portions of the code that can be multithreaded → equivalent to an OpenMP parallelization

How to use it?

compiler   option
Intel      -parallel
IBM        -qsmp=auto
PGI        -Mconcur

How to help your compiler?

compiler   pragma
Intel      parallel
IBM        —
PGI        concur[/noconcur]
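A minimal sketch of how such a hint can be given to the auto-parallelizer (Intel directive syntax assumed, compiled with -parallel); the program and loop are illustrative.

program autopar_hint
  implicit none
  integer, parameter :: n = 100000
  real(8) :: x(n), y(n)
  integer :: i

  x = 1.d0
  y = 2.d0

  ! tell the auto-parallelizer to ignore assumed dependences in the next loop
  !DIR$ PARALLEL
  do i = 1, n
     y(i) = 2.d0*x(i) + y(i)
  enddo

  print *, y(1), y(n)
end program autopar_hint

Whether this pays off compared to explicit OpenMP is exactly what the tables below measure.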

Simple example (the OpenMP triple loop shown above). Results:

Intel compiler:

# threads   t_auto [s]   t_OpenMP [s]
1           0.5174       0.5187
2           0.2454       0.2383
4           0.1302       0.1272
8           0.07820      0.06650

PGI compiler:

# threads   t_auto [s]   t_OpenMP [s]
1           0.6510       1.049
2           0.3092       0.5067
4           0.1559       0.2580
8           0.08200      0.1332

Intermediate example (a Jacobi-style stencil loop):

!$omp parallel shared(a,anew,error,iter)
do
   error = 0.d0
!$omp do reduction(max:error) schedule(runtime)
   do j = 2, m-1
      do i = 2, n-1
         anew(i,j) = 0.25*(a(i,j+1) + a(i,j-1) &
                         + a(i-1,j) + a(i+1,j))
         error = max(error,abs(anew(i,j)-a(i,j)))
      enddo
   enddo
!$omp do schedule(runtime)
   do j = 2, m-1
      do i = 2, n-1
         a(i,j) = anew(i,j)
      enddo
   enddo
   if((error .lt. tolerance) .or. &
      (iter-1 .gt. iter_max)) exit
!$omp single
   if(mod(iter,10).eq.0) print*, iter, error
   iter = iter + 1
!$omp end single
enddo
!$omp end parallel

Results:

Intel compiler:

# threads   t_auto [s]   t_OpenMP [s]
1           0.0627       0.0856
2           0.0593       0.0469
4           0.0518       0.0251
8           0.0581       0.0154
16          0.0796       0.0122

PGI compiler:

# threads   t_auto [s]   t_OpenMP [s]
1           0.175        0.194
2           0.101        0.0967
4           0.0643       0.0509
8           0.0440       0.0285
16          0.0344       0.0177

"Real" example: tested on Ramses

Intel compiler, with 4 MPI processes:

# threads   t_auto [s]   t_OpenMP [s]
1           656.5        777.6
2           540.5        424.2
4           491.4        246.4

IBM compiler, with 1024 MPI processes:

# threads   t_auto [s]   t_OpenMP [s]
1           587.7        582.3
2           340.3        337.9
4           218.5        216.0

GPU

Why do we want to do astrophysics on GPUs?

Pros:
- sheer computing power:

hardware                            processing power [GFLOPS]
Intel Sandy Bridge (single core)    24.6
Intel Sandy Bridge (whole chip)     157.7
NVIDIA Tesla Kepler 20 (SP)         4106
NVIDIA Tesla Kepler 20 (DP)         1173

- weak scaling of Ramses (Ramses-GPU doc., P. Kestener):

# MPI proc.   Global size      perf_CPU [update/s]   perf_GPU [update/s]
1             128×128×128      0.21                  13.6
8             256×256×256      1.68                  95.3
64            512×512×512      13.4                  750.3
128           1024×512×512     26.8                  1498.3
256           1024×1024×512    52.5                  2969.3

Cons:
- CUDA is a C library → scientific codes need to be translated into C/C++
- CUDA is not memory-management-friendly → the algorithms need to be rethought
⇒ 1.5 years to recode Ramses

OpenACC: the way to go?

What is it?
- compiler directives to specify loops and regions to offload from the CPU to an accelerator
- no need to explicitly manage data transfers
- Fortran-friendly!

The test loop, first accelerated with !$acc kernels loop on the two loop nests, then wrapped in an !$acc data region so that A and Anew stay on the GPU across iterations:

tol = 1.d-6
iter_max = 1000

!$acc data copy(A) create(Anew)
do while ( error .gt. tol .and. iter .lt. iter_max )
   error = 0.d0

!$acc kernels loop
   do j = 1, m-2
      do i = 1, n-2
         Anew(i,j) = 0.25 *(A(i+1,j) + A(i-1,j) &
                          + A(i,j-1) + A(i,j+1))
         error = max( error, abs(Anew(i,j) - A(i,j)))
      end do
   end do

   if(mod(iter,100).eq.0) print*, iter, error
   iter = iter + 1

!$acc kernels loop
   do j = 1, m-2
      do i = 1, n-2
         A(i,j) = Anew(i,j)
      end do
   end do
end do
!$acc end data

Timings:
- tCPU = 0.177 s
- tGPU = 0.149 s (kernels only)
- tGPU data = 0.00667 s (with the data region)

Conclusions & prospects

Physics:
1. Asymptotic convergence of α at low Pm
2. Hydrodynamic cascade at small scales
3. The simulation is still running on Turing to confirm the result

Numerics:
1. parallel I/O is mature enough to be used:
   - (very) good performance
   - easier and easier to use
   - unavoidable on the way to exascale
2. hybridization (OpenMP/OpenACC + MPI) is needed to keep up with the increasing computational power
3. GPUs are not so hard to use
4. Don't worry! compilers are getting smarter and smarter

Thank you for your attention!