Unified Parallel C hands on LO47

March 6, 2013

1 Resources

• UPC website (documentation, download, news, FAQ, ...): http://upc.gwu.edu
• PGAS website: http://pgas.org
• The NAS Parallel Benchmark: http://threads.hpcl.gwu.edu/sites/npb-upc
• Textbook: T. El-Ghazawi et al., UPC: DSM programming, Wiley ed., 2005
• Acknowledgment: special thanks to Prof. T. El-Ghazawi and O. Serres, from George Washington University, for providing us with the slides and the tutorial used in this course LO47 at UTBM.

2 Getting started

2.1 Login and file transfer instructions

Your userID is tputbm; the password will be communicated at the first session.

2.1.1 Using Linux or MacOS X

You can use ssh to access your account:

  $ ssh tputbm@mesoshared.univ-fcomte.fr

You share a common account, so you need to create your own work folder:

  $ mkdir jgaber

scp can be used to transfer files to your own folder:

  $ scp -r exercises.tar.gz tputbm@mesoshared.univ-fcomte.fr:jgaber/

It is recommended to unpack the code in your work folder:

• cd jgaber
• tar zxvf exercises.tar.gz


2.1.2 Using Windows

Windows users can use the PuTTY client or NX Client for Windows. To transfer files, it is possible to use the WinSCP client, which can be downloaded at http://winscp.net/eng/index.php.

2.2 UPC Hello World

We will start with the basic "Hello world" example program and then continue with other programs from UPC-Manual-1.2. Review the code in the file ex_1.upc:

#include <upc.h>
#include <stdio.h>

int main(int argc, char **argv) {
    printf("Thread %d of %d: Hello World\n", MYTHREAD, THREADS);
    return 0;
}

To compile this example, you first need to load the Berkeley compiler module:

• module load upc

Use the provided Makefile to compile the code:

• make ex_1
    cc -c -h upc -X 24 ex_1.upc -o ex_1.o
    cc -h upc -X 24 ex_1.o -o ex_1
    rm ex_1.o

Submit a job to run the program. An example job submission script is provided; it may need to be modified. In particular, line 6 needs to match your attributed budget. Once the sge file is modified to suit your needs, submit the job with the command:

• qsub ex_1.sge
    146461.sdb

146461 is your job id. Your job status can be tracked with the qstat command:

• qstat 146461.sdb
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    146461.sdb       job_upc_ex1      oserres          0        Q parallel

Once your job is completed, you will find two files; they contain the error output and the standard output respectively: job_upc_ex1.e146461 and job_upc_ex1.o146461.

In the above program, each thread prints out the message "Thread p of P: Hello world", where p is the thread number (MYTHREAD) and P is the total number of threads (THREADS).

Question: Is your output ordered according to the thread numbers?
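For reference, a submission script of this kind generally has the shape below. Every directive, the launcher name, and the core count are illustrative assumptions (only the provided ex_1.sge file is authoritative); the budget sits on line 6, which is the line that needs editing:

```shell
#!/bin/bash
#$ -N job_upc_ex1    # job name (used for the .o / .e output files)
#$ -q parallel       # queue, as seen in the qstat listing
#$ -cwd              # run the job from the submission directory
#$ -j n              # keep stdout and stderr in separate files
#$ -P my_budget      # line 6: set this to your attributed budget
aprun -n 24 ./ex_1   # launcher and core count are assumptions
```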

Compiling environments

In UPC there are two ways to compile code. The "dynamic threads" environment does not fix a value for THREADS at compile time, deferring that choice until run time. In the "static threads" environment, the value of THREADS is determined at compile time. There are restrictions on shared array declarations in the dynamic threads environment; the solution is simply to specify the thread count at compile time. The flags to specify a thread count at compile time differ among UPC compilers, but for the Berkeley UPC compiler you want something like "-T 4" on the upcc command line (for THREADS=4 in this case).
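As a sketch of the difference (UPC, assuming the Berkeley upcc driver; the Cray wrapper shown earlier passes the thread count with -X instead): in the dynamic threads environment, THREADS must appear exactly once as a multiplicative factor in the size of a shared array with definite block size, so a fixed array size is only accepted with static threads.

```c
shared int a[100*THREADS];  /* legal in both environments              */
shared int c[100];          /* legal only with static threads, e.g.    */
                            /* compiled with: upcc -T 4 prog.upc       */
```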

2.3 Work sharing, synchronization

Review the code that prints a conversion table and implement an efficient version that distributes the independent work across the threads using upc_forall and upc_barrier.
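As a reminder of the construct (a generic sketch with our own names — table, convert, N — not the solution file), upc_forall is a for loop with a fourth clause, the affinity expression, that assigns each iteration to one thread:

```c
/* Integer affinity: iteration j is executed by the thread for which   */
/* j % THREADS == MYTHREAD. The affinity can also be a shared address, */
/* e.g. &table[j], meaning "the thread that owns table[j]".            */
upc_forall (j = 0; j < N; j++; j)
    table[j] = convert(j);    /* independent work items                */

upc_barrier;  /* no thread continues until the whole table is done     */
```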

2.4 Shared arrays, blocked shared arrays

Review the code of vector addition and implement it.
Question: Does the default distribution yield an efficient implementation in this case? Why?

Review the code of matrix-vector multiplication and implement it.
Question: Does the default distribution yield an efficient implementation in this case? Why?
Question: Would blocking the array elements improve your program? Why? Give the resulting layout.

3 Simplified 1D Laplace solver

3.1 A simplified 1-D Laplace solver in C

Consider a hypothetical 1-D Laplace solver program in C. The new value at the current position is a function of the values of the neighbor positions and of the value at the current position of array b. The algorithm is illustrated in the following figure:

Figure 1: 1D Laplace Solver Algorithm

The update equation is:

  x_new[j] = 0.5 * (x[j-1] + x[j+1] + b[j])


For simplicity, in this exercise we only consider a single iteration. Normally, a large number of iterations is used until the solution converges. Multiple iterations will be considered in subsequent exercises. An example 1-D Laplace solver in sequential C is given in ex_2.c:

#include <stdio.h>
#include <stdlib.h>

#define TOTALSIZE 800

void init();

double x_new[TOTALSIZE];
double x[TOTALSIZE];
double b[TOTALSIZE];

int main(int argc, char **argv) {
    int j;

    init();

    for (j = 1; j < TOTALSIZE - 1; j++)
        x_new[j] = 0.5 * (x[j-1] + x[j+1] + b[j]);

    return 0;
}

Convert this program to UPC so that the work is shared among the threads. A skeleton is provided; complete it by following the hint comments:

// ==> set up j to point to the first element that the current thread
//     should progress from (in respect to its affinity)

// ==> add a for loop which goes only through the elements in the x_new
//     array with affinity to the current THREAD
for (j = ...; ...; ...)
    x_new[j] = 0.5 * (x[j-1] + x[j+1] + b[j]);

upc_barrier;

if (MYTHREAD == 0) {
    printf("     b     |     x     |   x_new \n");
    printf("=============================\n");
    for (j = 0; j < TOTALSIZE; j++)
        printf("%9f | %9f | %9f\n", b[j], x[j], x_new[j]);
}

The next version declares the arrays in the shared space with a blocking factor and uses upc_forall for the work sharing. Complete the skeleton:

#define BLOCKSIZE 16

//==> declare the x, x_new and b arrays in the shared space with size of
//    BLOCKSIZE*THREADS and with blocking size of BLOCKSIZE
shared [...] double x[...];
shared [...] double x_new[...];
shared [...] double b[...];

void init();

int main(int argc, char **argv) {
    int j;

    init();
    upc_barrier;

    //==> insert a upc_forall statement to do work sharing
    //    respecting the affinity of the x_new array
    upc_forall (j = ...; j < (...) - 1; j++; ...) {
        x_new[j] = 0.5 * (x[j-1] + x[j+1] + b[j]);
    }
    upc_barrier;

    if (MYTHREAD == 0) {
        printf("     b     |     x     |   x_new \n");
        printf("=============================\n");
        for (j = 0; j < BLOCKSIZE*THREADS; j++)
            printf("%9f | %9f | %9f\n", b[j], x[j], x_new[j]);
    }

    return 0;
}

The last version performs several iterations, which requires swapping x and x_new between iterations. Review the code and add the missing synchronization:

#include <stdio.h>

#define BLOCKSIZE 16

shared [BLOCKSIZE] double x[BLOCKSIZE*THREADS];
shared [BLOCKSIZE] double x_new[BLOCKSIZE*THREADS];
shared [BLOCKSIZE] double b[BLOCKSIZE*THREADS];

void init();

int main(int argc, char **argv) {
    int j;
    int iter;

    init();
    upc_barrier;

    // add two barrier statements, to ensure all threads finished computing
    // x_new[] and to ensure that all threads have completed the array
    // swapping.
    for (iter = 0; iter < ...