Processor computing: OpenCL
One specification to rule them all, One specification to find them, One specification to bring them all and in the performance bind them
Table of contents
1 How to increase processor performance?
    Frequency
    Parallelism
        ILP
        TLP
        Devices comparison
2 TLP software access: state of the art
    CPU only solutions
    Nvidia GPU only
    All device access in one API
3 OpenCL
    Implementations
    Platform model
    Execution model
    Memory model
    Programming model
1 How to increase processor performance?
Frequency
For a long time, the easiest way to increase performance was to shrink transistors and raise the clock frequency, so that a processor could execute more instructions per second. But in 2005 Intel gave up on this strategy because it had reached the limits of its technology, due to transistor leakage and power density. Raising the frequency is therefore no longer a solution, because processors that reach this limit can already be produced.
Parallelism
The other way to increase performance is to let the processor execute multiple instructions at the same time. Two types of parallelism exist:
– instruction level parallelism (ILP);
– thread level parallelism (TLP).
ILP
ILP has been exploited through optimizations such as instruction pipelining and through new SIMD (single instruction, multiple data) instruction sets such as SSE. But ILP optimizations make processor architectures, which are already very complex, even more complex, so this is a hard way to increase performance.
TLP
TLP can be achieved by using multiple processors, by adding cores to a processor, or by using optimizations such as hyperthreading. GPU manufacturers have always designed their processors with this strategy in mind (more than increasing frequency), because processing vertices or fragments is a massively parallel task. GPU cores became programmable in 2001 (NV20 and R200) and were unified in 2006 (G80 and R600), even though GPUs still contain some specialized units. They are now just like other multi-core processors. TLP has become the easiest way to increase performance, so all processors will be composed of more and more cores. Developers have to use TLP if they want to benefit from this performance increase.
Year  Name                 Frequency  Circuit size  Consumption  Transistor count  Performance
2000  Intel Pentium 4      1.5 GHz    180 nm        -            -                 -
2000  ATI Radeon DDR       180 MHz    180 nm        -            -                 -
2000  Nvidia GeForce 2     250 MHz    180 nm        -            -                 -
2005  Intel Pentium 4 670  3.8 GHz    90/65 nm      115 W        168 million       -
2005  ATI Radeon X1800     625 MHz    90 nm         -            320 million       -
2005  Nvidia GeForce 7800  550 MHz    -             -            302 million       -
2010  Intel Core i7        3.3 GHz    45/32 nm      130 W        731 million       70 Gflops (double)
2010  ATI HD 5870          850 MHz    40 nm         188 W        2154 million      2720 Gflops
2010  Nvidia GTX 480       700 MHz    40 nm         300 W        3200 million      1400 Gflops

Devices comparison
2 TLP software access: state of the art
CPU only solutions
The first solution (1995) was the POSIX threads API, which defines a standard way to create and manage threads. Most POSIX operating systems implement it. It provides thread creation, thread join, mutexes, condition variables, and barriers. Other solutions were created to simplify the use of threads, for example OpenMP (1997) and Intel Threading Building Blocks (2006). These solutions integrate the parallel code with the rest of the program.
Nvidia GPU only
The computing power of GPUs has long been accessible through OpenGL and Direct3D, but these APIs are not designed for general-purpose computing. In 2007 Nvidia provided the first easy way to access the computing power of its GPUs: CUDA.
All device access in one API
With all these existing solutions, developers are somewhat lost: they need unification, and that is the purpose of OpenCL. OpenCL was initially developed by Apple and submitted to the Khronos Group (steward of the OpenGL API, the COLLADA schema, and other open specifications). The OpenCL specification was approved for public release in December 2008.
3 OpenCL
Implementations
In 2010 there are several OpenCL implementors: AMD, Apple, IBM, Nvidia, ... OpenCL can therefore be used under Linux, Mac OS X Snow Leopard, and Windows on GPU
Platform model
Execution model
Memory model
Programming model