Why Does Concurrency Mean Performance (in Most Cases)?

With the increasing number of cores, applications will be forced to use many threads to extract the power out of them. This short article tries to clarify the main terms used in parallel programming, such as multithreaded, parallel and concurrent execution. It also highlights the benefits of concurrent programming in the data-management area by focusing on Index64, a new concurrent key-value store.

Main Thread and Worker Threads

A thread is a path of execution. In a single-threaded (or mono-thread) application, there is only one path of execution. Since the advent of graphical user interfaces in the 1980s, applications have become multithreaded. The execution path responsible for the user interface is called the main thread. Upon each user request, the main thread starts a worker thread to handle the requested task in the background. When a background task is finished, the user interface displays a signal informing the user that a result is now available. Usually the main thread only dispatches work to the other threads. The other threads deal with relatively long tasks; they are called the worker threads. A browser is an example: you can google an expression and open the answers in several tabs, each tab being handled by a thread. The Windows Task Manager gives a way to observe the number of running threads: open the Processes tab, then from the "Select columns" menu choose the column named Threads. The list in the Task Manager then shows the number of threads, application by application.
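To make the dispatch pattern concrete, here is a minimal C++ sketch (mine, not the article's): the main thread hands a long task to a worker thread and is signalled, through a future, when the result is ready. The fetch_page function and its URL are invented for the illustration.

    #include <future>
    #include <iostream>
    #include <string>

    // A long-running background task, e.g. fetching a web page.
    std::string fetch_page(const std::string& url) {
        // ... the slow network work would happen here ...
        return "contents of " + url;
    }

    int main() {
        // The main thread dispatches the work to a worker thread.
        std::future<std::string> result =
            std::async(std::launch::async, fetch_page, "www.example.com");

        // The main thread stays free, e.g. to keep the user interface responsive.

        // When the worker has finished, the result becomes available.
        std::cout << result.get() << '\n';
    }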

Multithread, Parallelism and Synchronization

Multithread and parallelism are general words expressing the ability of an application to run several operations at the same time. Multithread merely underlines the existence of multiple threads inside the application, whereas parallelism focuses on the management of these threads. Of course, the more complex the application, the more complex the parallelism. In some applications, the worker threads are independent. In this case they are easy to manage, because they can run in parallel without restrictions; the only thing they have to do is to give their result to the main thread. The mechanism to do so is called synchronization. Let's continue with the browser example. You google an expression and open the answers in several tabs. Tabs are handled by worker threads. Each worker thread executes a request upon a server; as soon as it has got the answer to its request, it signals to the main thread that it has something to display and it ceases to work. The main thread changes the title of the corresponding tab and displays its content. There are several synchronization mechanisms. In graphical user interfaces, such as in the browser example, the mechanism is based upon messages. But the general mechanism to synchronize threads is based upon a semaphore. The value of a semaphore signals the state to all the threads: 1 means one thread has taken the semaphore and the other threads must wait; 0 means the semaphore is free and any thread can take it. The semaphore is built in such a way that if two or more threads try to take it, only one will succeed. The others will have to wait until the value of the semaphore comes back to 0. During their execution, threads encounter semaphores acting like border controls that restrict access to some sections of their execution paths. Sections protected by a semaphore are also named critical sections.
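As a sketch of this border-control behaviour (my example, with invented names): in C++, a std::mutex plays the role of the binary semaphore described above; only one thread at a time can take it, and the others wait at the entrance of the critical section.

    #include <iostream>
    #include <mutex>
    #include <thread>

    std::mutex gate;   // plays the role of the binary semaphore

    void worker(int id) {
        gate.lock();     // take the semaphore: only one thread gets through
        // --- critical section: one thread at a time runs here ---
        std::cout << "thread " << id << " is inside the critical section\n";
        // --- end of the critical section ---
        gate.unlock();   // release: the semaphore is free again
    }

    int main() {
        std::thread t1(worker, 1);
        std::thread t2(worker, 2);
        t1.join();
        t2.join();
    }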

Lock

Common or shared data is accessed by all the threads of an application. To keep the data consistent, all accesses to it are critical sections protected by a semaphore. An example would be a counter common to all the threads. To increment the counter, a worker thread takes the semaphore protecting the counter, increments the counter, and then releases the semaphore. The semaphore serializes the increments of the counter. It is called a lock because it limits access to the common data to one thread at a time. Locks are easy to use; however, they slow down the application. Firstly, because while waiting to take a semaphore, the threads are no longer running in parallel. And secondly, because the semaphore itself is relatively expensive to modify. If the critical sections are large or frequent, the locks ruin the performance gained from the multiple threads, and the same application would be faster single-threaded. As a conclusion, one can say that locks are safe but counterproductive.
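The counter example might look like this in C++ (a sketch, not taken from the article): every increment is a critical section, so the four workers are serialized at the lock even though they run on different cores.

    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    long counter = 0;
    std::mutex counter_lock;   // protects the shared counter

    void increment_many(int n) {
        for (int i = 0; i < n; ++i) {
            std::lock_guard<std::mutex> guard(counter_lock); // take the lock
            ++counter;                                       // critical section
        }                                                    // lock released here
    }

    int main() {
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back(increment_many, 100000);
        for (auto& w : workers)
            w.join();
        std::cout << counter << '\n';   // always 400000, but the work was serialized
    }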


Concurrent Threads

In this configuration, the critical sections are modified to support multithreaded execution. The locks are no longer needed and the application becomes lock-free. Threads running through a critical section at the same time are called concurrent threads; their execution paths are not interrupted. Lock-free applications are efficient but tough to program. There is one domain that should get the best out of concurrent programming: data management, and most specifically what some technologists call "fast data", with its key-value store (NoSQL) products. While some key-value stores, such as Memcached and Redis, have become a standard for many websites requiring a high level of performance, a new generation of concurrent data-management systems will probably take the lead in order to fit the new energy-efficient, yet powerful, many-core architectures. One of them could be Index64.
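To illustrate the lock-free alternative (again my sketch, not Index64's actual code), the same shared counter can be maintained with an atomic fetch-and-add: the concurrent threads increment it at the same time and never block one another.

    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>

    std::atomic<long> counter{0};   // no lock is needed any more

    void increment_many(int n) {
        for (int i = 0; i < n; ++i)
            counter.fetch_add(1);   // one atomic instruction, no waiting
    }

    int main() {
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back(increment_many, 100000);
        for (auto& w : workers)
            w.join();
        std::cout << counter.load() << '\n';   // 400000, from truly concurrent threads
    }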

Index64

Index64 (www.index64.com) is a key-value database. As in any key-value software, storing a key-value pair is just a call to the insert function, whereas retrieving a value is a call to the select function. This straightforward approach gives both flexibility and performance. Unlike its current competitors, Index64 is concurrent-ready: it supports multiple threads working within the same data structure. In other words, the lock-free algorithm of Index64 supports any number of threads calling, at the same time, any of the insert/delete/select functions in any combination. For applications dealing with intensive data manipulation, and thus needing a high level of performance and availability, the concurrent feature of Index64 is already making a great difference, and it may even change the rules with the coming many-core processors.
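The article names only the insert/delete/select entry points and does not publish Index64's actual signatures, so the following C++ sketch uses invented stand-ins purely to show the calling pattern: several threads hit the store at the same time, in any combination, with no external lock.

    #include <string>
    #include <thread>

    // Hypothetical stand-ins: the real Index64 API is not shown in the article.
    namespace index64 {
        void insert(const std::string&, const std::string&) { /* lock-free insert */ }
        std::string select(const std::string&) { return ""; /* lock-free lookup */ }
        void erase(const std::string&) { /* lock-free delete; "delete" is a C++ keyword */ }
    }

    int main() {
        // Concurrent-ready: the three calls may run at the very same time.
        std::thread writer([] { index64::insert("user:42", "Alice"); });
        std::thread reader([] { index64::select("user:42"); });
        std::thread eraser([] { index64::erase("user:41"); });
        writer.join();
        reader.join();
        eraser.join();
    }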

http://index64.free.fr/20141009_Index64.htm

Jean-Christian Llobet ▪ IT Performance Architect ▪ Oct. 2014
