Lock Free, or Die; the only sane way to do parallelism

This is an independent study paper I wrote for one of my professors.

Slade Villena, Blogger

May 31, 2010

This document presents an argument for lock-free mechanisms in threaded programming.


The Curse of Blocking Mechanisms

Deadlocks and race conditions are the most critical issues in threaded programming. Both problem states present dangerously elegant bugs, with multiple dimensions of error: timing, show-stopping crashes, and erratic recurrence.


The timing of deadlocks and race conditions makes them hard to detect, simply because they only occur at run-time. Most compilers and IDEs offer no way to detect actual deadlocks, nor can they easily spot race conditions by parsing the source. A program can contain code that can potentially deadlock, yet never actually come to a deadlock. As the source base grows and threads change functionality, a potential deadlock can either gather momentum and lead to a crash, or remain stealthy and avoid detection at run-time. Race conditions share the same timing characteristics, but do not necessarily lead to crashes. Because these bugs can only be pinned down accurately at run-time, they demand an extreme amount of effort from the programmer.


Once a crash happens, multiple systems are affected, which leads to errors with seemingly unrelated causes. Data corruption between two threads can easily lead to a crash with multiple levels of error. Sometimes these errors have nothing to do with one another (e.g., a graphics rendering section of code causes a crash in user input handling). This can lead to false assumptions about the crash, and about the true nature of the code's instability.


Combining the above qualities produces a derived state of erratic error conditions. A deadlock will occur at different intervals of run-time, and these intervals may not be discrete (e.g., the deadlock may occur anywhere from 5 minutes to 4 hours into the running program). Errors that affect multiple systems of data, compounded by erratic timing, pose a deep risk that cannot be easily mitigated. Often, these qualities of crashes are enough to kill a simple risk assessment; given the above, any sane programmer has a solid reason to avoid threaded programming altogether.



The Culprit

The root of deadlocks and race conditions lies at the very core of parallel programming: the problem of mutual exclusion. In the current paradigm of POSIX and Unix-style processes, programmers use blocking mechanisms to change the state of a thread. Usually, there is a 'BLOCKED' state for any running thread, depending on the implementation provided by the operating system. At the simplest level, we can use the basic mutex data type to illustrate the problem. In the blocking paradigm, a thread that attempts to access a critical section of data must first acquire its assigned mutex. If the mutex is claimed by another thread, the calling thread changes its state to BLOCKED and waits until the mutex is freed.


Deadlocks occur when two or more threads are each waiting for certain mutexes to be freed in order to continue operation. If all of these threads are in the blocked state, no thread can ever release a mutex that another thread is waiting for; that impasse is the deadlock itself.


In the blocking paradigm, blocking a thread is necessary to enforce mutual exclusion. This creates a heavy conundrum: one of the primary catalysts for the problem states of concurrent programming is itself a core mechanism of concurrent programming.


The Solution

Solving the problem of the blocking paradigm requires forgoing the blocking paradigm itself: never use blocking operations.


This solution attacks deadlocks directly; if there are no blocking operations in place, then all threads running in any given concurrent program will always run and never deadlock. Never. However, this solution does not completely avoid race conditions, nor does it address 'stale data'.


Lock-free architectures and APIs, provided by various operating systems such as Cell OS LV2, offer an avenue for this solution. The most common data structure in these APIs is the Lock Free Queue (LFQ). The LFQ maintains atomic operations on its data and provides thread-safe access to multiple processes. Inserting into the LFQ is atomic and does not require threads to block (this statement is not completely accurate; in CellOS, the LFQ implementation allows 16 different threads to access the LFQ atomically, and blocks any threads above that quota). The same goes for de-queue operations. The atomic nature of the LFQ requires its internal operations to be thread-safe at the instruction level, and that safety must be maintained across context switches between threads.


The second part of the solution requires a shift in the programmer's design mindset: since instructions and processes never block, it is the programmer's responsibility to ensure the state of the data never goes stale. Stale data arises when a process performs strictly read-only operations on a piece of memory that is volatile in nature and is also manipulated by various threads. For example, given a system with one process reading a volatile data section while all other threads perform lock-free write operations on that same data, the reading thread may or may not get the current state of the data. Since the read is atomic, other threads may write to the same data immediately after the read operation completes. A huge backlog of 'new data states' generated by other threads may not be reflected in the read operation.


In order to solve, or at least mitigate, the problem of stale data, a programmer must use a message-passing architecture built on LFQs. This can be implemented with the following rule base:


1. All READ operations on volatile data must be performed without blocking.

2. All WRITE operations manipulating volatile sections must go through message-passing via the LFQ; a separate function performs the changes to the data and other critical sections. The contents of the LFQ are individual messages, each changing the state of a specified section.

3. One, and only one, thread will perform WRITE operations, and the specific data must come from the LFQ. This thread will de-queue, parse the message, and write the new data state every time it switches into context.

4. If the LFQ reaches a certain threshold or size, all other threads running in the program must perform a "no operation at this time", without blocking. This allows the LFQs to clear out their WRITE operations, and lets the state of the program "catch up".


With this simple rule base, a program can guarantee that its data never goes stale. It also follows the message-passing paradigm, and maintains stability through lock-free mechanisms.




Mitigating much of the risk in concurrent programming requires a massive shift in the way we think about the state of a running program. We can no longer assume that we are safe in the blocking paradigm, since its core mechanism requires us to bear the burdens and risks of deadlocks and race conditions. By attacking the root cause of deadlocks, we can get rid of deadlocks altogether.
