Concurrent data structure

{{Use dmy dates|date=August 2020}}

{{Refimprove|date=November 2009}}

In computer science, a '''concurrent data structure''' (also called a '''shared data structure''') is a data structure designed for access and modification by multiple computing threads (or processes or nodes) on a computer, such as concurrent queues and concurrent stacks. The concurrent data structure is typically considered to reside in an abstract storage environment known as shared memory, which may be physically implemented as either a tightly coupled or a distributed collection of storage modules.<ref>{{Cite book |title=A VLSI Architecture for Concurrent Data Structures |isbn=9781461319955 |last1=Dally |first1=William J. |date=6 December 2012 |publisher=Springer}}</ref><ref>{{Cite book |title=23rd International Symposium on Distributed Computing, DISC |publisher=Springer Science & Business Media |year=2009}}</ref>

== Basic principles ==

Concurrent data structures, intended for use in parallel or distributed computing environments, differ from "sequential" data structures, intended for use on a uniprocessor machine, in several ways.<ref>{{cite book |author=Mark Moir |author2=Nir Shavit |chapter=Concurrent Data Structures |chapter-url=http://www.cs.tau.ac.il/~shanir/concurrent-data-structures.pdf |archive-url=https://web.archive.org/web/20110401070433/http://www.cs.tau.ac.il/~shanir/concurrent-data-structures.pdf |archive-date=2011-04-01 |editor=Dinesh Metha |editor2=Sartaj Sahni |title=Handbook of Data Structures and Applications |publisher=Chapman and Hall/CRC Press |year=2007 |pages=47-14–47-30}}</ref>

Most notably, in a sequential environment one specifies the data structure's properties and checks that they are implemented correctly by providing safety properties. In a concurrent environment, the specification must also describe liveness properties which an implementation must provide. Safety properties usually state that something bad never happens, while liveness properties state that something good keeps happening. These properties can be expressed, for example, using linear temporal logic.
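In linear temporal logic, for instance, a typical safety property has the form <math>\Box\,\lnot\mathit{bad}</math> ("nothing bad ever happens"), while a typical liveness property has the form <math>\Box\Diamond\,\mathit{good}</math> ("something good happens again and again").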

The type of liveness requirements tends to define the data structure. The method calls can be blocking or non-blocking. Data structures are not restricted to one type or the other, and can allow combinations where some method calls are blocking and others are non-blocking (examples can be found in the Java concurrency software library).
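As an illustrative sketch (in Rust, not drawn from any particular library mentioned above), the same shared structure can expose both kinds of calls: a blocking {{Mono|lock}} that waits until the data is available, and a non-blocking {{Mono|try_lock}} that returns immediately when it is not:

<syntaxhighlight lang="rust">
use std::sync::Mutex;

fn main() {
    let queue: Mutex<Vec<i32>> = Mutex::new(Vec::new());

    // Blocking call: the calling thread waits until the lock can be acquired.
    queue.lock().unwrap().push(1);

    // Non-blocking call: returns immediately with an error if the lock is held,
    // so the caller can do other work instead of waiting.
    match queue.try_lock() {
        Ok(mut q) => q.push(2),
        Err(_) => println!("queue is busy; retry later"),
    }
}
</syntaxhighlight>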

The safety properties of concurrent data structures must capture their behavior given the many possible interleavings of methods called by different threads. It is quite intuitive to specify how abstract data structures behave in a sequential setting in which there are no interleavings. Therefore, many mainstream approaches for arguing the safety properties of a concurrent data structure (such as serializability, linearizability, sequential consistency, and quiescent consistency) specify the structure's properties sequentially, and map its concurrent executions to a collection of sequential ones.

To guarantee the safety and liveness properties, concurrent data structures must typically (though not always) allow threads to reach consensus as to the results of their simultaneous data access and modification requests. To support such agreement, concurrent data structures are implemented using special primitive synchronization operations (see synchronization primitives) available on modern multiprocessor machines that allow multiple threads to reach consensus. This consensus can be achieved in a blocking manner by using locks, or without locks, in which case it is non-blocking. There is a wide body of theory on the design of concurrent data structures (see bibliographical references).
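As a small non-blocking sketch (an illustration in Rust, not taken from the references above), threads can agree on the outcome of concurrent increments to a shared counter through the compare-and-swap primitive exposed by {{Mono|AtomicUsize}}:

<syntaxhighlight lang="rust">
use std::sync::atomic::{AtomicUsize, Ordering};

// Lock-free increment: threads reach agreement on each update through the
// hardware compare-and-swap instruction rather than through a lock.
fn increment(counter: &AtomicUsize) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current + 1,
            Ordering::SeqCst,   // ordering on success
            Ordering::Relaxed,  // ordering on failure
        ) {
            Ok(_) => return,                     // our update was applied
            Err(observed) => current = observed, // another thread won; retry with its value
        }
    }
}
</syntaxhighlight>

If the compare-and-swap fails because another thread changed the counter first, the operation simply retries with the newly observed value; no thread is ever blocked while holding a lock.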

== Design and implementation ==

Concurrent data structures are significantly more difficult to design and to verify as being correct than their sequential counterparts. The primary source of this additional difficulty is concurrency, exacerbated by the fact that threads must be thought of as being completely asynchronous: they are subject to operating system preemption, page faults, interrupts, and so on.

On today's machines, the layout of processors and memory, the layout of data in memory, and the communication load on the various elements of the multiprocessor architecture all influence performance. Furthermore, there is a tension between correctness and performance: algorithmic enhancements that seek to improve performance often make it more difficult to design and verify a correct data structure implementation.<ref>{{cite conference |author=Gramoli, V. |title=More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms |book-title=Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming |pages=1–10 |year=2015 |publisher=ACM |url=http://sydney.edu.au/engineering/it/~gramoli/doc/pubs/gramoli-synchrobench.pdf |archive-url=https://web.archive.org/web/20150410030004/http://sydney.edu.au/engineering/it/~gramoli/doc/pubs/gramoli-synchrobench.pdf |archive-date=10 April 2015}}</ref>

A key measure for performance is scalability, captured by the speedup of the implementation. Speedup is a measure of how effectively the application is using the machine it is running on. On a machine with P processors, the speedup is the ratio of the structure's execution time on a single processor to its execution time on P processors. Ideally, we want linear speedup: we would like to achieve a speedup of P when using P processors. Data structures whose speedup grows with P are called scalable. The extent to which one can scale the performance of a concurrent data structure is captured by a formula known as Amdahl's law and more refined versions of it such as Gustafson's law.
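For example, Amdahl's law bounds the speedup achievable on <math>P</math> processors when only a fraction <math>f</math> of the work can be parallelized:

:<math>S(P) = \frac{1}{(1 - f) + \frac{f}{P}}</math>

so an implementation in which 95% of the work parallelizes (<math>f = 0.95</math>) reaches a speedup of at most about 5.9 on 8 processors, well short of the ideal linear speedup of 8.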

A key issue with the performance of concurrent data structures is the level of memory contention: the overhead in traffic to and from memory as a result of multiple threads concurrently attempting to access the same locations in memory. This issue is most acute with blocking implementations in which locks control access to memory. In order to acquire a lock, a thread must repeatedly attempt to modify that location. On a cache-coherent multiprocessor (one in which processors have local caches that are updated by hardware to keep them consistent with the latest values stored) this results in long waiting times for each attempt to modify the location, and is exacerbated by the additional memory traffic associated with unsuccessful attempts to acquire the lock.
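One common way to reduce this traffic, sketched below in Rust purely for illustration, is a "test-and-test-and-set" spinlock: waiting threads spin on ordinary reads of their locally cached copy of the lock word and only attempt the expensive atomic update once the lock appears free:

<syntaxhighlight lang="rust">
use std::sync::atomic::{AtomicBool, Ordering};
use std::hint;

// Minimal test-and-test-and-set spinlock (illustrative sketch, not production code).
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        loop {
            // Spin on plain reads first: these hit the local cache and generate
            // no coherence traffic while the lock is held by another thread.
            while self.locked.load(Ordering::Relaxed) {
                hint::spin_loop();
            }
            // Only then attempt the atomic swap that actually acquires the lock.
            if !self.locked.swap(true, Ordering::Acquire) {
                return;
            }
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
</syntaxhighlight>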

== .NET ==

.NET provides {{Mono|ConcurrentDictionary}},<ref>{{cite web |title=ConcurrentDictionary Class (System.Collections.Concurrent) |url=https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2?view=net-9.0 |website=learn.microsoft.com |access-date=26 November 2024 |language=en-us}}</ref> {{Mono|ConcurrentQueue}},<ref>{{cite web |title=ConcurrentQueue Class (System.Collections.Concurrent) |url=https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentqueue-1?view=net-9.0 |website=learn.microsoft.com |access-date=26 November 2024 |language=en-us}}</ref> and {{Mono|ConcurrentStack}}<ref>{{cite web |title=ConcurrentStack Class (System.Collections.Concurrent) |url=https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentstack-1?view=net-9.0 |website=learn.microsoft.com |access-date=26 November 2024 |language=en-us}}</ref> in the {{Mono|System.Collections.Concurrent}} namespace.

UML class diagram of System.Collections.Concurrent in .NET

== Rust ==

To share a data structure between threads, Rust instead wraps it in {{Mono|Arc}} (an atomically reference-counted pointer) and a {{Mono|Mutex}}.<ref>{{cite web |title=Shared-State Concurrency - The Rust Programming Language |url=https://doc.rust-lang.org/book/ch16-03-shared-state.html |website=doc.rust-lang.org |access-date=26 November 2024}}</ref>

<syntaxhighlight lang="rust">
let counter = Arc::new(Mutex::new(0));
</syntaxhighlight>
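A fuller sketch, following the shared-state example in the cited chapter of ''The Rust Programming Language'', shows several threads incrementing such a counter by cloning the {{Mono|Arc}} and locking the {{Mono|Mutex}}:

<syntaxhighlight lang="rust">
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc gives shared ownership across threads; Mutex guards the inner value.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // lock() blocks until the mutex is available.
            let mut num = counter.lock().unwrap();
            *num += 1;
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap());
}
</syntaxhighlight>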

== See also ==

== References ==

{{reflist}}

== Further reading ==

* Nancy Lynch, ''Distributed Algorithms''
* Hagit Attiya and Jennifer Welch, ''Distributed Computing: Fundamentals, Simulations, and Advanced Topics'', 2nd Ed.
* Doug Lea, ''Concurrent Programming in Java: Design Principles and Patterns''
* Maurice Herlihy and Nir Shavit, ''The Art of Multiprocessor Programming''
* Mattson, Sanders, and Massingill, ''Patterns for Parallel Programming''