Understanding Atomicity and Data Races in C++ Concurrency - Prof. John M. Mellor-Crummey, Study notes of Computer Science

This document, from a lecture given at rice university in 2008, discusses the importance of a c++ concurrency memory model and the issues that arise when dealing with data races and atomicity in multithreaded c++ programming. It covers the motivation for a memory model, the problems with thread libraries, and the implications for current processors.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

koofers-user-gj2
koofers-user-gj2 🇺🇸

10 documents

1 / 29

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
John Mellor-Crummey
Department of Computer Science
Rice University
C++ Concurrency Memory Model
COMP 522 Lecture 12 9 October 2008
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d

Partial preview of the text

Download Understanding Atomicity and Data Races in C++ Concurrency - Prof. John M. Mellor-Crummey and more Study notes Computer Science in PDF only on Docsity!

John Mellor-Crummey

Department of Computer Science Rice University [email protected]

C++ Concurrency Memory Model

COMP 522 Lecture 12 9 October 2008

Why a C++ Concurrency Memory Model?

  • Technology trends —chips relying on increasing core counts to boost performance —substantial performance gains available only for parallel codes
  • Current practice —most parallel programs: threads sharing an address space
  • C and C++ —defined as single threaded languages; no reference to threads —compilers are thread unaware - generate code as for a single-threaded application compilers reorder assignments to independent vars does not preserve meaning of multi-threaded programs

Problems with Thread Libraries

Informal multi-threaded semantics is insufficient

  • Definition of a data race is under specified —is this program race free? —is outcome r1=r2=1 allowed? —how could it occur?
  • Hard to reason when a compiler might introduce races —a compiler might do something like this with bit fields —why is this not OK?

Avoiding Data Races

  • Atomic primitives that are safe for concurrent use
  • e.g. gcc^ _sync^ intrinsics (selected examples) —http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Atomic-Builtins.html

Review of Memory Model Motivation

  • Memory model specifies the values that a read of a shared

variable may return

  • Affects —programmability —performance —portability
  • Compiler and hardware must respect a memory model —constrains compiler transformations —compiler and h/w must cooperate to preserve intended semantics

Review of Memory Model Flavors

  • Sequential consistency —as if single total order (an interleaving) of memory operations —operations within each thread appear in program order —intuitive, but limits optimizations and performance
  • Relaxed memory models —allow various h/w optimizations —difficult to reason with for high-level languages —but still limit certain compiler optimizations (see Java MM)

A Concurrency Memory Model for C++

Define semantics of multi-threaded C++ for C++0x

Main issues

(1) Sequentially consistent atomics

(2) Trylock and impact on the definition of data race

(3) Semantics for data races

(4) Low-level atomics with weaker memory ordering

Preliminaries

  • Data operations —load, store
  • Synchronization operations —atomic load, atomic store, atomic read-modify-write, lock, unlock
  • Sequenced-before^ relation on memory operations —unlike prior work, this relation is only a partial order - permits undefined argument evaluation order for functions
  • Type 1 data race —two concurrent conflicting accesses from different threads —at least one is a data operation
  • Type 2 data race —two data accesses to a location unordered by happens-before

C++ Concurrency Memory Model

  • If program (on a given input) has a sequentially consistent

execution with a type 1 data race, its behavior is undefined

  • Otherwise, the program behaves according to one of its

sequentially consistent executions

—Q: why can a program have more than one sequentially consistent execution?

Optimizations Allowed

  • Given memory op M1 sequenced-before M
  • H/W may reorder M1 and M2 if semantics of thread allow and —M1 is data operation and M2 is read synchronization operation - may delay load/store until after atomic-load —M1 is a write synchronization and M2 is data operation - may promote load/store before atomic-store —M1 and M2 are both data with no synch sequenced between them - reorder load/store w.r.t. other load/store if no synch between them
  • When locks used in well structured ways, can reorder (as long

as intra-thread semantics permit)

—M1 is data operation and M2 is write to a lock —M1 is unlock and M2 is read or write to a different lock

  • Data writes & well-structured lock/unlock can be non-atomic

(1) Sequentially Consistent Atomics

  • Some had concerns that requirement “atomics appear

sequentially consistent” would be too costly

—except on Itanium, existing ISAs don’t distinguish synch ops —compilers typically enforce orderings with fence instructions

  • types of fences membar: all memops before membar finish before any after store|load fence some processors ensure orderings implicitly: AMD64 and Intel provide load|load and load|store fence after each load provide store|store fence semantics after each store
  • Many fences ambiguous about h/w write atomicity —whether a thread’s write could be visible to one thread before all
  • Difficult to formalize semantics that were —meaningfully weaker than SC for h/w, and —simple enough to program

Cost of Enforcing Write Atomicity

  • Fences ensure reads execute in program order
  • Writes must be atomic to ensure sequential consistency —consider the effect of non-atomic writes
  • Ownership cache protocols avoid this inconsistency —invalidate other copies before committing write —ensures that all readers see new value —provides serialization for reads and writes to the same location

inferred

ordering

Consequences of Relaxing Write Atomicity

  • Write-to-read causality must be upheld —write of Y by T2 followed by read of value written by T establishes a causal connection - how could write atomicity violation occur? T1 and T2 share L1 write-through cache T2 is allowed to read T1’s update early T3 receives invalidation for Y before that of X —sequential consistency requires T3 to return new value for X
  • One possible approach to fix this —ensure writes separated by a fence from a given node are seen in same order by all nodes —but, insufficient for other programs (^19)

Consequences of Relaxing Write Atomicity

  • Read-to-write causality —read of a old value by a T2 followed by T3’s write of new value establishes causal connection - how could write atomicity violation occur? T1 and T2 share L1 write-through cache T2 is allowed to read T1’s update to X early T3 reads old value for X before receiving invalidation
  • Solution suggested for WRC is insufficient —each node T1/T2 or T3 performs only one write
  • This atomicity violation may be acceptable to programmers