Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Understanding Atomicity and Data Races in C++ Concurrency - Prof. John M. Mellor-Crummey, Study notes of Computer Science

Rice University Computer Science

Prof. John M. Mellor-Crummey

This document, from a lecture given at rice university in 2008, discusses the importance of a c++ concurrency memory model and the issues that arise when dealing with data races and atomicity in multithreaded c++ programming. It covers the motivation for a memory model, the problems with thread libraries, and the implications for current processors.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

koofers-user-gj2 🇺🇸

10 documents

1 / 29

This page cannot be seen from the preview

Don't miss anything!

John Mellor-Crummey

Department of Computer Science

Rice University

[email protected]

C++ Concurrency Memory Model

COMP 522 Lecture 12 9 October 2008

Discover Study notes of Computer Science Rice University

Partial preview of the text

Download Understanding Atomicity and Data Races in C++ Concurrency - Prof. John M. Mellor-Crummey and more Study notes Computer Science in PDF only on Docsity!

John Mellor-Crummey

Department of Computer Science Rice University [email protected]

C++ Concurrency Memory Model

COMP 522 Lecture 12 9 October 2008

Why a C++ Concurrency Memory Model?

Technology trends —chips relying on increasing core counts to boost performance —substantial performance gains available only for parallel codes
Current practice —most parallel programs: threads sharing an address space
C and C++ —defined as single threaded languages; no reference to threads —compilers are thread unaware - generate code as for a single-threaded application compilers reorder assignments to independent vars does not preserve meaning of multi-threaded programs

Problems with Thread Libraries

Informal multi-threaded semantics is insufficient

Definition of a data race is under specified —is this program race free? —is outcome r1=r2=1 allowed? —how could it occur?
Hard to reason when a compiler might introduce races —a compiler might do something like this with bit fields —why is this not OK?

Avoiding Data Races

Atomic primitives that are safe for concurrent use
e.g. gcc^ _sync^ intrinsics (selected examples) —http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Atomic-Builtins.html

Review of Memory Model Motivation

Memory model specifies the values that a read of a shared

variable may return

Affects —programmability —performance —portability
Compiler and hardware must respect a memory model —constrains compiler transformations —compiler and h/w must cooperate to preserve intended semantics

Review of Memory Model Flavors

Sequential consistency —as if single total order (an interleaving) of memory operations —operations within each thread appear in program order —intuitive, but limits optimizations and performance
Relaxed memory models —allow various h/w optimizations —difficult to reason with for high-level languages —but still limit certain compiler optimizations (see Java MM)

A Concurrency Memory Model for C++

Define semantics of multi-threaded C++ for C++0x

Main issues

(1) Sequentially consistent atomics

(2) Trylock and impact on the definition of data race

(3) Semantics for data races

(4) Low-level atomics with weaker memory ordering

Preliminaries

Data operations —load, store
Synchronization operations —atomic load, atomic store, atomic read-modify-write, lock, unlock
Sequenced-before^ relation on memory operations —unlike prior work, this relation is only a partial order - permits undefined argument evaluation order for functions
Type 1 data race —two concurrent conflicting accesses from different threads —at least one is a data operation
Type 2 data race —two data accesses to a location unordered by happens-before

C++ Concurrency Memory Model

If program (on a given input) has a sequentially consistent

execution with a type 1 data race, its behavior is undefined

Otherwise, the program behaves according to one of its

sequentially consistent executions

—Q: why can a program have more than one sequentially consistent execution?

Optimizations Allowed

Given memory op M1 sequenced-before M
H/W may reorder M1 and M2 if semantics of thread allow and —M1 is data operation and M2 is read synchronization operation - may delay load/store until after atomic-load —M1 is a write synchronization and M2 is data operation - may promote load/store before atomic-store —M1 and M2 are both data with no synch sequenced between them - reorder load/store w.r.t. other load/store if no synch between them
When locks used in well structured ways, can reorder (as long

as intra-thread semantics permit)

—M1 is data operation and M2 is write to a lock —M1 is unlock and M2 is read or write to a different lock

Data writes & well-structured lock/unlock can be non-atomic

(1) Sequentially Consistent Atomics

Some had concerns that requirement “atomics appear

sequentially consistent” would be too costly

—except on Itanium, existing ISAs don’t distinguish synch ops —compilers typically enforce orderings with fence instructions

types of fences membar: all memops before membar finish before any after store|load fence some processors ensure orderings implicitly: AMD64 and Intel provide load|load and load|store fence after each load provide store|store fence semantics after each store
Many fences ambiguous about h/w write atomicity —whether a thread’s write could be visible to one thread before all
Difficult to formalize semantics that were —meaningfully weaker than SC for h/w, and —simple enough to program

Cost of Enforcing Write Atomicity

Fences ensure reads execute in program order
Writes must be atomic to ensure sequential consistency —consider the effect of non-atomic writes
Ownership cache protocols avoid this inconsistency —invalidate other copies before committing write —ensures that all readers see new value —provides serialization for reads and writes to the same location

inferred

ordering

Consequences of Relaxing Write Atomicity

Write-to-read causality must be upheld —write of Y by T2 followed by read of value written by T establishes a causal connection - how could write atomicity violation occur? T1 and T2 share L1 write-through cache T2 is allowed to read T1’s update early T3 receives invalidation for Y before that of X —sequential consistency requires T3 to return new value for X
One possible approach to fix this —ensure writes separated by a fence from a given node are seen in same order by all nodes —but, insufficient for other programs (^19)

Consequences of Relaxing Write Atomicity

Read-to-write causality —read of a old value by a T2 followed by T3’s write of new value establishes causal connection - how could write atomicity violation occur? T1 and T2 share L1 write-through cache T2 is allowed to read T1’s update to X early T3 reads old value for X before receiving invalidation
Solution suggested for WRC is insufficient —each node T1/T2 or T3 performs only one write
This atomicity violation may be acceptable to programmers

Understanding Atomicity and Data Races in C++ Concurrency - Prof. John M. Mellor-Crummey, Study notes of Computer Science

Related documents

Partial preview of the text

Download Understanding Atomicity and Data Races in C++ Concurrency - Prof. John M. Mellor-Crummey and more Study notes Computer Science in PDF only on Docsity!

John Mellor-Crummey

C++ Concurrency Memory Model

Informal multi-threaded semantics is insufficient

variable may return

Define semantics of multi-threaded C++ for C++0x

Main issues

(1) Sequentially consistent atomics

(2) Trylock and impact on the definition of data race

(3) Semantics for data races

(4) Low-level atomics with weaker memory ordering

execution with a type 1 data race, its behavior is undefined

sequentially consistent executions

as intra-thread semantics permit)

sequentially consistent” would be too costly

inferred

ordering