Multiprocessors and Thread-Level Parallelism
1. Explain Symmetric Shared-Memory Architecture and How to reduce Cache
coherence problem in Symmetric Shared Memory architectures:
Symmetric Shared Memory Architectures:
The Symmetric Shared Memory Architecture consists of several processors sharing a single
physical memory over a common bus, as shown below.
Small-scale shared-memory machines usually support the caching of both shared
and private data. Private data is used by a single processor, while shared data is used by
multiple processors, essentially providing communication among the processors through
reads and writes of the shared data. When a private item is cached, its location is
migrated to the cache, reducing the average access time as well as the memory bandwidth
required. Since no other processor uses the data, the program behavior is identical to that
in a uniprocessor.
Cache Coherence in Multiprocessors:
The introduction of caches caused a coherence problem for I/O operations. The same problem
exists in the case of multiprocessors, because the view of memory held by two different
processors is through their individual caches. Figure 6.7 illustrates the problem and
shows how two different processors can have two different values for the same location.
This difficulty is generally referred to as the cache-coherence problem.
Time  Event                   Cache contents  Cache contents  Memory contents
                              for CPU A       for CPU B       for location X
0                                                             1
1     CPU A reads X           1                               1
2     CPU B reads X           1               1               1
3     CPU A stores 0 into X   0               1               0
FIGURE 6.7 The cache-coherence problem for a single memory location (X), read
and written by two processors (A and B). We initially assume that neither cache
contains the variable and that X has the value 1. We also assume a write-through cache; a
write-back cache adds some additional but similar complications. After the value of X
has been written by A, A’s cache and the memory both contain the new value, but B’s
cache does not, and if B reads the value of X, it will receive 1!
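The Figure 6.7 scenario is easy to reproduce. The sketch below is illustrative only (the Cache class and its names are assumptions, not from the text): two write-through caches with no invalidation support, where CPU B still reads the stale value 1 after CPU A has stored 0.

```python
# Hypothetical sketch (not from the text): two write-through caches with no
# coherence protocol, replaying the Figure 6.7 scenario for location X.

class Cache:
    """A write-through cache with no invalidation support."""
    def __init__(self, memory):
        self.memory = memory
        self.data = {}          # address -> cached value

    def read(self, addr):
        if addr not in self.data:           # miss: fetch from memory
            self.data[addr] = self.memory[addr]
        return self.data[addr]              # hit: may return a stale value

    def write(self, addr, value):
        self.data[addr] = value
        self.memory[addr] = value           # write-through to memory

memory = {"X": 1}
cache_a, cache_b = Cache(memory), Cache(memory)

assert cache_a.read("X") == 1   # time 1: CPU A reads X
assert cache_b.read("X") == 1   # time 2: CPU B reads X
cache_a.write("X", 0)           # time 3: CPU A stores 0 into X
# Memory and A's cache now hold 0, but B's cache still holds 1.
assert memory["X"] == 0
assert cache_b.read("X") == 1   # the cache-coherence problem
```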
Informally, we could say that a memory system is coherent if any read of a data item
returns the most recently written value of that data item. This simple definition contains
two different aspects of memory system behavior, both of which are critical to writing
correct shared-memory programs. The first aspect, called coherence, defines what values
can be returned by a read. The second aspect, called consistency, determines when a
written value will be returned by a read. Let’s look at coherence first.
A memory system is coherent if
1. A read by a processor, P, to a location X that follows a write by P to X, with no
writes of X by another processor occurring between the write and the read by P, always
returns the value written by P.
2. A read by a processor to location X that follows a write by another processor to X
returns the written value if the read and write are sufficiently separated in time and no
other writes to X occur between the two accesses.
3. Writes to the same location are serialized: that is, two writes to the same location
by any two processors are seen in the same order by all processors. For example, if the
values 1 and then 2 are written to a location, processors can never read the value of the
location as 2 and then later read it as 1.
Coherence and consistency are complementary: Coherence defines the behavior of reads
and writes to the same memory location, while consistency defines the behavior of reads
and writes with respect to accesses to other memory locations.
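The write-serialization condition (rule 3) can be checked mechanically. A minimal sketch, with a helper name of my own invention rather than anything from the text: given the serialized order of writes to one location, a processor's sequence of distinct observed values is coherent only if it appears in that order.

```python
# Hypothetical sketch: checking the write-serialization condition (rule 3).
# `observed` is the sequence of distinct values a processor saw at one
# location; it must be a subsequence of the serialized write order.

def is_coherent_observation(write_order, observed):
    """True if the observed values appear in write-order sequence."""
    it = iter(write_order)
    return all(value in it for value in observed)  # consumes `it` in order

write_order = [1, 2]            # values 1 and then 2 written to the location
assert is_coherent_observation(write_order, [1, 2])      # allowed
assert is_coherent_observation(write_order, [2])         # allowed: missed 1
assert not is_coherent_observation(write_order, [2, 1])  # forbidden: 2 then 1
```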
Basic Schemes for Enforcing Coherence
Coherent caches provide migration, since a data item can be moved to a local cache
and used there in a transparent fashion. This migration reduces both the latency to access
a shared data item that is allocated remotely and the bandwidth demand on the shared memory.
Coherent caches also provide replication for shared data that is being simultaneously
read, since the caches make a copy of the data item in the local cache. Replication
reduces both latency of access and contention for a read shared data item.
The protocols to maintain coherence for multiple processors are called cache-
coherence protocols. There are two classes of protocols, which use different techniques
to track the sharing status, in use:
Directory based—The sharing status of a block of physical memory is kept in just one
location, called the directory; we focus on this approach in section 6.5, when we discuss
scalable shared-memory architectures.
Snooping—Every cache that has a copy of the data from a block of physical memory also
has a copy of the sharing status of the block, and no centralized state is kept. The caches
are usually on a shared-memory bus, and all cache controllers monitor or snoop on the
bus to determine whether or not they have a copy of a block that is requested on the bus.
One method is to ensure that a processor has exclusive access to a data item before it
writes that item. This style of protocol is called a write invalidate protocol because it
invalidates other copies on a write. It is by far the most common protocol, both for
snooping and for directory schemes. Exclusive access ensures that no other readable or
writable copies of an item exist when the write occurs: all other cached copies of the item
are invalidated.
Figure 6.8 shows an example of an invalidation protocol for a snooping bus with write-
back caches in action. To see how this protocol ensures coherence, consider a write
followed by a read by another processor: Since the write requires exclusive access, any
copy held by the reading processor must be invalidated (hence the protocol name). Thus,
when the read occurs, it misses in the cache and is forced to fetch a new copy of the data.
For a write, we require that the writing processor have exclusive access, preventing any
other processor from being able to write simultaneously. If two processors do attempt to
write the same data simultaneously, one of them wins the race, causing the other
processor’s copy to be invalidated. For the other processor to complete its write, it must
obtain a new copy of the data, which must now contain the updated value. Therefore, this
protocol enforces write serialization.
Processor activity     Bus activity        Contents of    Contents of    Contents of memory
                                           CPU A's cache  CPU B's cache  location X
                                                                         0
CPU A reads X          Cache miss for X    0                             0
CPU B reads X          Cache miss for X    0              0              0
CPU A writes a 1 to X  Invalidation for X  1                             0
CPU B reads X          Cache miss for X    1              1              1
FIGURE 6.8 An example of an invalidation protocol working on a snooping bus for
a single cache block (X) with write-back caches.
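As a rough illustration of the protocol, the Figure 6.8 trace can be replayed in code. This SnoopingCache/Bus model is a hypothetical simplification of my own, not the book's implementation: every write broadcasts an invalidation on the bus, and a bus fetch lets a cache holding the current copy supply the data and update memory.

```python
# Hypothetical sketch: a write-invalidate snooping protocol with write-back
# caches, replaying the Figure 6.8 trace for block X (memory initially 0).

class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def invalidate(self, block, owner):
        for cache in self.caches:
            if cache is not owner:
                cache.data.pop(block, None)     # snooped: drop the copy

    def fetch(self, block):
        # A cache holding the current copy supplies it and updates memory.
        for cache in self.caches:
            if block in cache.data:
                self.memory[block] = cache.data[block]
        return self.memory[block]

class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.data = {}              # block -> value; presence means valid
        bus.caches.append(self)

    def read(self, block):
        if block not in self.data:  # miss: fetch newest copy via the bus
            self.data[block] = self.bus.fetch(block)
        return self.data[block]

    def write(self, block, value):
        self.bus.invalidate(block, owner=self)  # broadcast invalidation
        self.data[block] = value                # write-back: memory is stale

memory = {"X": 0}
bus = Bus(memory)
cpu_a, cpu_b = SnoopingCache(bus), SnoopingCache(bus)

assert cpu_a.read("X") == 0     # cache miss for X
assert cpu_b.read("X") == 0     # cache miss for X
cpu_a.write("X", 1)             # invalidation for X; B's copy is dropped
assert "X" not in cpu_b.data
assert cpu_b.read("X") == 1     # miss; A supplies the updated value
assert memory["X"] == 1
```

Because the write requires exclusive access, B's stale copy is gone before its next read, so the read misses and fetches the new value, exactly the behavior the figure traces.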