Cache Coherence, Interconnection Network Design, and Routing Algorithms, Slides of Parallel Computing and Programming

Covers cache coherence schemes, interconnection network design, and routing algorithms, including cache-based (pointer-based) schemes, hierarchical directories, caching in the Internet, network caching, Plaxton’s scheme, and various properties of routing algorithms. It also includes a comparison of packet switching and wormhole routing, and a discussion of deadlock freedom in routing algorithms.

Typology: Slides

2012/2013

Uploaded on 04/30/2013

devank

Cache Coherence and Interconnection Network Design

Cache-based (Pointer-based) Schemes

  • How they work:
    • home only holds pointer to rest of directory info
    • distributed linked list of copies, weaves through caches
      • cache tag has pointer, points to next cache with a copy
    • on read, add yourself to head of the list (comm. needed)
    • on write, propagate chain of invalidations down the list
  • What if a link fails? => Scalable Coherent Interface (SCI) IEEE Standard
    • doubly linked list
  • What to do on replacement?

[Figure: Main Memory (Home) and three nodes (Node 0, Node 1, Node 2), each with a processor (P) and a Cache]
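
The linked-list scheme above can be sketched in a few lines of Python. This is a simplified, hypothetical model and not the SCI protocol itself: it tracks a single memory block, uses a singly linked list, and ignores replacement and link failure. The class and method names (`Cache`, `Home`, `read`, `write`, `sharers`) are assumptions made for illustration.

```python
class Cache:
    """One node's cache entry for a single block (hypothetical model)."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.valid = False
        self.next = None  # pointer kept with the cache tag: next cache with a copy


class Home:
    """Home memory: holds only the pointer to the head of the sharing list."""
    def __init__(self):
        self.head = None

    def read(self, cache):
        # On a read, the requester adds itself at the head of the list.
        if not cache.valid:
            cache.valid = True
            cache.next = self.head
            self.head = cache

    def write(self, writer):
        # On a write, walk the chain and invalidate every other copy.
        invalidated = []
        node = self.head
        while node is not None:
            nxt = node.next
            if node is not writer:
                node.valid = False
                invalidated.append(node.node_id)
            node.next = None
            node = nxt
        # The writer ends up as the only member of the list.
        writer.valid = True
        self.head = writer
        return invalidated

    def sharers(self):
        # Follow the pointers from the head to list the current sharers.
        ids, node = [], self.head
        while node is not None:
            ids.append(node.node_id)
            node = node.next
        return ids
```

Note that the invalidations are issued one after another along the chain, so in this model the write cost grows with the number of sharers.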

Hierarchical Directories

  • Directory is a hierarchical data structure
    • leaves are processing nodes, internal nodes just directory
    • logical hierarchy, not necessarily physical
      • (can be embedded in general network)

[Figure: tree with processing nodes at the leaves, level-1 directories above them, and a level-2 directory at the top]

level-1 directory: Tracks which of its children processing nodes have a copy of the memory block. Also tracks which local memory blocks are cached outside this subtree. Inclusion is maintained between processor caches and directory.

level-2 directory: Tracks which of its children level-1 directories have a copy of the memory block. Also tracks which local memory blocks are cached outside this subtree. Inclusion is maintained between level-1 directories and level-2 directory.
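
As a rough illustration of how a request might climb such a hierarchy, the sketch below keeps, at every directory, the set of children known to hold a copy of each block. It is a minimal model under stated assumptions: it omits the tracking of local blocks cached outside the subtree, and the names `DirNode`, `record_copy`, and `find_copy_level` are hypothetical.

```python
class DirNode:
    """A node of the directory tree; leaves stand for processing nodes."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # block -> set of child names that hold (or lead to) a copy
        self.children_with_copy = {}


def record_copy(leaf, block):
    """Mark that `leaf` caches `block`, updating every ancestor directory
    so that inclusion holds between each level and the one above it."""
    child, node = leaf, leaf.parent
    while node is not None:
        node.children_with_copy.setdefault(block, set()).add(child.name)
        child, node = node, node.parent


def find_copy_level(requester, block):
    """Climb from the requester until some directory knows of a copy.
    Returns that directory's name, or None if no copy is recorded."""
    node = requester.parent
    while node is not None:
        if node.children_with_copy.get(block):
            return node.name
        node = node.parent
    return None


# Hypothetical two-level hierarchy: one level-2 root, two level-1 directories.
root = DirNode("L2")
l1a, l1b = DirNode("L1-a", root), DirNode("L1-b", root)
p0, p1 = DirNode("P0", l1a), DirNode("P1", l1a)
p2, p3 = DirNode("P2", l1b), DirNode("P3", l1b)
record_copy(p0, "A")
```

A request from `P1` is satisfied at the shared level-1 directory, while a request from `P2` has to climb to the level-2 directory before a copy is found.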

Caching in the Internet (Server side caching, Client side caching, Network caching)

Comparison with P2P

  • Directory is similar to the tracker in BitTorrent (BT) or the root in a P2P network: Plaxton’s root has one pointer, and the BT tracker has all pointers
  • Real object may be somewhere else (call it the home node), as pointed to by the directory
  • The shared caches are extra locations of the object, equivalent to peers
  • Duplicate directories are possible; see hierarchical directory design (MIND) later

Many directory designs are possible. Can we do a similar design in P2P?

Plaxton’s Scheme

  • Similar to Distributed Shared Memory (DSM) Cache Protocol
  • Root is the Home node in DSM, where directory is kept but not the real object. The root points to the nearest home peer, where a shared copy can be found.
  • Similar to Hierarchical Directory scheme, where intermediate nodes point to nearest object nodes (equivalent to shared copy). However, in hierarchical directory scheme intermediate pointers point to all shared copies.

Interconnection Network Design

Adapted from UC Berkeley notes

Scalable, High Perf. Interconnection Network

  • At Core of Parallel Computer Arch.
  • Requirements and trade-offs at many levels
    • Elegant mathematical structure
    • Deep relationships to algorithm structure
    • Managing many traffic flows
    • Electrical / Optical link properties
  • Little consensus
    • interactions across levels
    • Performance metrics?

[Figure: nodes (each with M, P, and CA) attached through a network interface to the Scalable Interconnection Network]

Formalism

  • network is a graph V = {switches and nodes} connected by communication channels C ⊆ V × V
  • Channel has width w and signaling rate f = 1/τ
    • channel bandwidth b = wf
    • phit (physical unit) data transferred per cycle
    • flit - basic unit of flow-control
  • Number of input (output) channels is switch degree
  • Sequence of switches and links followed by a message is a route
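
To make the channel definitions concrete, here is a small sketch of b = wf with f = 1/τ. The function names and the example numbers are illustrative assumptions, and a phit is taken to be exactly w bits, one per cycle.

```python
def channel_bandwidth(width_bits, cycle_time_s):
    """Channel bandwidth b = w * f, with signaling rate f = 1 / tau."""
    f = 1.0 / cycle_time_s
    return width_bits * f


def phits_per_packet(packet_bits, width_bits):
    """Phits needed to move a packet when one phit of `width_bits` bits
    crosses the channel per cycle (ceiling division)."""
    return -(-packet_bits // width_bits)


# Hypothetical channel: 16 bits wide, 1 ns cycle time (about 16 Gbit/s).
b = channel_bandwidth(16, 1e-9)
n_phits = phits_per_packet(1000, 16)  # a 1000-bit packet needs 63 phits
```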

What characterizes a network?

  • Topology (what)
    • physical interconnection structure of the network graph
    • direct: node connected to every switch
    • indirect: nodes connected to specific subset of switches
  • Routing Algorithm (which)
    • restricts the set of paths that msgs may follow
    • many algorithms with different properties
      • gridlock avoidance?

Typical Packet Format

  • Two basic mechanisms for abstraction
    • encapsulation
    • fragmentation

[Figure: packet layout: Header (Routing and Control), Data Payload, Trailer (Error Code); a packet is a sequence of digital symbols transmitted over a channel]
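
The two mechanisms, fragmentation and encapsulation, can be sketched as follows. This is one possible illustration, not a real packet format: the header fields (`dest`, `seq`), the dictionary layout, and the use of CRC-32 as the error code are all assumptions.

```python
import zlib


def fragment(message: bytes, dest: int, max_payload: int):
    """Fragmentation: split `message` into pieces of at most `max_payload`
    bytes. Encapsulation: wrap each piece with a header and a trailer."""
    packets = []
    for seq, start in enumerate(range(0, len(message), max_payload)):
        payload = message[start:start + max_payload]
        packets.append({
            "header": {"dest": dest, "seq": seq},  # routing and control
            "payload": payload,                    # data
            "trailer": zlib.crc32(payload),        # error code
        })
    return packets


def reassemble(packets):
    """Check every error code, then rebuild the message in sequence order."""
    ordered = sorted(packets, key=lambda p: p["header"]["seq"])
    for p in ordered:
        if zlib.crc32(p["payload"]) != p["trailer"]:
            raise ValueError("corrupt packet %d" % p["header"]["seq"])
    return b"".join(p["payload"] for p in ordered)


packets = fragment(b"hello world", dest=3, max_payload=4)
```

Because each packet carries its own sequence number, `reassemble` works even when the packets arrive out of order.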

Review: Performance Metrics

[Figure: timeline between Sender and Receiver, marking Sender Overhead (processor busy), Time of Flight, Transmission time (size ÷ bandwidth), Receiver Overhead (processor busy), Transport Latency, and Total Latency]

Total Latency = Sender Overhead + Time of Flight + Message Size ÷ BW + Receiver Overhead

Includes header/trailer in BW calculation?
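
The total latency formula translates directly into code. In the sketch below, the optional `header_trailer_size` parameter is an assumption added to show how the answer to the header/trailer question changes the result; all example numbers are made up, with times in microseconds and bandwidth in bytes per microsecond.

```python
def total_latency(sender_overhead, time_of_flight, message_size,
                  bandwidth, receiver_overhead, header_trailer_size=0):
    """Total Latency = Sender Overhead + Time of Flight
                       + Message Size / BW + Receiver Overhead.
    `header_trailer_size` (hypothetical parameter) adds the header and
    trailer bytes to the amount of data that must cross the channel."""
    transmission_time = (message_size + header_trailer_size) / bandwidth
    return (sender_overhead + time_of_flight
            + transmission_time + receiver_overhead)


# Hypothetical numbers: 1000-byte message over a 100 bytes/us link.
payload_only = total_latency(1.0, 0.5, 1000, 100, 2.0)
with_framing = total_latency(1.0, 0.5, 1000, 100, 2.0, header_trailer_size=50)
```

With these numbers the payload-only estimate is 13.5 µs, and counting 50 bytes of header and trailer raises it to 14.0 µs.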

Store and Forward vs. Cut-Through

  • Advantage
    • Latency reduces from a function of:
      • number of intermediate switches × the size of the packet
    • to:
      • time for 1st part of the packet to negotiate the switches + the packet size ÷ interconnect BW
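
A common first-order model makes the comparison concrete. The sketch below assumes no contention, a fixed per-hop switch delay, and that `hops` counts the links a packet traverses; these modeling choices and the function names are assumptions, not taken from the slides.

```python
def store_and_forward_latency(hops, packet_size, bandwidth, switch_delay):
    """Every hop receives the whole packet before forwarding it, so the
    full transmission time is paid once per hop."""
    return hops * (packet_size / bandwidth + switch_delay)


def cut_through_latency(hops, packet_size, bandwidth, switch_delay):
    """Only the first part of the packet pays the per-hop delay; the rest
    streams behind it, so transmission time is paid once."""
    return hops * switch_delay + packet_size / bandwidth


# Hypothetical numbers: 4 hops, 1000-byte packet, 100 bytes/us links,
# 1 us of routing delay per switch.
sf = store_and_forward_latency(4, 1000, 100, 1.0)
ct = cut_through_latency(4, 1000, 100, 1.0)
```

With these numbers store-and-forward takes 44 µs and cut-through 14 µs; the gap widens as the packet size or the hop count grows.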