Interconnection Networks - Parallel Computer Architecture - Lecture Slides, Slides of Computer Science

Typology: Slides

2012/2013

Uploaded on 03/28/2013

ekana

Module 17: "Interconnection Networks"
Lecture 37: "Introduction to Routers"
Interconnection Networks
Fundamentals
Latency and bandwidth
Router architecture
Coherence protocol and routing
[From Chapter 10 of Culler, Singh, Gupta]

Fundamentals

The switches (or routers) talk directly to the network interface (NI); the NI output and input queues normally map to the virtual channels of the connecting router.
Topology: the structure of the interconnection network
- Direct network: each router is attached to a complete node (most popular)
- Indirect network: nodes are attached to only a few routers; the other routers cannot generate packets and can only forward them in the right direction
Routing algorithms
- Deterministic: a fixed route between every source-destination pair
- Adaptive: different routes may be selected dynamically based on congestion
Switching strategy
- Circuit switching: the path from source to destination is first established and reserved, and only then is the message transmitted (popular in the telephone world, but not in parallel computer architecture)
- Packet switching: a message is divided into several packets, each carrying routing information in its header; this utilizes network resources better because only individual packets need to be routed, as opposed to the entire message together
Flow control
- How are collisions over resources (buffers, channels, etc.) detected and avoided?
- The minimum unit of information that flow control transfers over a link at a time is called a flit (flow control unit); a flit may be as small as a phit (physical unit) or as large as a message
Metrics to compare topologies
- Diameter: the maximum, over all node pairs, of the shortest distance between them
- Average distance: the shortest distance between two nodes, averaged over all pairs
- Bisection bandwidth: the aggregate bandwidth of the minimum set of links whose removal splits the network into two disjoint, roughly equal sets of nodes
Packet structure
- Header: routing and control information, e.g., source, destination, size of the data payload, message opcode, etc.; an intermediate router only needs to inspect the header to handle a newly arrived packet
- Address: for CC-NUMA machines, the cache line address
- Payload: the transmitted data; for CC-NUMA machines this is normally a cache line, or

message stay blocked at several routers along the route like a worm)
General contention control
- What happens to incoming packets if the router buffers are full?
- The general solution in data communication networks and WANs is to drop packets and retry after a time-out (TCP/IP, ATM, etc.)
- In parallel computers, packets are normally not dropped; instead, link-level flow control blocks them in the output port of the previous (upstream) router. This may cause tree saturation: blocked packets back up into routers further upstream, forming a tree of congestion rooted at the contended resource
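The slides do not say which link-level flow-control mechanism is used; credit-based flow control is one common choice and is assumed here. The sketch below models a single router-to-router link: the upstream output port holds one credit per free flit buffer downstream, sends only when it has a credit, and otherwise keeps the flit, so nothing is dropped. `CreditLink` and `simulate` are hypothetical names, and tree saturation itself (blocking spreading across several routers) is not modeled.

```python
from collections import deque


class CreditLink:
    """One router-to-router link with credit-based link-level flow control.

    The upstream output port keeps a credit count equal to the number of free
    flit buffers at the downstream input port. A flit is sent only if a credit
    is available; otherwise it waits upstream. Nothing is ever dropped.
    """

    def __init__(self, downstream_buffers: int):
        if downstream_buffers < 1:
            raise ValueError("need at least one downstream buffer")
        self.credits = downstream_buffers
        self.downstream = deque()  # flits held in the downstream input buffer

    def try_send(self, flit) -> bool:
        if self.credits == 0:
            return False  # blocked: flit stays in the upstream output port
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def drain_one(self):
        """Downstream router forwards one flit and returns a credit upstream."""
        if not self.downstream:
            return None
        self.credits += 1
        return self.downstream.popleft()


def simulate(num_flits: int, buffers: int, drain_every: int):
    """Upstream offers one flit per cycle; downstream drains one every drain_every cycles."""
    if drain_every < 1:
        raise ValueError("drain_every must be at least 1")
    link = CreditLink(buffers)
    pending = deque(range(num_flits))  # flits waiting in the upstream output port
    delivered, stalled_cycles, cycle = [], 0, 0
    while len(delivered) < num_flits:
        if pending:
            if link.try_send(pending[0]):
                pending.popleft()
            else:
                stalled_cycles += 1  # no credit: upstream is blocked this cycle
        if cycle % drain_every == drain_every - 1:
            flit = link.drain_one()
            if flit is not None:
                delivered.append(flit)
        cycle += 1
    return delivered, stalled_cycles


delivered, stalls = simulate(num_flits=8, buffers=2, drain_every=3)
print(delivered, stalls)  # [0, 1, 2, 3, 4, 5, 6, 7] 11
```

All eight flits arrive in order even though the sender offers flits three times faster than the receiver drains them; the excess shows up as stalled cycles in the upstream port rather than as drops and retries.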
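To make the topology metrics from the Fundamentals list concrete, the following sketch computes the diameter and average distance of a k-by-k 2-D mesh (no wraparound links) by breadth-first search, and counts the links crossing the straight middle cut. The mesh is only an assumed example topology and the function names are hypothetical. For even k the middle cut is a minimum balanced cut of the mesh, so, assuming all links have equal bandwidth, the bisection bandwidth is this link count times the per-link bandwidth; the code does not search over all possible cuts.

```python
from collections import deque
from itertools import product


def mesh_neighbors(k: int) -> dict:
    """Adjacency lists of a k-by-k 2-D mesh (no wraparound links)."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        adj[(x, y)] = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < k and 0 <= y + dy < k
        ]
    return adj


def bfs_distances(adj: dict, src) -> dict:
    """Hop count of the shortest path from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_avg_distance(adj: dict):
    """Max and mean shortest-path distance over all ordered pairs of distinct nodes."""
    total = pairs = diameter = 0
    for src in adj:
        for dst, d in bfs_distances(adj, src).items():
            if dst != src:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, total / pairs


def mesh_bisection_links(k: int) -> int:
    """Links crossing the vertical cut between columns k//2 - 1 and k//2 (k even)."""
    adj = mesh_neighbors(k)
    left = {node for node in adj if node[0] < k // 2}
    # Count each crossing link once, from its left endpoint.
    return sum(1 for u in left for v in adj[u] if v not in left)


adj = mesh_neighbors(4)
diameter, avg = diameter_and_avg_distance(adj)
print(diameter, round(avg, 3), mesh_bisection_links(4))  # 6 2.667 4
```

For the 4-by-4 mesh this gives a diameter of 6 hops (corner to opposite corner), an average distance of 8/3 hops, and 4 links across the bisection.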

Latency and bandwidth

Latency is affected by the delivered bandwidth, and the delivered bandwidth may be lower than the raw link bandwidth under contention, i.e., when the bandwidth demand (called the offered bandwidth) is much higher than the available link bandwidth.
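A minimal sketch of this effect, assuming a deliberately simple model that is not from the slides: several sources share one link equally, each source gets the smaller of what it offers and its fair share, and message latency is a fixed overhead plus the transfer time at the delivered bandwidth. Router delays and queueing dynamics are ignored, and the function names and numbers are hypothetical.

```python
def delivered_bandwidth(offered_bw: float, link_bw: float, num_sources: int) -> float:
    """Per-source delivered bandwidth when num_sources share one link equally.

    Assumes ideal fair sharing: a source never gets more than it offers,
    nor more than its 1/num_sources share of the link.
    """
    if num_sources < 1 or link_bw <= 0 or offered_bw <= 0:
        raise ValueError("need at least one source and positive bandwidths")
    return min(offered_bw, link_bw / num_sources)


def message_latency_ns(size_bytes: float, overhead_ns: float,
                       offered_bw: float, link_bw: float, num_sources: int) -> float:
    """Fixed overhead plus transfer time at the delivered (not raw) bandwidth.

    Bandwidths are in bytes/ns (1 byte/ns = 1 GB/s).
    """
    return overhead_ns + size_bytes / delivered_bandwidth(offered_bw, link_bw, num_sources)


# One 1 GB/s link; each source offers 0.5 GB/s; 64-byte message; 50 ns overhead.
print(message_latency_ns(64, 50, 0.5, 1.0, 1))  # 178.0: delivered = offered
print(message_latency_ns(64, 50, 0.5, 1.0, 4))  # 306.0: 4 x 0.5 offered > 1.0, so each gets 0.25
```

With four sources the aggregate offered bandwidth (2 GB/s) exceeds the link bandwidth, so each source's delivered bandwidth halves relative to what it offers and the transfer portion of the latency doubles.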

Router architecture

The number of input ports normally equals the number of output ports; this number is the degree of the router.
- In a direct network, one input port and one output port connect to the host node's NI outbound and NI inbound controls, respectively
- The router is typically a single VLSI chip
- Pin count is essentially the number of ports (input and output) multiplied by the channel width
- High-speed serial links offer the lowest pin count, but the clock and control information must be encoded within the serial bit stream
- Parallel links require a high pin count, and one extra line is devoted to transmitting the clock; also

Coherence protocol and routing

We have already discussed why the NI needs at least two queues in each direction; how do these queues connect to the router?
- Call the queues request and reply (in each direction); the source coherence engine specifies which queue a message uses
- These queues form the request and reply virtual networks of the system
- Each NI output queue maps to at least one, and possibly several, input virtual lanes in the router
- Every router port has the same number of virtual lanes; e.g., the request virtual lanes form the request virtual network
- The coherence protocol normally assigns message types to virtual networks statically
- A message injected into the request lane is carried along its whole route in the request network and arrives at the destination in the NI's input request queue
- Within each virtual network there may be several virtual channels per router port, to avoid routing-deadlock cycles and head-of-line blocking, and to aid adaptive routing
- Three-lane protocols normally have a third virtual network to carry requests generated by requests, e.g., interventions and invalidations
- Stanford FLASH runs four-lane coherence protocols and uses all four virtual lanes of the SGI Spider router
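The static assignment of message types to virtual networks, and the reason a third lane helps, can be sketched as a small check. The message types and the "generates" relation below are hypothetical and simplified; they are not FLASH's or any specific machine's protocol. The check tests a sufficient condition: if every message a handler generates goes on a strictly higher lane, and the NI always consumes messages on the highest lane (replies), no cycle of lane-to-lane waiting can form.

```python
# Hypothetical message types: which messages a handler may generate on receipt.
GENERATES = {
    "GetS": ["Intervention", "DataReply"],                # read miss at home
    "GetX": ["Intervention", "Invalidate", "DataReply"],  # write miss at home
    "Intervention": ["DataReply"],                        # owner forwards the line
    "Invalidate": ["InvalAck"],                           # sharer acknowledges
    "DataReply": [],                                      # replies are sinks
    "InvalAck": [],
}

# Static assignments of message types to virtual networks (lane numbers).
TWO_LANE = {"GetS": 0, "GetX": 0, "Intervention": 0, "Invalidate": 0,
            "DataReply": 1, "InvalAck": 1}
THREE_LANE = {"GetS": 0, "GetX": 0, "Intervention": 1, "Invalidate": 1,
              "DataReply": 2, "InvalAck": 2}


def violations(lane_of: dict) -> list:
    """(message, generated message) pairs that do not move to a strictly higher lane."""
    return [(msg, out)
            for msg, outs in GENERATES.items()
            for out in outs
            if lane_of[out] <= lane_of[msg]]


def lane_dependencies_acyclic(lane_of: dict) -> bool:
    return not violations(lane_of)


print(lane_dependencies_acyclic(THREE_LANE))  # True
print(violations(TWO_LANE))
# [('GetS', 'Intervention'), ('GetX', 'Intervention'), ('GetX', 'Invalidate')]
```

In the two-lane assignment, requests that generate interventions or invalidations on the same request network are flagged; moving those "requests generated by requests" to a third virtual network removes every violation.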