System Architecture - Computer Systems Architecture - Lecture Slides, Slides of Computer Architecture and Organization

Alagappa University Computer Architecture and Organization


Typology: Slides

2012/2013

Uploaded on 04/27/2013

jutt 🇮🇳


The IBM Blue Gene/L System

Architecture


What is Blue Gene/L?

Blue Gene is an IBM Research project dedicated to exploring the frontiers in supercomputing.
In November 2004, the IBM Blue Gene computer became the fastest supercomputer in the world.
The system is designed to scale to 65,536 dual-processor nodes, with a peak performance of 360 TeraFLOPS.
Example applications:
- hydrodynamics
- quantum chemistry
- molecular dynamics
- climate modeling
- financial modeling

Main Design Principles for Blue Gene/L

Some science & engineering applications scale up to and beyond 10,000 parallel processes.
Improve computing capability while holding total system cost constant.
Reduce cost/FLOP.
Reduce complexity and size.
- ~25 kW per rack is the maximum for air cooling in a standard machine room.
- Need to improve the performance/power ratio.
- The 700 MHz PowerPC 440 core used in the ASIC has excellent FLOPS per watt.
Maximize integration:
- On chip: an ASIC with everything except main memory.
- Off chip: maximize the number of nodes in a rack.
Large systems require excellent reliability, availability, and serviceability (RAS).

Main Design Principles (cont’d)

Make cost/performance trade-offs with the end use in mind:
- Trade off across applications, architecture, and packaging.
- Examples:
  - 1 or 2 differential signals per torus link, i.e. 1.4 or 2.8 Gb/s.
  - A maximum of 3 or 4 neighbors on the collective network, which sets the depth of the network and thus the global latency.
Maximize the overall system efficiency:
- Small team designed all of Blue Gene/L.
- Example: Chose ASIC die and chip pin-out to ease circuit card routing.
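The two trade-off examples above can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes 1.4 Gb/s per differential signal from the slide, and it assumes an idealized complete tree in which every node spends one link on its parent and the rest on children (real Blue Gene/L cabling is not modeled; the function names are hypothetical). Under that assumption, going from 3 to 4 neighbors on the collective network cuts the depth needed to reach 65,536 nodes from 16 to 10 levels, which is why the neighbor count drives global latency.

```python
PER_SIGNAL_MBPS = 1400  # from the slide: 1 differential signal per link = 1.4 Gb/s


def torus_link_mbps(signals: int) -> int:
    """Raw bandwidth of one torus link carrying `signals` differential signals."""
    return signals * PER_SIGNAL_MBPS


def min_tree_depth(num_nodes: int, max_neighbors: int) -> int:
    """Fewest levels below the root needed to reach `num_nodes` nodes.

    ASSUMPTION: idealized complete tree; each node uses one link for its
    parent and the remaining (max_neighbors - 1) links for children.
    """
    fan_out = max_neighbors - 1
    if fan_out < 2:
        raise ValueError("need at least 3 neighbors for a branching tree")
    depth, level_size, covered = 0, 1, 1  # start with the root alone
    while covered < num_nodes:
        level_size *= fan_out  # number of nodes on the next level down
        covered += level_size
        depth += 1
    return depth


for signals in (1, 2):
    print(f"{signals} signal(s) per link: {torus_link_mbps(signals) / 1000} Gb/s")
for neighbors in (3, 4):
    print(f"{neighbors} neighbors: depth {min_tree_depth(65_536, neighbors)}")
```

Since a value must travel up the tree and back down for a global operation, latency grows roughly with twice the depth, so the extra neighbor buys a shallower, faster network at the cost of more pins and wiring.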

Blue Gene/L Architecture

Up to 32×32×64 = 65,536 nodes, connected as a 3D torus.
Peak of 360 teraFLOPS of computation.
Each processor can perform 4 floating-point operations per cycle (in the form of two 64-bit floating-point multiply-adds per cycle).
5 networks connect the nodes to each other and to the outside world.
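To illustrate what a 3D torus means for addressing, here is a minimal sketch that treats each node as an (x, y, z) coordinate in a 32×32×64 grid whose edges wrap around. The function names are hypothetical, and this models only the topology, not Blue Gene/L's actual routing hardware. Each node has six nearest neighbors, and with shortest-way-round routing the farthest node is 16 + 16 + 32 = 64 hops away.

```python
import math
from typing import List, Tuple

Coord = Tuple[int, int, int]
DIMS: Coord = (32, 32, 64)  # X, Y, Z extents of the full machine


def torus_neighbors(node: Coord, dims: Coord = DIMS) -> List[Coord]:
    """The six nearest neighbors of `node`, wrapping around at each edge."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            result.append((coord[0], coord[1], coord[2]))
    return result


def torus_hops(a: Coord, b: Coord, dims: Coord = DIMS) -> int:
    """Minimal hop count: on each axis, go whichever way round the ring is shorter."""
    total = 0
    for ca, cb, size in zip(a, b, dims):
        d = abs(ca - cb)
        total += min(d, size - d)
    return total


def torus_diameter(dims: Coord = DIMS) -> int:
    """Worst-case minimal hop count between any two nodes."""
    return sum(size // 2 for size in dims)


print(math.prod(DIMS))             # total node count
print(torus_neighbors((0, 0, 0)))  # corner node wraps to 31 and 63
print(torus_diameter())            # longest shortest path, in hops
```

The wraparound links are what distinguish a torus from a plain mesh: node (0, 0, 0) and node (31, 0, 0) are direct neighbors, which halves the worst-case distance along each axis.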

Node Architecture

Uses IBM PowerPC embedded CMOS processors, embedded DRAM, and a system-on-a-chip design.
An 11.1 mm square die allows a very high density of processing.
The ASIC uses IBM CMOS CU-11 0.13 µm technology.
The 700 MHz processor clock is close to the memory speed.
Two processors per node.
The second processor is intended primarily for handling message-passing operations.

Figure: Blue Gene/L node diagram.
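The quoted peak figure follows from the numbers on these slides. As a back-of-the-envelope check, assuming every processor on every node issues its two multiply-adds (4 flops) on every 700 MHz cycle: 65,536 × 2 × 4 × 700 MHz ≈ 367 TFLOPS, in line with the roughly 360 TFLOPS quoted. If the second processor is reserved for message passing, as the slide says it mainly is, the compute peak is about half of that. The constant and function names below are only for illustration.

```python
NODES = 65_536
PROCESSORS_PER_NODE = 2
FLOPS_PER_CYCLE = 4       # two 64-bit multiply-adds, counted as 2 flops each
CLOCK_HZ = 700_000_000    # 700 MHz


def peak_tflops(nodes: int = NODES, processors_per_node: int = PROCESSORS_PER_NODE) -> float:
    """Theoretical peak, assuming every processor retires 4 flops every cycle."""
    return nodes * processors_per_node * FLOPS_PER_CYCLE * CLOCK_HZ / 1e12


print(f"both processors computing: {peak_tflops():.1f} TFLOPS")
print(f"one processor per node:    {peak_tflops(processors_per_node=1):.1f} TFLOPS")
```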

Link ASIC

In addition to the compute ASIC, there is a “link” ASIC.
When crossing a midplane boundary, BG/L's torus, global combining tree, and global interrupt signals pass through the BG/L link ASIC.
It redrives signals over the cables between BG/L midplanes.
The link ASIC can redirect signals between its different ports.
- This enables BG/L to be partitioned into multiple, logically separate systems with no traffic interference between them.

The FP2 core (cont’d)

This enhanced instruction set goes beyond the capabilities of traditional SIMD architectures.
A single instruction can initiate a different but related operation on different data.
This is called Single Instruction Multiple Operation Multiple Data (SIMOMD).
Either side can access data from the other side's register file.
This saves a lot of swapping when working purely on complex-number arithmetic.
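A complex multiply shows why cross-register access and paired-but-different operations help. The toy model below is not the real FP2 instruction set, and its function names are hypothetical; it only assumes that each complex number sits in a (primary, secondary) register pair holding its real and imaginary parts. Two paired steps then compute (a_re·c_re − a_im·c_im, a_re·c_im + a_im·c_re): in the first, both sides multiply by the primary element of a; in the second, the primary side subtracts while the secondary side adds. Neither step needs a separate swap of register contents.

```python
from typing import Tuple

Pair = Tuple[float, float]  # (primary register, secondary register) = (real, imag)


def cross_mul(a: Pair, c: Pair) -> Pair:
    """Paired step 1: both sides multiply by a's PRIMARY element.

    The secondary side reads a[0] from the primary register file
    (cross-register access), so no copy of a[0] is needed.
    """
    a_re, _ = a
    c_re, c_im = c
    return (a_re * c_re, a_re * c_im)


def cross_fma_sub_add(acc: Pair, a: Pair, c: Pair) -> Pair:
    """Paired step 2: different but related operations (SIMOMD).

    Primary side: acc.re - a.im * c.im; secondary side: acc.im + a.im * c.re.
    Both use a's SECONDARY element and the opposite side's element of c.
    """
    _, a_im = a
    c_re, c_im = c
    return (acc[0] - a_im * c_im, acc[1] + a_im * c_re)


def complex_mul(a: Pair, c: Pair) -> Pair:
    return cross_fma_sub_add(cross_mul(a, c), a, c)


print(complex_mul((1.0, 2.0), (3.0, 4.0)))  # (-5.0, 10.0), i.e. (1+2j)*(3+4j)
```

A plain SIMD unit would have to apply the same operation on both sides, so the mixed subtract/add in step 2 would need an extra instruction or a register shuffle.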

Memory System

It is designed for high-bandwidth, low-latency memory and cache accesses.
An L2 hit returns in 6 to 10 processor cycles.
An L3 hit returns in about 25 cycles.
An L3 miss returns in about 75 cycles.
The system has a 16-byte interface to nine 256 Mb SDRAM-DDR devices.
The memory operates at one half or one third of the processor speed.
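Converting the slide's cycle counts to time makes the latencies easier to compare with other machines; the sketch below does that at the 700 MHz processor clock. It also estimates peak bandwidth for the 16-byte interface, but that part rests on an assumption not stated on the slide: one 16-byte transfer per memory-interface clock, with the interface clocked at one half or one third of 700 MHz. Under that assumption the interface delivers about 5.6 or 3.7 GB/s. The raw capacity of nine 256 Mb devices is 9 × 256 Mb = 288 MB. Function names are illustrative only.

```python
CLOCK_MHZ = 700  # processor clock


def cycles_to_ns(cycles: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Convert a latency in processor cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


def interface_gb_per_s(divisor: int, bytes_wide: int = 16,
                       clock_mhz: float = CLOCK_MHZ) -> float:
    """Peak bandwidth of the memory interface in GB/s (10**9 bytes per second).

    ASSUMPTION: one `bytes_wide` transfer per interface clock, where the
    interface runs at clock_mhz / divisor (divisor 2 or 3 per the slide).
    """
    return bytes_wide * (clock_mhz / divisor) * 1e6 / 1e9


RAW_CAPACITY_MB = 9 * 256 / 8  # nine 256 Mb devices, 8 bits per byte

for label, cycles in [("L2 hit (best)", 6), ("L2 hit (worst)", 10),
                      ("L3 hit", 25), ("L3 miss", 75)]:
    print(f"{label:15s} {cycles:3d} cycles = {cycles_to_ns(cycles):6.1f} ns")
for divisor in (2, 3):
    print(f"interface at 1/{divisor} clock: {interface_gb_per_s(divisor):.2f} GB/s")
print(f"raw capacity: {RAW_CAPACITY_MB:.0f} MB")
```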

Torus Network (cont’d)

Class Routing Capability (Deadlock-free Hardware Multicast)
Packets can be deposited along the route to a specified destination.
This allows efficient one-to-many communication in some instances.
Active messages allow fast transposes, as required in FFTs.
Independent on-chip network interfaces enable concurrent access.
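One way to picture depositing packets along the route is a packet that travels around one ring of the torus and leaves a copy at every node it passes, so a single injection reaches a whole line of nodes. The sketch below models only that idea on a single axis, assuming the packet always moves in the + direction; it is not the actual Blue Gene/L class-routing configuration, and `deposit_route` is a hypothetical name.

```python
from typing import List


def deposit_route(src: int, dst: int, ring_size: int) -> List[int]:
    """Nodes that receive a copy when a packet travels from `src` to `dst`
    in the + direction around a ring of `ring_size` nodes, depositing a
    copy at every node it passes (dst included, src excluded)."""
    hops = (dst - src) % ring_size
    return [(src + step) % ring_size for step in range(1, hops + 1)]


# One injection on a 32-node X ring reaches four nodes, wrapping past 31.
print(deposit_route(30, 2, 32))  # [31, 0, 1, 2]
```

Compared with sending one point-to-point packet per destination, the source injects once and each link along the line carries the data only once.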

Other Networks

A global combining/broadcast tree for collective operations
A Gigabit Ethernet network for connections to other systems, such as hosts and file systems
A global barrier and interrupt network
A separate Gigabit Ethernet-to-JTAG network for machine control
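The combining tree's role in collective operations can be pictured as a reduction up the tree followed by a broadcast of the result back down. This sketch performs a sum-allreduce in software over an implicit tree in which node i's parent is (i - 1) // fan_out; that layout and the function name are assumptions for illustration, whereas on Blue Gene/L the combining is done by the network itself as data moves toward the root.

```python
from typing import List


def tree_allreduce(values: List[int], fan_out: int = 2) -> List[int]:
    """Sum-allreduce over an implicit tree: node i's parent is (i - 1) // fan_out."""
    if not values:
        return []
    partial = list(values)
    # Combine phase: children always have larger indices than their parent,
    # so walking downward from the last node folds every subtree into its
    # parent before that parent forwards its own partial sum.
    for i in range(len(partial) - 1, 0, -1):
        partial[(i - 1) // fan_out] += partial[i]
    total = partial[0]  # the root now holds the global sum
    # Broadcast phase: the root's result is sent back down to every node.
    return [total] * len(values)


print(tree_allreduce([1, 2, 3, 4, 5]))  # [15, 15, 15, 15, 15]
```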

Gb Ethernet Disk/Host I/O Network

I/O nodes are leaves on the collective network.
Compute and I/O nodes use the same ASIC, but:
- An I/O node has Ethernet, not torus, which keeps I/O separate from the application.
- A compute node has torus, not Ethernet, so there is no need for 65,536 Ethernet cables.
The ratio of I/O to compute nodes is configurable: 1:8, 1:16, 1:32, 1:64, or 1:128.
Applications run on compute nodes, not I/O nodes.
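The configurable I/O ratio determines how many I/O nodes a full machine needs. The sketch below computes that for 65,536 compute nodes and, as a purely hypothetical mapping, assigns each consecutive block of `ratio` compute nodes to one I/O node; the real assignment follows the collective-network wiring, which this does not model.

```python
COMPUTE_NODES = 65_536
ALLOWED_RATIOS = (8, 16, 32, 64, 128)  # compute nodes per I/O node, per the slide


def io_nodes_needed(ratio: int, compute_nodes: int = COMPUTE_NODES) -> int:
    """Number of I/O nodes required to serve `compute_nodes` at 1:`ratio`."""
    if ratio not in ALLOWED_RATIOS:
        raise ValueError(f"ratio must be one of {ALLOWED_RATIOS}")
    return -(-compute_nodes // ratio)  # ceiling division


def io_node_for(compute_rank: int, ratio: int) -> int:
    """HYPOTHETICAL mapping: consecutive blocks of `ratio` ranks share an I/O node."""
    return compute_rank // ratio


for r in ALLOWED_RATIOS:
    print(f"1:{r:<3d} -> {io_nodes_needed(r):5d} I/O nodes")
```

A low ratio such as 1:8 suits I/O-heavy workloads at the cost of 8,192 extra nodes; 1:128 needs only 512.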

Fast Barrier/Interrupt Network

Four Independent Barrier or Interrupt Channels
- Independently Configurable as "or" or "and"
Asynchronous Propagation
- Halts operations quickly (the current estimate is a 1.3 µs worst-case round trip).
- 3/4 of this delay is time-of-flight.
Sticky bit operation
- Allows global barriers with a single channel.
User Space Accessible
- System selectable
It is partitioned along the same boundaries as the tree and torus.
- Each user partition contains its own set of barrier/interrupt signals.
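The "and"/"or" modes and the sticky bit are what let one channel serve as either a barrier or an interrupt. In the toy model below (the class and method names are hypothetical, not a Blue Gene/L API), an "and" channel fires only once every node has signalled, which is a barrier, while an "or" channel fires as soon as any node signals, which is an interrupt. Because each node's bit is sticky, a node that arrives early stays counted until the channel is reset, so one channel suffices for a global barrier. On timing, 3/4 of the 1.3 µs worst-case round trip is about 0.98 µs of time of flight.

```python
from typing import List


class BarrierChannel:
    """Toy model of one global barrier/interrupt channel (hypothetical API)."""

    def __init__(self, num_nodes: int, mode: str) -> None:
        if mode not in ("and", "or"):
            raise ValueError("mode must be 'and' or 'or'")
        self.mode = mode
        # Sticky bits: once a node sets its bit it stays set until reset().
        self.bits: List[bool] = [False] * num_nodes

    def signal(self, node: int) -> None:
        self.bits[node] = True

    def fired(self) -> bool:
        # "and": the barrier completes when all nodes have arrived.
        # "or": the interrupt fires as soon as any node raises it.
        return all(self.bits) if self.mode == "and" else any(self.bits)

    def reset(self) -> None:
        self.bits = [False] * len(self.bits)


barrier = BarrierChannel(4, "and")
for node in range(3):
    barrier.signal(node)
print(barrier.fired())  # False: node 3 has not arrived yet
barrier.signal(3)
print(barrier.fired())  # True
```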
