Introduction to Cell Processor, Lecture Slides - Assembly Programming, Slides of Assembly Language Programming

Introduction to Cell Processor, Cell Basic Design Concept, Cell Hardware Overview, Cell Processor, Cell Processor Components, Cell Performance Characteristics, Cell Application Affinity, Cell

Typology: Slides

2010/2011

Uploaded on 10/11/2011

Dr. Michael Perrone, IBM. 6.189 IAP 2007 MIT
6.189 IAP 2007
Lecture 2
Introduction to the Cell Processor
Michael Perrone ([email protected])

Class Agenda

- Motivation for multicore chip design
- Cell basic design concept
- Cell hardware overview
  - Cell highlights
  - Cell processor
  - Cell processor components
- Cell performance characteristics
- Cell application affinity
- Cell software overview
  - Cell software environment
  - Development tools
  - Cell system simulator
  - Optimized libraries
- Cell software development considerations
- Cell blade

Technology Scaling – We’ve hit the wall

[Figure: relative device performance versus year (1988 to 2012) for conventional bulk CMOS, SOI (silicon-on-insulator), high-mobility, and double-gate devices; the end of the curve is marked with a "?".]

Power Density – The fundamental problem

[Figure: power density in W/cm² (log scale, 1 to 1000) versus process generation (feature size in μm), showing Pentium®, Pentium Pro®, Pentium II®, and Pentium III® against reference levels for a hot plate and a nuclear reactor.]

Source: Fred Pollack, Intel. New Microprocessor Challenges in the Coming Generations of CMOS Technologies, Micro

Has This Ever Happened Before?

[Figure: power density chart with a "Steam Iron, 5 W/cm²" reference level; a second build of the slide adds a "?" and the label "opportunity".]

Cell

Systems and Technology Group

Cell History

- IBM, SCEI/Sony, Toshiba Alliance formed in 2000
- Design Center opened in March 2001
  - Based in Austin, Texas
- Single Cell BE operational Spring 2004
- 2-way SMP operational Summer 2004
- February 7, 2005: First technical disclosures
- October 6, 2005: Mercury Announces Cell Blade
- November 9, 2005: Open Source SDK & Simulator Published
- November 14, 2005: Mercury Announces Turismo Cell Offering
- February 8, 2006: IBM Announced Cell Blade

Cell Basic Concept

- Compatibility with 64b Power Architecture™
  - Builds on and leverages IBM investment and community
- Increased efficiency and performance
  - Attacks on the “Power Wall”
    - Non-Homogeneous Coherent Multiprocessor
    - High design frequency @ a low operating voltage with advanced power management
  - Attacks on the “Memory Wall”
    - Streaming DMA architecture
    - 3-level Memory Model: Main Storage, Local Storage, Register Files
  - Attacks on the “Frequency Wall”
    - Highly optimized implementation
    - Large shared register files and software controlled branching to allow deeper pipelines
- Interface between user and networked world
  - Image rich information, virtual reality
  - Flexibility and security
- Multi-OS support, including RTOS / non-RTOS
  - Combine real-time and non-real time worlds
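The three-level memory model named on this slide (main storage, local storage, register files) can be illustrated with a small portable C sketch. This is not Cell SDK code: `memcpy` stands in for a DMA transfer, and the function name `stream_sum` and the buffer size `LOCAL_STORE_FLOATS` are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical staging-buffer size for this sketch (not a Cell parameter). */
#define LOCAL_STORE_FLOATS 1024

/* Sum a large array held in "main storage" by streaming it through a small
 * "local storage" buffer, one chunk at a time. memcpy stands in for the DMA
 * transfer; the accumulator plays the role of a register. */
float stream_sum(const float *main_storage, size_t count)
{
    float local_store[LOCAL_STORE_FLOATS];  /* level 2: local storage */
    float acc = 0.0f;                       /* level 3: register file */
    size_t done = 0;

    assert(main_storage != NULL || count == 0);

    while (done < count) {
        size_t chunk = count - done;
        if (chunk > LOCAL_STORE_FLOATS)
            chunk = LOCAL_STORE_FLOATS;

        /* "DMA in": main storage -> local storage. */
        memcpy(local_store, main_storage + done, chunk * sizeof(float));

        /* Compute only on data that is already local. */
        for (size_t i = 0; i < chunk; ++i)
            acc += local_store[i];

        done += chunk;
    }
    return acc;
}
```

The sketch copies and computes in strict sequence. A streaming design would presumably overlap the transfer of the next chunk with computation on the current one, which this synchronous `memcpy` stand-in does not attempt to model.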

Cell Design Goals

- Cell is an accelerator extension to Power
  - Built on a Power ecosystem
  - Used best known system practices for processor design
- Sets a new performance standard
  - Exploits parallelism while achieving high frequency
  - Supercomputer attributes with extreme floating point capabilities
  - Sustains high memory bandwidth with smart DMA controllers
- Designed for natural human interaction
  - Photo-realistic effects
  - Predictable real-time response
  - Virtualized resources for concurrent activities
- Designed for flexibility
  - Wide variety of application domains
  - Highly abstracted to highly exploitable programming models
  - Reconfigurable I/O interfaces
  - Virtual trusted computing environment for security

Cell Hardware Components

Cell Chip

Cell Processor Components (1)

- Power Processor Element (PPE):
  - General purpose, 64-bit RISC processor (PowerPC AS 2.0.2)
  - 2-Way hardware multithreaded
  - L1: 32KB I; 32KB D
  - L2: 512KB
  - Coherent load / store
  - VMX-
  - Realtime Controls
    - Locking L2 Cache & TLB
    - Software / hardware managed TLB
    - Bandwidth / Resource Reservation
    - Mediated Interrupts
- Element Interconnect Bus (EIB):
  - Four 16 byte data rings supporting multiple simultaneous transfers per ring
  - 96 Bytes/cycle peak bandwidth
  - Over 100 outstanding requests
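The EIB figures on this slide combine by simple arithmetic: the peak rate in bytes per cycle divided by the ring width gives the number of 16-byte transfers per cycle, and multiplied by a clock rate gives bytes per second. The sketch below is illustrative only. The slide does not state the bus clock, so `ASSUMED_BUS_CLOCK_HZ` is a hypothetical value chosen for the example, and the function names are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum {
    EIB_RING_COUNT       = 4,   /* from the slide */
    EIB_RING_WIDTH_BYTES = 16,  /* from the slide */
    EIB_PEAK_BYTES_CYCLE = 96   /* from the slide */
};

/* Assumption for illustration only: the slide does not give the bus clock. */
#define ASSUMED_BUS_CLOCK_HZ 1600000000ULL

/* How many ring-width (16-byte) transfers the peak figure corresponds to
 * in a single cycle. */
unsigned eib_transfers_per_cycle(void)
{
    return EIB_PEAK_BYTES_CYCLE / EIB_RING_WIDTH_BYTES;
}

/* Peak bandwidth in bytes per second for a given per-cycle rate and clock. */
uint64_t peak_bytes_per_second(uint64_t bytes_per_cycle, uint64_t clock_hz)
{
    /* Guard against 64-bit overflow in the multiplication. */
    assert(clock_hz == 0 || bytes_per_cycle <= UINT64_MAX / clock_hz);
    return bytes_per_cycle * clock_hz;
}
```

With the slide's numbers, 96 / 16 gives six 16-byte transfers per cycle across four rings, which is consistent with the statement that each ring supports multiple simultaneous transfers. Calling `peak_bytes_per_second(EIB_PEAK_BYTES_CYCLE, ASSUMED_BUS_CLOCK_HZ)` shows how the per-cycle figure scales with whatever clock is actually used.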

[Figure: Cell chip block diagram, first build. Captions: "In the Beginning - the solitary Power Processor" and "Custom Designed - for high frequency, space, and power efficiency". Blocks shown: Power Core (PPE), L2 Cache, and NCU, attached to the 96 Byte/Cycle Element Interconnect Bus.]

Cell Processor Components (2)

- Synergistic Processor Element (SPE):
  - Provides the computational performance
  - Simple RISC User Mode Architecture
    - Dual issue VMX-like
    - Graphics SP-Float
    - IEEE DP-Float
  - Dedicated resources: unified 128x128-bit RF, 256KB Local Store
  - Dedicated DMA engine: Up to 16 outstanding requests
- Memory Management & Mapping
  - SPE Local Store aliased into PPE system memory
  - MFC/MMU controls / protects SPE DMA accesses
    - Compatible with PowerPC Virtual Memory Architecture
    - SW controllable using PPE MMIO
  - DMA 1,2,4,8,16,128 -> 16Kbyte transfers for I/O access
  - Two queues for DMA commands: Proxy & SPU
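The SPE numbers on this slide can be checked with a few lines of C. The sketch below computes the register file size from the 128x128-bit figure, tests whether a DMA size is allowed, and counts how many commands of at most 16 KB a larger transfer needs. The size rule is an interpretation of the slide's list (1, 2, 4, or 8 bytes, or a multiple of 16 bytes up to 16 KB), and all function and macro names are hypothetical; none of this is the actual MFC programming interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SPE_REGISTER_COUNT    ((size_t)128)         /* from the slide */
#define SPE_REGISTER_BITS     ((size_t)128)         /* from the slide */
#define SPE_LOCAL_STORE_BYTES ((size_t)256 * 1024)  /* from the slide */
#define DMA_MAX_BYTES         ((size_t)16 * 1024)   /* from the slide */
#define DMA_MAX_OUTSTANDING   ((size_t)16)          /* from the slide */

/* Size of the unified register file in bytes. */
size_t spe_register_file_bytes(void)
{
    return SPE_REGISTER_COUNT * SPE_REGISTER_BITS / 8;
}

/* Assumed reading of the slide: small transfers of 1, 2, 4, or 8 bytes,
 * otherwise a multiple of 16 bytes no larger than 16 KB. */
bool dma_size_is_valid(size_t n)
{
    if (n == 1 || n == 2 || n == 4 || n == 8)
        return true;
    return n >= 16 && n <= DMA_MAX_BYTES && n % 16 == 0;
}

/* Number of DMA commands needed to move total_bytes in chunks of at most
 * DMA_MAX_BYTES. The sketch only handles totals that are a multiple of 16,
 * so that every chunk, including the last, is itself a valid size. */
size_t dma_commands_needed(size_t total_bytes)
{
    assert(total_bytes % 16 == 0);
    return total_bytes / DMA_MAX_BYTES + (total_bytes % DMA_MAX_BYTES != 0);
}
```

Under these assumptions the register file comes to 2 KB, and filling the whole 256 KB Local Store takes `dma_commands_needed(SPE_LOCAL_STORE_BYTES)` commands, which equals `DMA_MAX_OUTSTANDING`: sixteen 16 KB requests in flight would cover one full Local Store.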

[Figure: Cell chip block diagram. Eight SPE blocks, each labeled with SPU, Local Store, MFC, "AUC", and "N", attached together with the Power Core (PPE), L2 Cache, and NCU to the 96 Byte/Cycle Element Interconnect Bus.]