Cell Programming Tutorial - Lecture Notes | CISC 879, Study notes of Computer Science

University of Delaware (UD), Computer Science

Material Type: Notes; Class: ADVANCED PARALLEL PROGRAMMING; Subject: Computer/Information Sciences; University: University of Delaware; Term: Spring 2008;


Uploaded on 09/02/2009


CISC879: Software Support for Multicore Architectures

Spring 2008

Lecture 9: March 11

Lecturer: John Cavazos

Scribe: Brice Dobry

Cell Programming Tutorial

Outline:

I. Cell Basics

II. Programming Models

III. Programming Details

IV. Example Code - see slides for example code

I. Cell Basics

•Heterogeneous architecture (9 cores)

‣1 PPE - General purpose processor

✴In the picture, you can see that the PPE takes up a large amount of space on the die

✴Basically just a slightly modified PowerPC

✴Good for control-plane or “branchy” code

‣8 SPEs - SIMD processors

✴Good for computational code with few branches

✴On the PS3, one is disabled for yield reasons, and another is used by the game OS

✴We noted that it is strange that the game OS can run well on an SPE, since you would expect OS code to be very “branchy”

•Program Structure

‣PPE code

✴Regular Linux process (main thread)

✴Can spawn SPE threads

‣SPE code

✴Can be embedded in the PPE code

✴Also can be standalone “SPUlet”

•SPE details

‣All instructions are SIMD


‣128 128-bit registers for each SPE

‣256 KB Local Store (LS)

✴Basically a software-managed cache

✴Accessed with load/store instructions (16 bytes at a time)

✴Contains data and instructions

✴Allows “overlaying” functions: if there is not enough room to fit two mutually exclusive functions, they can occupy the same space in the LS and be brought in from main memory when needed

‣DMA transfers are used to move data/instructions to/from main memory

✴High bandwidth - 128 bytes per cycle

✴Need to verify whether or not DMA transfers can go directly from one LS to another thread's LS

‣Non-deterministic features of standard general-purpose processors are removed

✴Out-of-order execution

✴HW-managed cache

✴HW branch prediction

✴Removing these allows for a smaller logic size in the SPEs

‣Register Layout

‣It is inefficient to operate on less than 16-byte “units”; smaller “units” should be packed and operated on with one instruction
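The point about packing smaller “units” into 16 bytes can be sketched in portable C. This is only an illustration under stated assumptions: `vec4f` is a hypothetical stand-in for a 128-bit SPE register built with the GCC/Clang vector extension, and `add_packed` is a made-up helper name; neither comes from the lecture or from the Cell SDK.

```c
#include <assert.h>

/* Hypothetical stand-in for one 128-bit register holding four floats.
   Uses the GCC/Clang vector extension, not a Cell SDK type. */
typedef float vec4f __attribute__((vector_size(16)));

/* Add two float arrays four elements (16 bytes) at a time.
   In this sketch n must be a multiple of 4. */
void add_packed(const float *a, const float *b, float *out, int n)
{
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        /* Pack four 4-byte values into one 16-byte unit. */
        vec4f va = { a[i], a[i + 1], a[i + 2], a[i + 3] };
        vec4f vb = { b[i], b[i + 1], b[i + 2], b[i + 3] };

        /* A single source-level operation covers all four lanes. */
        vec4f vc = va + vb;

        out[i]     = vc[0];
        out[i + 1] = vc[1];
        out[i + 2] = vc[2];
        out[i + 3] = vc[3];
    }
}
```

Calling `add_packed(a, b, out, 8)` on two 8-element arrays performs two packed additions instead of eight scalar ones; whether each maps to one machine instruction depends on the target and compiler.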

‣Since all instructions and data must fit in only 256 KB, a streaming model is often used

II. Programming Models

•Choosing which model to use depends on how the data/computation can be partitioned

‣Program structure

‣Data structures used

•Also need to consider how to DMA in and out efficiently

•Possible models

‣Data parallel

✴A large array of data is fed through the SPEs, which do the same calculation on each data segment

‣Task parallel

✴Pipeline style where each SPE does a computation and then passes its output to the next SPE

✴This model seems to be too inefficient in practice and can usually be morphed into one of the other models

‣Job queue
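For the data-parallel model described above, one way to split a large array into per-SPE segments might look like the following. It is a minimal sketch in plain C, not Cell SDK code: `NUM_WORKERS`, `segment_t`, and `segment_for` are hypothetical names, and the choice of one worker per SPE (8) is an assumption taken from the core count in section I.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_WORKERS 8  /* assumption: one worker per SPE */

/* Half-open range [begin, end) of array indices owned by one worker. */
typedef struct {
    size_t begin;
    size_t end;
} segment_t;

/* Split `total` elements as evenly as possible across NUM_WORKERS.
   The first (total % NUM_WORKERS) workers get one extra element. */
segment_t segment_for(size_t total, unsigned worker)
{
    assert(worker < NUM_WORKERS);
    size_t base  = total / NUM_WORKERS;
    size_t extra = total % NUM_WORKERS;
    size_t begin = worker * base + (worker < extra ? worker : extra);
    size_t len   = base + (worker < extra ? 1 : 0);
    segment_t s = { begin, begin + len };
    return s;
}
```

Each worker would then run the same calculation over its own `[begin, end)` range, so the segments cover the array without overlap.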

✴ 32-bit messages

✴ 2 for sending, 1 for receiving

Signals
mfc_put(lsaddr,ea,size,tag,tid,rid)
- Copy memory from my LS to main memory
- lsaddr is the address in my local store, ea is the address in main memory, and size is the transfer size in bytes
- tag is used by later calls to determine when the transfer has completed

mfc_get(lsaddr,ea,size,tag,tid,rid)
- Copy memory from main memory to my LS
- lsaddr is the address in my local store, ea is the address in main memory, and size is the transfer size in bytes
- tag is used by later calls to determine when the transfer has completed

Double-buffering can be used to help hide the DMA latency
- While doing operation n, put the results of operation n-1 to main memory and get the input for operation n+1
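The double-buffering schedule can be sketched in plain C as follows. This is a simulation under stated assumptions, not SPE code: `fake_get` and `fake_put` are hypothetical synchronous stand-ins (built on `memcpy`) for `mfc_get` and `mfc_put`, and `CHUNK` and `scale_stream` are made-up names. In real SPE code the transfers are asynchronous, so a wait on the transfer's tag would be needed before a buffer is reused; that step is omitted here.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 4  /* elements per transfer; hypothetical size */

/* Synchronous stand-ins for DMA; the real calls return immediately. */
static void fake_get(float *ls, const float *ea, int n)
{
    memcpy(ls, ea, (size_t)n * sizeof(float));
}

static void fake_put(const float *ls, float *ea, int n)
{
    memcpy(ea, ls, (size_t)n * sizeof(float));
}

/* Multiply nchunks * CHUNK floats by k using two input and two
   output buffers that alternate roles on every iteration. */
void scale_stream(const float *in, float *out, int nchunks, float k)
{
    float inbuf[2][CHUNK], outbuf[2][CHUNK];
    if (nchunks <= 0)
        return;

    fake_get(inbuf[0], in, CHUNK);                /* input for operation 0 */

    for (int n = 0; n < nchunks; n++) {
        int cur = n & 1;                          /* buffers for operation n */
        int nxt = cur ^ 1;                        /* the other pair */

        if (n + 1 < nchunks)                      /* get input for n+1 */
            fake_get(inbuf[nxt], in + (n + 1) * CHUNK, CHUNK);
        if (n > 0)                                /* put results of n-1 */
            fake_put(outbuf[nxt], out + (n - 1) * CHUNK, CHUNK);

        for (int i = 0; i < CHUNK; i++)           /* operation n */
            outbuf[cur][i] = k * inbuf[cur][i];
    }

    /* Results of the last operation still need to be written back. */
    fake_put(outbuf[(nchunks - 1) & 1], out + (nchunks - 1) * CHUNK, CHUNK);
}
```

With asynchronous transfers, the get and put issued at the top of each iteration would overlap with the compute loop, which is where the latency hiding comes from.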