Control Hazard Solutions - High Performance Computing - Lecture Slides, Slides of Computer Science

Biju Patnaik University of Technology Computer Science

Some concepts covered in High Performance Computing are Addressing Modes, Program Execution, Basic Computer Organization, Control Hazard Solutions, Least Recently Used, and Memory Hierarchy Progression. The main points of this lecture are: Control Hazard Solutions, Static Branch Prediction, Delayed Branching, Design Hardware, Control Transfer, Delay Slots, Instructions, Branch, Useful Instructions

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan 🇮🇳


High Performance Computing

Lecture 25


Control Hazard Solutions

1. Static Branch Prediction
2. Delayed Branching
  • Design hardware so that control transfer takes place after a few of the following instructions

        BEQ R1, R2, target
        ADD R3, R2, R

  • Delay slots: following instructions that are executed whether or not the branch is taken
  • Stall cycles are avoided if the delay slots are filled with useful instructions

Delayed Branching…Compiler’s Role

• When filled from branch target or fall-through, patch-up code may be needed

Before the delay slot is filled:

        BEQZ R1, target
                              / Branch delay slot (empty)

    target:
        ADDI R7, R7, 1
        LW R8, -8(R29)

    fall through:

After the delay slot is filled from the branch target:

        BEQZ R1, target
        ADDI R7, R7, 1        / Branch delay slot

    target:
        LW R8, -8(R29)

    fall through:
        SUBI R7, R7, 1

The ADDI now executes even when the branch is not taken, so the fall-through path needs SUBI R7, R7, 1 as patch-up code to undo it.
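The effect of the patch-up code can be checked with a small C model. This is only a sketch: the function names are hypothetical, R7 is modelled as a plain int, and BEQZ is assumed to branch when R1 is zero. It compares the value of R7 when the ADDI sits at the branch target with its value when the ADDI is moved into the delay slot and a SUBI is placed on the fall-through path.

```c
#include <stdbool.h>

/* Original code: ADDI R7, R7, 1 sits at the branch target,
   so it runs only when the branch is taken. */
int r7_original(int r1, int r7) {
    bool taken = (r1 == 0);      /* BEQZ R1, target (assumed: branch if R1 == 0) */
    if (taken) {
        r7 = r7 + 1;             /* target: ADDI R7, R7, 1 */
    }
    return r7;
}

/* Delay slot filled from the branch target: ADDI always runs,
   and the fall-through path undoes it with SUBI. */
int r7_delay_slot_filled(int r1, int r7) {
    bool taken = (r1 == 0);      /* BEQZ R1, target */
    r7 = r7 + 1;                 /* delay slot: ADDI R7, R7, 1 (always executed) */
    if (!taken) {
        r7 = r7 - 1;             /* fall through: SUBI R7, R7, 1 (patch-up) */
    }
    return r7;
}
```

Both functions return the same R7 for any R1, which is what the patch-up code is meant to guarantee. The cost is an extra instruction on the fall-through path, so this filling strategy pays off mainly when the branch is usually taken.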

If no instruction can be found…

• The compiler must insert an instruction that does nothing
  ◦ other than occupying the delay slot, being fetched and decoded
  ◦ Example: ADD R0, R0, R
  ◦ If an instruction that does nothing was included in the instruction set, it would be called a No-Operation instruction, or NOP for short
  ◦ NOP might be included in the assembly language
• It has practically the same effect as a STALL cycle

Pipeline and Programming

• Consider a simple pipeline with the following warnings in the ISA manual
  1. One load delay slot
  2. One branch delay slot
  3. The 2 instructions after an FP arithmetic operation can’t use the value computed by that operation
• We will think about a specific program, say vector addition

        double A[1024], B[1024];
        for (int i = 0; i < 1024; i++) A[i] = A[i] + B[i];

Vector Addition Loop

Straightforward instruction order: 11 cycles per iteration

    Loop: FLOAD  F0, 0(R1)
          FLOAD  F2, 0(R2)
          FADD   F4, F0, F
          FSTORE 0(R1), F
          ADDI   R1, R1, 8
          ADDI   R2, R2, 8
          BLE    R1, R3, Loop

Reordered instructions: 7 cycles per iteration

    Loop: FLOAD  F0, 0(R1)
          FLOAD  F2, 0(R2)
          ADDI   R1, R1, 8
          FADD   F4, F0, F
          ADDI   R2, R2, 8
          BLE    R1, R3, Loop
          FSTORE -8(R1), F
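One plausible way to account for the 11 and 7 cycle figures is sketched below. It assumes one cycle per instruction plus one wasted cycle for every delay slot or dependence gap that is not covered by a useful instruction. The breakdown is an assumption, not something the slides state: it reads the truncated operands as FADD consuming the second loaded value and FSTORE storing the FADD result, and the function names are hypothetical.

```c
#include <assert.h>

enum { LOOP_INSTRUCTIONS = 7 };  /* both versions of the loop contain 7 instructions */

/* Cycles for one iteration: one per instruction, plus one per wasted cycle
   caused by an uncovered load delay, FP result delay, or branch delay slot. */
int cycles_per_iteration(int load_stalls, int fp_stalls, int branch_stalls) {
    assert(load_stalls >= 0 && fp_stalls >= 0 && branch_stalls >= 0);
    return LOOP_INSTRUCTIONS + load_stalls + fp_stalls + branch_stalls;
}

/* Cycles for the whole loop, ignoring any start-up cost. */
long total_cycles(int per_iteration, long iterations) {
    return (long)per_iteration * iterations;
}
```

Under these assumptions the first ordering wastes 1 cycle after the second FLOAD, 2 cycles between FADD and FSTORE, and 1 cycle in the empty branch delay slot, so `cycles_per_iteration(1, 2, 1)` gives 11. The reordered loop covers every gap with useful work, so `cycles_per_iteration(0, 0, 0)` gives 7. Over the 1024 iterations of the vector addition that is 11264 versus 7168 cycles.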

An even faster loop? Loop Unrolling

• Idea: Each time through the loop, do the work of more than one iteration
  ◦ More instructions to use in reordering
  ◦ Fewer instructions executed for loop control
  ◦ … but program increases in size
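At the source level, unrolling the vector addition loop might look like the sketch below. The unroll factor of 4 and the function names are assumptions chosen for illustration; the slides do not fix a factor. The unrolled version assumes the element count is a multiple of 4, which holds for the 1024-element arrays used here.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: one element per trip, so one increment and one branch per element. */
void vector_add(double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] + b[i];
    }
}

/* Unrolled by 4: four elements per trip, so the loop-control instructions
   run a quarter as often, and the four independent additions give the
   compiler more instructions to reorder into delay slots. */
void vector_add_unrolled4(double *a, const double *b, size_t n) {
    assert(n % 4 == 0);  /* assumption: no leftover elements to handle */
    for (size_t i = 0; i < n; i += 4) {
        a[i]     = a[i]     + b[i];
        a[i + 1] = a[i + 1] + b[i + 1];
        a[i + 2] = a[i + 2] + b[i + 2];
        a[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

The loop body is four times longer, which is the code-size cost the last bullet refers to. Many compilers can apply this transformation themselves when asked, so hand unrolling is mostly useful for seeing what the optimization does.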

Agenda

Program execution: Compilation, Object files, Function call and return, Address space, Data & its representation (4)
Computer organization: Memory, Registers, Instruction set architecture, Instruction processing (6)
Virtual memory: Address translation, Paging (4)
Operating system: Processes, System calls, Process management (6)
Pipelined processors: Structural, data and control hazards, impact on programming (4)
Cache memory: Organization, impact on programming (5)
Program profiling (2)
File systems: Disk management, Name management, Protection (4)
Parallel programming: Inter-process communication, Synchronization, Mutual exclusion, Parallel architecture, Programming with message passing using MPI (5)

Cache Memory; Memory Hierarchy

• Recall: In discussing the pipeline, we assumed that memory latency would be hidden, so that memory appears to operate at processor speed
• Cache Memory: HW that makes this happen
• Design principle: Locality of Reference
  ◦ Temporal locality: recently used objects are likely to be referenced again soon; equivalently, the least recently used objects are the least likely to be referenced in the near future
  ◦ Spatial locality: neighbours of recently referenced locations are likely to be referenced in the near future
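Spatial locality shows up directly in how a program walks an array. The sketch below sums the same matrix in two orders; the matrix size and function names are assumptions for illustration. C stores two-dimensional arrays row by row, so the row-order loop touches neighbouring addresses on consecutive accesses, while the column-order loop jumps COLS elements each time.

```c
enum { ROWS = 256, COLS = 256 };

/* Row order: consecutive accesses are adjacent in memory (good spatial locality). */
double sum_row_order(double m[ROWS][COLS]) {
    double sum = 0.0;
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            sum += m[i][j];
        }
    }
    return sum;
}

/* Column order: consecutive accesses are COLS * sizeof(double) bytes apart. */
double sum_column_order(double m[ROWS][COLS]) {
    double sum = 0.0;
    for (int j = 0; j < COLS; j++) {
        for (int i = 0; i < ROWS; i++) {
            sum += m[i][j];
        }
    }
    return sum;
}
```

Both functions visit the same elements, so they do the same amount of arithmetic. Any difference in running time comes from how well each access pattern matches what the cache has already fetched, and it depends on the machine and on the matrix size relative to the cache.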

Cache Design

[Figure: block diagram of a cache. An address A is presented to the cache. Labelled parts: "Do I Have It?" logic (lookup logic), table of "addresses I have" (cache directory), fast memory (cache RAM).]

• Typical size: 32KB, i.e., 1000s of instructions/data items can be stored
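The "Do I have it?" lookup can be sketched as a directory check in C. The slide does not say how addresses map to cache entries, so the direct-mapped organization, the 32-byte line size, and all names below are assumptions; only the 32KB capacity comes from the slide. The cache RAM holding the data itself is left out to keep the sketch short.

```c
#include <stdbool.h>
#include <stdint.h>

enum { LINE_BYTES = 32, NUM_LINES = 1024 };  /* 1024 lines * 32 bytes = 32 KB */

/* Cache directory: the table of "addresses I have", one entry per line. */
typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} cache_directory;

/* Lookup logic: returns true on a hit. On a miss the directory entry is
   overwritten, modelling the line being fetched from main memory. */
bool cache_access(cache_directory *dir, uint32_t address) {
    uint32_t line_number = address / LINE_BYTES;   /* which memory line */
    uint32_t index = line_number % NUM_LINES;      /* which directory entry */
    uint32_t tag = line_number / NUM_LINES;        /* identifies the line in that entry */

    if (dir->valid[index] && dir->tag[index] == tag) {
        return true;                               /* "I have it" */
    }
    dir->valid[index] = true;
    dir->tag[index] = tag;
    return false;
}
```

With a zero-initialized directory, a first access to address 0x1000 misses and a following access to 0x1008 hits, because both addresses fall in the same 32-byte line. An address exactly 32KB away maps to the same entry and evicts it, which is a limitation of the direct-mapped assumption.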
